Somehow I managed to pause Ted Lasso long enough to put the finishing touches on this week’s newsletter. No tricks, but plenty of treats (including an XL-sized bag of KubeCon videos). Stay safe and enjoy the stories! 🎃👻🦇
This issue is sponsored by:
Start incident response with context to all your alerts in one view
Moogsoft speeds up incident response with dynamic anomaly detection, suppressed alert noise, and correlated insights across all your telemetry data. Go from debugging across multiple tools, screens, and dashboards to a single incident view, so you and your teams can take a more proactive approach to reducing MTTR. Sign up for the Moogsoft Free community plan today!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
KubeCon 2021 was a massive online event, with over 200 (!!) recorded talks. I’ve combed through all of them to create a playlist of the 19 videos specifically about monitoring and observability.
Yes, yes, yes, and yes. This is an important article that we should all read and take to heart. I think it’s vital that we give our teams space to build, iterate, and, most importantly, fail. I know it’s a cliché, but failure provides the best learning opportunities. Please share this one.
I love reading about how companies think about monitoring, actionable (or not) alerts, and empowering teams with the data needed to increase reliability and to surface problems before they become customer-impacting events. Props to Azimo’s engineering leadership for sharing their experiences.
Personally, I’d rather just throw Thanos in front of my Prometheus clusters and not have to think about manual federation. But to be fair, my last big Thanos deployment far exceeded what native Prometheus federation can handle, and the author of this article may not have the freedom to deploy yet another collection of services. If that sounds like your situation, this one is definitely worth a look.
Honestly, this would have saved my bacon at a previous gig where we used TLS certificates for everything.
A fun look at how one company’s growth and evolution might influence the observability practices and tooling it adopts. This is a great article to share with friends who may be less experienced in these areas.
I’m genuinely surprised to hear of yet another Carbon implementation, and even more so that it’s coming out of Salesforce. Still, it looks like a compelling alternative to the traditional Python services, the Go-Graphite stack, and possibly even newcomers like ClickHouse.
It doesn’t surprise me to read that folks are using observability data for use cases outside of traditional DevOps and engineering applications. I’ve seen this in action myself, where business and marketing teams would lean on our systems rather than trying to build more complex analytical queries elsewhere.
Customers named LogicMonitor #1 in satisfaction in the Fall 2021 Network Monitoring grid from G2. Download the full report to see real user reviews and rankings across top network monitoring vendors. (SPONSORED)
It can take years to build up trust in our systems (and among teams), so I can empathize with the situations presented here. Precision, transparency, and communication are key.
Timescale has released a beta version of Promscale with support for downsampling Prometheus metrics. Its “continuous aggregates” feature has some benefits over Prometheus recording rules, but adopting it also means introducing a new system just to maintain your aggregated data. Still, Timescale does offer some unique advantages over traditional PromQL queries.
InfluxDB (along with its related projects) is one of those systems that has evolved a lot since its early days. If you’ve ever been curious about what it might look like to deploy an “Enterprise-ready” Influx stack, this article is a good place to start. (Note: the code formatting appears to break partway through the article, but there’s still plenty of useful info before that point.)
“CarbonJ is a drop-in replacement for carbon-cache and carbon-relay. It was designed with high performance read and write throughput in mind and supports writing millions of metric data points and serve millions of metrics datapoints per minute with low query latency.”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side, and we see dozens of these a year, more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor