This issue is SLO-heavy, with a fair bit of performance engineering and instrumentation thrown in. In short, everything I love to read about and nothing I'm particularly good at.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Any effective observability platform should exist for the benefit of its customers. Here are a few high-level considerations to keep in mind as you begin your observability journey.
Frankly, I never get tired of seeing companies switch between building and buying (and back again). No matter which "team" you're on, it's always educational to hear that it's possible (and cost-effective) to make that pivot. Another win for Thanos.
Sloth easily generates SLOs for Prometheus from a spec/manifest that scales and is easy to understand and maintain.
For as much as folks talk about SLOs, I haven’t seen a lot of standardization in how we document them, communicate them, etc. I’m very excited to see a project like Sloth surface, and I hope it continues to mature. Interestingly, this is how I first heard of the OpenSLO specification.
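For anyone newer to the arithmetic that SLO tooling automates, here is a minimal Python sketch of the commonly used error budget and burn-rate definitions (as popularized by the Google SRE Workbook). It is not Sloth's implementation: Sloth emits Prometheus recording and alerting rules rather than running Python. The `SLOWindow` class, its field names, and `allowed_bad_minutes` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass
class SLOWindow:
    """Good/total event counts for a full SLO period (hypothetical helper)."""
    target: float        # e.g. 0.999 for a 99.9% objective
    good_events: int
    total_events: int

    @property
    def error_budget(self) -> float:
        # The error budget is the fraction of events allowed to fail.
        return 1.0 - self.target

    @property
    def error_ratio(self) -> float:
        # Observed fraction of failed events; an empty window counts as error-free.
        if self.total_events == 0:
            return 0.0
        return 1.0 - self.good_events / self.total_events

    @property
    def burn_rate(self) -> float:
        # 1.0 means the budget is consumed exactly by the end of the period.
        return self.error_ratio / self.error_budget

    @property
    def budget_remaining(self) -> float:
        # Fraction of the budget left, assuming the counts cover the whole period
        # (goes negative once the budget is overspent).
        return 1.0 - self.burn_rate


def allowed_bad_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full unavailability a time-based objective tolerates."""
    return (1.0 - target) * window_days * 24 * 60


if __name__ == "__main__":
    window = SLOWindow(target=0.999, good_events=999_500, total_events=1_000_000)
    print(round(window.burn_rate, 3))  # 0.5
    print(round(allowed_bad_minutes(0.999), 1))  # 43.2
```

A 99.9% objective over 30 days leaves roughly 43 minutes of budget, which is why even small error ratios deserve alerting once they burn faster than 1.0.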
If you’re already using serverless (or considering it), this is a great primer on how to use distributed logging effectively for your services.
This might be the best article I’ve read about SLOs all year. I love hearing when teams communicate and listen to one another while defining their SLOs and learning from their mistakes.
A fun look at tracking a robotic lawn mower in Grafana using a DIY Python exporter.
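If you want to try something similar, a minimal sketch of a DIY exporter needs only the Python standard library: serve a `/metrics` endpoint in the Prometheus text exposition format. This is not the article's code. The metric names, the `read_mower_stats()` stub, and port 9101 are all hypothetical, and in practice many folks would reach for the official `prometheus_client` library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric metadata: name -> (Prometheus type, help text).
METRICS = {
    "mower_battery_percent": ("gauge", "Battery charge in percent."),
    "mower_blade_runtime_seconds_total": ("counter", "Cumulative blade run time."),
}


def read_mower_stats() -> dict[str, float]:
    # Hypothetical stub; a real exporter would poll the mower's API here.
    return {"mower_battery_percent": 87.0, "mower_blade_runtime_seconds_total": 444600.0}


def render_metrics(stats: dict[str, float]) -> str:
    """Render readings as Prometheus text exposition format."""
    lines = []
    for name, value in sorted(stats.items()):
        metric_type, help_text = METRICS[name]
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_mower_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Point a Prometheus scrape job at http://<host>:9101/metrics.
    HTTPServer(("", 9101), MetricsHandler).serve_forever()
```

Reading the stats on every scrape keeps the exporter stateless; Prometheus controls the polling interval, and Grafana simply queries Prometheus.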
This article summarizes some of the more interesting takeaways around performance, debugging, and optimizations from an interview with one of Facebook’s performance engineers. If you have time, I strongly encourage you to check out the full podcast.
As a follow-up to the recent release of pgSCV, the developer has published some Grafana dashboards that cover many of the metrics collected by the exporter.
This post dovetails nicely with the other SLO articles this week. Beyond the golden signals, what else should you be monitoring? Quite a bit, as it turns out.
This is a “chonky boi” of a technical article, but there’s so much good stuff in here I simply had to include it. Talk about squeezing every last drop of performance out of a system. And I loooove the inclusion of flame graphs.
I don’t know that I agree with the author’s assertion of “automatic” here, but this looks like a useful instrumentation pattern to share with your fellow Gophers.
A solid introduction to monitoring your Jenkins agents with Prometheus, including some metrics that you’re unable to get with a traditional node_exporter setup.
OpenSLO is a service level objective (SLO) language that declaratively defines reliability and performance targets using a simple YAML specification.
Ok, this isn’t technically a tool, but I love that some folks have finally gotten together to formalize a specification for SLOs. Even the Sloth author is aiming to comply with the OpenSLO specification. I’m definitely keeping an eye on this initiative.
One of the first technical conferences to resume in-person events, Monitorama is returning to Portland, OR this fall. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor