This week’s recurring theme was incident management and how to do it sustainably, along with a number of articles on Prometheus and open source tooling. Now if you’ll excuse me, I’m going to try that OpenTelemetry demo and see about finally instrumenting my apps for tracing. ⏰📈🍿
This issue is sponsored by:
The Plug-and-Debug Serverless Observability Platform
Trouble locating bugs in your serverless environment? Quit wasting precious development time and get an end-to-end map of your services in just four minutes with 1-click distributed tracing. Navigate your serverless chaos seamlessly—with Lumigo.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
A refreshing look at monitoring tooling from the perspective of an application engineer.
If you’re considering Google’s managed Prometheus service, you owe it to yourself to check out this very thorough tutorial.
There are a lot of companies out there who don’t have a comprehensive policy for incident response. I’ve paged (pun intended) through the first few chapters of this guide and came away really impressed. Props to the author(s) for keeping the vendor pitches to a bare minimum.
Remember when we used to manage and name (gasp) our servers? This post feels like a bit of a throwback to the days of yore, walking the reader through some basic Linux administration and troubleshooting tasks before a very detailed look at setting up Prometheus and the various exporters you’ll need for any modern Linux system.
This might be the most approachable way of learning OpenTelemetry that I’ve seen to date. I like that they use the OpenTelemetry Community Demo Application as the demo service for this example.
Leon might be the only person looking forward to Monitorama PDX 2022 more than me (although I’d wager I’m still more anxious). Still, I appreciate that so many folks are looking forward to it and that he thought to write this sneak peek at what we can expect from the event.
Now more than ever, we need monitoring and observability built for the cloud native world. The new O’Reilly Media report addresses practical challenges and solutions for modern architecture, highlighting the roles of observability and metrics, how to harness growing metric data, and the nuts and bolts of great metrics functions. Download your copy today! (SPONSORED)
Last week was a big one for Grafana Labs, who announced their Grafana 9.0 release at GrafanaCONline (that’s a lot of Grafana). I’m cautiously optimistic about what they’re trying to do with the new visual query builder for Prometheus (having a bit of experience working with time-series UIs myself). Still, I’m anxious to see how they evolve this functionality going forward.
A look at how Honeycomb tracks their own internal on-call operational health, and the steps they’re taking to improve it.
Another look at the classic build-versus-buy decision for observability and monitoring resources.
Setup Prometheus, Kube State metrics and Integrate Grafana with Kubernetes
A detailed two-part series on using Prometheus with kube-state-metrics (KSM) to monitor your Kubernetes cluster.
Another announcement from Grafana last week: they’ve released an open source, self-hosted version of their on-call project, which originally launched on Grafana Cloud earlier this year.
“Developer-friendly incident response with Slack integration.”
“kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.”
I probably don’t need to remind you, but Monitorama is coming up in just over a week. I’m going to be there with the entire family and I can’t wait to see a bunch of friendly (masked) faces. We have a fantastic lineup of speakers and some fun activities planned. There are still a few dozen tickets remaining if you’re in the area and would like to join us.
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor