It’s time for another “best of” issue! We have some fantastic articles here covering the most popular topics and themes from the past few months. Enjoy!
This issue is sponsored by: Chronosphere
You might have heard discussions about the “three phases of observability.” But what do they really mean? Chronosphere is a SaaS cloud monitoring tool that helps teams rapidly navigate the three phases of observability. Learn more about Chronosphere and the three phases of observability here.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
A look at how HelloFresh implemented a Dead Man’s Switch on top of their Prometheus and Thanos stack.
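A Dead Man's Switch inverts the usual alerting logic. An alert that is always firing (for example, a Prometheus rule whose expression is simply `vector(1)`) flows through the whole pipeline, and an external watcher pages someone when that heartbeat stops arriving. Here's a minimal sketch of just the watcher side, assuming heartbeats arrive as simple function calls. It's a generic illustration of the pattern, not HelloFresh's implementation, and the `DeadMansSwitch` class and its `timeout` parameter are hypothetical.

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Hypothetical watcher: trips when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self) -> None:
        # Call this each time the always-firing alert is delivered.
        self.last_heartbeat = self.clock()

    def is_tripped(self) -> bool:
        # No heartbeat yet, or the last one is too old: assume the pipeline is broken.
        if self.last_heartbeat is None:
            return True
        return self.clock() - self.last_heartbeat > self.timeout


if __name__ == "__main__":
    now = [0.0]
    switch = DeadMansSwitch(timeout=300, clock=lambda: now[0])
    switch.heartbeat()
    now[0] = 120.0
    print(switch.is_tripped())  # False: heartbeat is 120s old, under the 300s timeout
    now[0] = 400.0
    print(switch.is_tripped())  # True: heartbeat is 400s old, so page someone
```

Injecting the clock keeps the timeout logic testable; a real watcher would presumably call `heartbeat()` from whatever endpoint receives the notification and poll `is_tripped()` on a timer.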
Despite the title, this is a fairly deep dive into eBPF internals, writing your own eBPF programs, its potential for observability, and much, much more.
Most teams I’ve worked with will slap a bunch of metrics and graphs together without really understanding how to use the data effectively. This is a thoughtful look at how to design a dashboard with your users in mind.
We see a lot of articles about OpenTelemetry, but this might be the most concise and helpful one I’ve read yet. Bookmark this one and share it with your peers who need to learn about OpenTelemetry.
Did you know you could ingest data from remote JSON APIs into Prometheus? I can think of a number of different use cases for this. Nice example.
An overview of the most common trends in observability right now. Jibes with everything I’ve seen in this newsletter over the past year.
An insightful look at the bare minimum of metrics that service owners at Salesforce are expected to collect and monitor.
I love this article from Brendan Gregg on why we do (or don’t) choose certain products. Frankly, it has the makings of a great checklist for any new potential vendor.
Scaling systems is the kind of challenge that most of us live for, but it takes experience to learn the pitfalls and patterns that save us time and money the next time around. It should be no surprise that so many of these considerations overlap with the observability domain.
A look at some of the differences between SRE and DevOps principles, with a particular emphasis on service levels and monitoring signals.
Reliability means something different to every company, but it’s critical that everyone within a company shares the same understanding of it. This manifesto from Delivery Hero is a fantastic example of how to drive consensus and set expectations among your engineering teams.
An excellent article from Salesforce engineering, covering their most popular design choices for building observable services.
Another fantastic article from Netflix engineers about building (and observing) systems at scale.
A really clever way of buffering up debug logs in AWS Lambda to avoid blowing up your CloudWatch budget.
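One way to get this effect, sketched here in Python with only the standard `logging` module (and not necessarily the technique the linked article uses), is a handler that holds low-severity records in a bounded in-memory buffer and writes them out only when an ERROR arrives. `BufferedDebugHandler`, its `capacity` and `flush_level` parameters, and the sample `handler` function are hypothetical names for illustration.

```python
import logging
import sys
from collections import deque


class BufferedDebugHandler(logging.Handler):
    """Hypothetical handler: buffers records below `flush_level`, replays them when one at or above it arrives."""

    def __init__(self, target: logging.Handler, capacity: int = 1000, flush_level: int = logging.ERROR):
        super().__init__(level=logging.DEBUG)
        self.target = target
        self.flush_level = flush_level
        # Bounded buffer: once full, the oldest records are silently dropped.
        self.buffer = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno >= self.flush_level:
            # Replay the buffered context first, then the triggering record.
            for buffered in self.buffer:
                self.target.handle(buffered)
            self.buffer.clear()
            self.target.handle(record)
        else:
            self.buffer.append(record)

    def discard(self) -> None:
        # Drop everything buffered during a successful invocation.
        self.buffer.clear()


# Module-level setup runs once per Lambda execution environment.
logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep records away from the runtime's root handler
buffered_handler = BufferedDebugHandler(target=logging.StreamHandler(sys.stdout))
logger.addHandler(buffered_handler)


def handler(event, context):
    try:
        logger.debug("received event: %s", event)
        items = event.get("items", [])
        logger.debug("processing %d items", len(items))
        return {"ok": True, "count": len(items)}
    except Exception:
        # logger.exception logs at ERROR level, which flushes the buffered debug lines.
        logger.exception("invocation failed")
        raise
    finally:
        # Anything still buffered belongs to a successful run; throw it away.
        buffered_handler.discard()
```

Setting `propagate = False` matters here: the Lambda Python runtime attaches its own handler to the root logger, so without it every debug record would still reach CloudWatch through the root. Clearing the buffer in `finally` also keeps one invocation's debug lines from leaking into the next, since the module-level handler survives across warm invocations.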
Considerations for indexing your Elastic Stack logging services. There’s some good stuff in here, but it also reminds me why I happily paid the “Splunk tax” at my last gig.
PayPal engineers share their techniques for benchmarking Kafka and testing different failure scenarios before their services went to production.
A fairly exhaustive look at Grafana’s security features. Just note that most of its advanced capabilities are locked away in its commercial offerings.
This article introduces a new (to me) tool that looks super helpful for creating an inventory of all the software versions running in a container. I know that you can sort of do this with Prometheus already, but a standalone tool for audits makes a lot of sense too.
I consider myself fortunate to live in a rural area with fiber internet. If you’re one of the lucky folks with access to Starlink, here’s a quick tutorial for monitoring your connection with Prometheus.
If you’ve been around here for a while, you know I’m highly opinionated about writing alerts that are useful and empathetic towards the engineers who answer them. I love hearing from others who are just as passionate and thoughtful about writing and iterating on alerts.
I love these little weekend projects with dashboards and home automation (or in this case, home network monitoring).
So often we get hung up on the tooling and its limitations without really thinking about the problems we’re trying to solve. I love this collection of design patterns for building observability into our (micro)services.
How to think about observability if your organization is stuck in an APM (or monitoring-only) mindset.
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor