I hope you’re all doing well and hanging in there as we enter arguably the busiest time of the year for many (and quietest for everyone else). Lots of great content this week around Prometheus, alerting, and performance engineering. Oh, and an exciting new PubSub project from Pinterest. Enjoy! ❄⛄❄
This issue is sponsored by:
Work. Without the hard work.
LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
How do you monitor endpoints on Kubernetes? This story demonstrates it with blackbox-exporter, while explaining the differences between using probes versus the Prometheus operator.
I don’t think any single article can address the variety of cultural and systemic organizational issues that can lead to Really Bad Alerting Practices™. Nevertheless, this one does a solid job covering many of the aspects within our direct control and influence.
Considerations for choosing the right managed monitoring or observability solution (or choosing none at all).
A very interesting (and unique) story about how Salesforce replaced their bottlenecked CheckMK alerting infrastructure with AWS components.
Some excellent tips on how to leverage Prometheus labels more effectively in Grafana. Bonus points to the author for demonstrating the Prometheus labels API.
Pinterest has announced MemQ, their new “low cost, cloud native” PubSub alternative to Kafka. It hasn’t been open sourced yet, but it’s supposed to be. This could be a huge win for teams moving a lot of data, particularly observability systems. Excited for this one.
Another CVE and security fix for a high-severity access control vulnerability in Grafana. Versions between 8.0.0 and 8.2.3 are affected and should be upgraded to 8.2.4 at the earliest opportunity.
Instant visibility into the health of your software
Modern monitoring tools for user-centric teams. Raygun gives you actionable, real-time insights into the quality and performance of your web and mobile apps so you can detect, diagnose, and resolve issues quickly. Their simple usage-based plans start from as little as $4 per month with unlimited apps and users. Try Raygun free for 14 days. (SPONSORED)
Increment has released another issue of their excellent digital magazine. And while this issue isn’t specifically about monitoring or observability, tech debt is a thing that affects all engineers. I strongly encourage you to read and share this fantastic article.
Not to be outdone by the MySQL monitoring article in issue #143, this week brings us a look at the “top 10” PostgreSQL metrics you should be monitoring. Everyone loves a list.
A look at how Observability and related technology pillars join to form a giant robot for the sole purpose of protecting engineering performance at Business Insider. Seriously though, I’m glad they care about the user experience for folks who can access their paywalled news sites.
If you’re like me, Grafana Agent was probably one of those things that you used without realizing it. This story explains how Grafana Agent came into existence, and why the Prometheus folks have agreed to adopt this functionality natively in Prometheus.
“Prometheus exporter for PostgreSQL server metrics.”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor