Lots of great “in the trenches” stories from a variety of engineering teams out there. Speaking of teams, we’ve got a stack of job postings this week… who’s looking for a new gig?! 💻📈💰
This issue is sponsored by:
What are the 3 trends in cloud-native and observability you need to know?
Tune in for an on-demand discussion with Chronosphere and analyst group ESG as we talk about the market challenges with cloud-native and observability strategies. You’ll learn about the benefits and challenges of cloud-native adoption, the impact of observability on business outcomes, and much more. Register here!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
This article speaks to me on a very personal level. I’ve built up Observability teams over the years; it’s not surprising that we share many of the same problems, but it’s always interesting to hear how we tackle (or prioritize) them differently.
Always interesting to read how other engineering teams work through really frustrating incidents.
A look at how Lyft instruments their Android mobile app to track CPU usage and monitor for performance regressions.
How Doctolib audited, and continues to iterate on, their noisy alerting behaviors.
A quick introduction to log aggregation concepts along with a fairly objective comparison of numerous commercial and open source alternatives.
Everyone’s favorite open source dashboard is out with another new release. Love to see the new alert grouping features.
If you’re using Apache APISIX already, there are a number of Observability plugins at your disposal. This article brings together a wealth of resources for getting started with the usual observability pillars (metrics, logs, and traces), and how to integrate them within your existing toolset.
CrashLoopBackoff + Four Other K8s Troubleshooting Tips Everyone Should Know
We all love Kubernetes, but it can be a hassle to fix when things go sideways. In this webinar, we will cover some of the common problems that plague every Kubernetes user and show you how to fix them. Join us at 10am PT on Thursday, April 28 to add these tips to your troubleshooting toolbox. Save your seat here. (SPONSORED)
Great to see more companies talking about their incident management process publicly.
A helpful guide for friends or peers who might otherwise be new to monitoring on Google Cloud.
If you’ve been wanting to pull metrics out of the Google Cloud Monitoring API, this article has you covered. Props to the author for including a GitHub project with examples.
A look at Xendit’s pattern for monitoring outgoing webhook failures.
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor