A fun week of troubleshooting stories and some guides for automating your monitoring and observability tooling. Oh, and don’t forget that SLOconf’s virtual event happens this week. Enjoy!
This issue is sponsored by:
incident.io has joined your #general Slack channel.
👋 I'm here to sponsor this issue and automate your entire incident management process in Slack. You just focus on fixing the issue, I'll keep your team and status page updated, nudge you to take the important actions, escalate to the right person when needed, auto-generate your post-mortem and make sure follow-up actions are taken care of.
Install incident.io to your Slack, type /incident and I'll take care of the rest.
incident.io has left the chat.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
One of the better articles I’ve read on Distributed Tracing, with some helpful analogies and context to help newcomers develop a foundational understanding of this key observability principle.
Regular readers will know I love a good network troubleshooting story. This one might feel familiar if you’ve ever debugged MTU… with some new twists courtesy of AWS Transit Gateways.
How to get started with Datadog monitoring for most Python-based web API services (e.g. Gunicorn, Flask).
How Braintree engineers diagnosed, iterated, and solved a thundering herd issue affecting their processor service.
A useful pattern for automating the creation of uptime checks (i.e. DIY Pingdom) in Google Cloud.
It’s unusual for us to call out one individual’s job change in this newsletter, but Brendan Gregg has had such a profound impact on our industry (with a particular emphasis on performance monitoring and debugging) that this is a notable event. I’m very excited to hear that he’s joining Intel and will be continuing his work with eBPF and other open source projects.
I don’t agree with everything the author is presenting, but I laud them for at least considering other perspectives. Feels like bias might be involved on both sides, tbqh.
See how companies like DoorDash are “no longer flying blind” with increased visibility and reliability from Chronosphere’s end-to-end solution. Chronosphere is the only observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Learn more here. (SPONSORED)
The first part of an upcoming series, this post explains why Observability matters to non-DevOps engineering teams.
Sysdig has started a new series to highlight important changes in Prometheus releases. This is a nice addition for those of us in the community who might not otherwise have time to parse the release notes.
A GitOps-friendly pattern for automating Grafana dashboards.
A quick look at the latest Grafana Tempo release, with an emphasis on its ability to emit RED metrics by default for traces. Nice update all around.
“xpid gives a user the ability to “investigate” for process details on a Linux system.”
SLOconf is back again as a virtual event, taking place May 9-12. Looks like a lot of familiar faces; looking forward to this one.
Monitorama is returning to Portland, OR this summer. It looks like a return to form for one of our favorite events (ok, we might be biased). Hope to see you there!
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor