This has genuinely been a fantastic week for monitoring and observability content. I hope you enjoy these articles as much as I have. ☕📖📈
You might have heard discussions about the “three phases of observability.” But what do they really mean? Chronosphere is a SaaS cloud monitoring tool that helps teams rapidly navigate the three phases of observability. Learn more about Chronosphere and the three phases of observability here.
Articles & News on monitoring.love
It’s been amazing to see the community grow throughout 2021 and into 2022. We’d love to have you join us and share what you’ve been working on.
From The Community
If you’ve been around here for a while, you know I’m highly opinionated about writing alerts that are useful and empathetic towards the engineers who answer them. I love hearing from others who are just as passionate and thoughtful about writing and iterating on alerts.
Roblox has released a very thorough and insightful postmortem of their 73-hour outage dating back to October 2021. It would be easy to throw stones (e.g. the circular dependency between telemetry and Consul) but IMHO they deserve props for publishing this excellent analysis of their recovery efforts and the pairing with HashiCorp engineers.
A great example of how monitoring and observability tools provide the visibility for application engineers to mock tests and design load test scenarios that accurately represent production use.
A quick look at how Harry’s has defined their incident response. You know, in case their production services have any close shaves with an outage. 😜
A dive into Uber’s in-house engine for capacity planning and predictive, automated scaling. Very interesting read.
Metrics are fine, but it can be helpful to think of them in terms of the story they’re trying to tell you about the overall health of your services or application stack.
Join the Elastic Community Conference. Sign up and win a t-shirt!
ElasticCC is a free technical conference for the community, happening February 11–12, with stories and learnings from ELK to Elastic observability and security. Tracks in English, Portuguese, French, Korean, Mandarin and Japanese. Sign up now! (SPONSORED)
With the rise of cloud computing, network monitoring isn’t nearly as ubiquitous as it once was. Still, it never hurts to brush up on something as eminently useful (and confusing) as SNMP. Could be a fun weekend project to start monitoring your home router with Prometheus and Grafana.
An introduction and demo of the Pixie project, a modular open source project that supports monitoring Kubernetes clusters out of the box.
This isn’t strictly monitoring related, but I love reading scaling stories where it’s clear they couldn’t have told their tale without the use of our tooling. 😁
There are a lot of alternatives missing here, but it’s a good starting point if you’re curious about the ecosystem of serverless monitoring services.
In case you’re one of the three people actually using Google Chat and want to send your Google Cloud Monitoring alerts there. Kidding aside, this looks like a useful example for hooking up your own custom alerts pipeline.
“Pixie is an open source observability tool for Kubernetes applications. Use Pixie to view the high-level state of your cluster (service maps, cluster resources, application traffic) and also drill-down into more detailed views (pod state, flame graphs, individual full-body application requests).”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor