Production scaling and debugging stories, alerting on error budgets, distributed tracing, and a lot more this week. Enjoy!
What should we expect for the observability space in 2022? Chronosphere’s Co-founder and CEO, Martin Mao, breaks down his top 3 predictions for what’s to come in observability at the Predict 22 Virtual Summit. Catch the recap here!
Articles & News on monitoring.love
It’s been amazing to see the community grow throughout 2021 and into 2022. We’d love to have you join us and share what you’ve been working on.
From The Community
Always interesting to hear how companies at Netflix’s scale think about regressions, anomalies, and which statistics are used to drive their validations.
An overview of the most common trends in observability right now. Jibes with everything I’ve seen in this newsletter over the past year.
Not strictly a monitoring or observability story, but the author provides context around what makes Kubernetes such a complex system to operate, which should inform how we think about observing it.
Error Budget Is All You Need
An excellent two-part series on SLOs, error budgeting, burn rates, and how to alert on them correctly.
A very approachable example for instrumenting your Node.js app for tracing.
Having worked in the monitoring space for so long, thinking about resilience and reliability is something I take for granted. Knowledge transfer and shared context about our systems are key to the resilience of our services, and that resilience should be a crucial trait of our monitoring systems too.
A look into how Uber reclaimed a significant amount of system resources by tuning their Go garbage collection.
Join the Elastic Community Conference. Sign up and win a t-shirt!
ElasticCC is a free technical conference for the community, happening February 11–12, with stories and learnings from ELK to Elastic observability and security. Tracks in English, Portuguese, French, Korean, Mandarin and Japanese. Sign up now! (SPONSORED)
Great to see more practical examples of Grafana Loki for debugging actual production problems. Hoping to read more stories like this from the greater open source community.
A primer on the problems facing microservices, and why tracing can help diagnose transactions through a complex stack.
An illustrated matrix of the various use cases, limitations, and broader implications of distributed tracing across our industry.
“Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!”
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture, or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor