Whew, what a week. Just when you thought things were slowing down for the holidays, numerous security exploits hit close to home. No worries, we’ve got some upbeat news too… SysAdvent is back, a new Python-esque debugger from Facebook, and a handful of production troubleshooting stories. Enjoy!
This issue is sponsored by:
Work. Without the hard work.
LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Observability tools and data are good to have, but how do you apply them effectively? This is a thorough collection of tips and considerations for troubleshooting systems.
Monitoring your Spring microservices with Prometheus and Grafana.
SLOs are one of those things that seem easy, but so few companies seem to use them effectively or consistently. Here are some examples of why we’re doing it wrong.
How Facebook troubleshoots Linux kernels and userspace applications using the drgn debugger. Nice to see that the tool was written with scripting capabilities (specifically Python) in mind.
I’m thrilled to see SysAdvent return this month. This post makes the case for reliability being a first-class product feature. Might be a good resource to share with your company’s Product leadership.
How Snapp evolved from using monitoring tools to adopting an observability mindset and processes.
I love hearing how engineering teams think about scale and latency during the holiday season. Even if you’re not using Druid, there are some great takeaways for capacity planning in general.
The complete guide to error monitoring and crash reporting
Software bugs are frustrating for everyone. End users lose patience and leave, developers struggle to reproduce errors, and businesses lose customers without even knowing why. Learn why modern development teams need error monitoring more than ever. Read the guide. (SPONSORED)
How Airbnb engineers squeeze every last second of performance out of their web load times.
Grafana security fixes
This was a busy week for the Grafana team and community, with multiple security vulnerabilities and fixes released. I’ve compiled all of the relevant posts for you here. Please update your affected systems immediately, if you haven’t already.
- Grafana 8.3.2 and 7.5.12 released with moderate severity security fix
- An update on 0day CVE-2021-43798: Grafana directory traversal
- Grafana Agent 0.20.1 and 0.21.2 released with security fixes
- Grafana 8.3.1, 8.2.7, 8.1.8, and 8.0.7 released with high severity security fix
Speaking of security vulnerabilities, this one was a doozy. In case you missed the gloom and doom (and memes), a critical vulnerability affecting the Apache Log4j library was discovered. There are remote code execution exploits already in the wild. 😱
Decorate the Python function
Here’s a great resource for adding observability to your Python Lambdas. I’ve listed the three most recent posts, but the entire series is worth your time.
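If you want a feel for the decorator approach before diving into the series, here is a minimal sketch. It is not taken from the linked posts: the `observed` decorator, the logger name, and the sample `handler` are all hypothetical names I made up for illustration, and it only uses the Python standard library. It wraps a Lambda-style `handler(event, context)` function and logs one JSON record per invocation with the outcome and duration.

```python
import functools
import json
import logging
import time

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("lambda_observability")


def observed(handler):
    """Wrap a Lambda-style handler(event, context) and log one JSON record per call."""

    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.perf_counter()
        outcome = "success"
        try:
            return handler(event, context)
        except Exception:
            outcome = "error"
            raise  # re-raise so the caller (or the Lambda runtime) still sees the failure
        finally:
            # Runs on both the success and the error path.
            record = {
                "function": handler.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
            }
            logger.info(json.dumps(record))

    return wrapper


@observed
def handler(event, context):
    # Hypothetical handler used only for illustration.
    return {"statusCode": 200, "body": event.get("name", "world")}
```

Because the logging happens in a `finally` block, failed invocations are recorded too, and the exception still propagates unchanged. A real setup would likely ship these records to whatever log or metrics backend you already use.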
“drgn (pronounced ‘dragon’) is a debugger with an emphasis on programmability. drgn exposes the types and variables in a program for easy, expressive scripting in Python.”
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture–or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor