Before we get into the articles, I want to take a moment to thank everyone for their support. It’s been a lot of fun bringing you this newsletter each week, and it sounds like you’re enjoying it as much as I am. Thank you and enjoy an inbox chock full of great articles from this past week!
This issue is sponsored by:
Start incident response with context to all your alerts in one view
Moogsoft speeds up incident response with dynamic anomaly detection, suppressed alert noise, and correlated insights across all your telemetry data. Go from debugging across multiple tools, screens, and dashboards into a single incident view so you and your teams can take a more proactive approach to reduce MTTR. Sign up for the Moogsoft Free community plan today!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
By now you’ve heard all about the massive Facebook (and related properties) outage back on October 4. This is a follow-up post from Facebook providing more details on the cascading failure that led to a global shortage of cat memes.
In response to the Facebook postmortem, I joined Mandi Walls, Pete Cheslock, and Joshua Timberman on Twitch to talk about the event, how fragile the Internet truly is, and why power tools are still a vital part of our disaster recovery plans. Hot takes galore.
Looks like another VC-backed Observability startup wading into the mix with Parca, an open source “continuous profiling” system. Reminds me a bit of Riemann with a bunch of Prometheus-isms.
Stories like this one remind me why I love the Netflix tech blog. Tons of great thinking around deployments and monitoring for regressions in A/B testing and canaries.
Looking to autoscale your Kubernetes pods based on Prometheus triggers? Here you go.
Great to see enhancements to the date picker and performance improvements to the image render. Lots of good stuff in this release. Oh and if you missed it, make sure to update your Grafana deployments for this recent CVE.
I love stories about refactoring and the challenges of retooling a system in motion. Although this post has nothing to do with monitoring in the traditional sense, it should strike a chord with anyone who’s had to upgrade storage for metrics or logging systems.
I shouldn’t need to say it, but security is everyone’s job. If you’re running Kubernetes you should check out this collection of security scanning tools. Props to the author for including sample output from each.
Chronosphere is the only observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Teams at enterprises, large cloud-native, and mid-market companies around the world trust Chronosphere to help them operate scalable, highly available, and resilient applications. Learn more here. (SPONSORED)
I’m not surprised to hear folks are experiencing more burnout as a result of COVID-19, but there are some interesting datapoints regarding MTTR and MTTA trends over the past few years.
A friendly introduction to the most common open source observability tools. Share with your non-observability-SME friends.
If you’re a Prometheus user but considering a move to VictoriaMetrics, this writeup covers some of the important differences and incompatibilities between PromQL and MetricsQL.
There are probably easier ways of tracking your Kubernetes spend, but I’m sure they’re not free. At the very least, you can try this out and use it to justify a commercial alternative.
Yes, their conclusion is for you to buy their observability service, but they still make some valid points about data ingestion and storage along the way.
“Continuous profiling for analysis of CPU, memory usage over time, and down to the line number. Saving infrastructure cost, improving performance, and increasing reliability.”
Ready to lower your AWS bill? Now might be the perfect time for an AWS Cost Optimization project with The Duckbill Group. The Duckbill Group aims for a 15-20% cost reduction in identified savings opportunities through tweaks to your architecture–or your money back. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor