A very eclectic mix of articles this week, including some topical discussions on holiday surge preparation and on-call participation. Oh, and speaking of surges, we have another big wave of relevant job postings below. Enjoy!
This issue is sponsored by:
Work. Without the hard work.
LogicMonitor empowers teams to spend less time troubleshooting and more time innovating with fully automated infrastructure monitoring and log analysis. AI-powered intelligence automatically detects monitoring resources, surfaces anomalies, and provides root cause analysis across your entire stack. Leave the manual configuration, expensive hardware, and long hours of troubleshooting behind with a free trial of LogicMonitor.
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
Some last-minute tips for SREs leading up to the chaos of a busy holiday season. Frankly, these are achievable goals that should already be in your planning, but it’s better late than never.
Being on-call is rarely fun, and many companies struggle to do it in a sustainable manner. But when is enough, enough? A tough-but-honest conversation about doing on-call better, or not at all.
The OpenTelemetry Operator now supports auto-instrumentation in Kubernetes “when an Instrumentation CR is present in the cluster and a namespace or workload is annotated”. This sounds fantastic, and I’m eager to see support added for more languages soon (Java, Node.js, and Python are currently supported).
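For anyone curious what that quoted condition looks like in practice, here is a rough sketch in Python that builds the two pieces as plain dictionaries: an Instrumentation custom resource and the annotation that opts a namespace or workload in. The `opentelemetry.io/v1alpha1` API version and the `instrumentation.opentelemetry.io/inject-<language>` annotation key are as I recall them from the operator's README, so treat them as assumptions and check them against the release you run. The resource name, namespace, and collector endpoint are purely hypothetical.

```python
import json

# Languages listed as supported at the time of writing.
SUPPORTED_LANGUAGES = {"java", "nodejs", "python"}


def instrumentation_cr(name, namespace, endpoint):
    """Build an Instrumentation custom resource as a plain dict.

    The apiVersion and spec layout are assumptions based on the
    operator's README; verify them for your operator version.
    """
    return {
        "apiVersion": "opentelemetry.io/v1alpha1",
        "kind": "Instrumentation",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"exporter": {"endpoint": endpoint}},
    }


def inject_annotation(language):
    """Return the opt-in annotation for one supported language."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return {f"instrumentation.opentelemetry.io/inject-{language}": "true"}


def annotation_patch(language):
    """Build a metadata patch that adds the opt-in annotation."""
    return {"metadata": {"annotations": inject_annotation(language)}}


if __name__ == "__main__":
    # Hypothetical names and endpoint, for illustration only.
    cr = instrumentation_cr("demo-instrumentation", "demo", "http://otel-collector:4317")
    print(json.dumps(cr, indent=2))
    print(json.dumps(annotation_patch("python"), indent=2))
```

The printed JSON could be fed to whatever tooling you already use to apply manifests. For a workload rather than a namespace, the same annotation would go on the pod template's metadata.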
This isn’t your typical monitoring article (it isn’t even really written as such), but it reminds me that there’s still a functional gap in how we monitor deployments with traditional Release Engineering tools. Admittedly, cardinality remains a problem for these types of short-lived jobs, but surely there’s a better way?
An objective comparison of container monitoring with Zabbix, ELK, and Prometheus.
An unexpectedly thorough write-up of one company’s transition to Amazon-managed Prometheus and Grafana. The author does a great job explaining why they chose to follow this path, as well as the drawbacks you’ll encounter.
A fun project for connecting a standard (non-IoT) grill to sensors and open source observability tools. This sort of thing is exactly why I’ve been nagging Camp Chef for access to the API for my networked pellet grill.
Improve your Core Web Vitals with this Definitive Guide
Pave the way to Core Web Vitals nirvana using this in-depth guide for developers. Learn actionable tips, best-practice advice, and a proven workflow to boost your scores, bolster your Google search ranking and enhance your end-user experience. Check out the Developer's Guide to Core Web Vitals today. (SPONSORED)
I don’t know how many of you are running NVIDIA GPU clusters, but if you are, this article may prove useful for tracking utilization metrics.
How to use Monika’s operators and helpers for more complex alerting possibilities.
This article covers a lot of familiar concepts, but it’s still a good read for folks who might otherwise be new to observability or monitoring.
A quick example for monitoring RabbitMQ in Kubernetes without relying on additional external plugins or libraries.
“The OpenTelemetry Operator is an implementation of a Kubernetes Operator.”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor