Hey folks, welcome to another installment of Monitoring Weekly! Did you write something about monitoring recently? Maybe you've got an idea rolling around in your head? Send it on over and let the community learn from you. 😀
Monitoring News, Articles, and Blog Posts
The Hidden Costs of On-Call: False Alarms
I’m not sure how I missed this, but here’s a great talk from the recent LISA17 conference about mitigating false alarms and their true cost to your staff and company.
Part 2 of the InfluxDB Internals 101 series is out, this time covering how queries are handled under the hood.
I really like reports like these because they are a fantastic reminder that we don’t do monitoring for the sake of monitoring, but because it helps grow the businesses we work for. A slow website makes less money, and nowhere is that more apparent than on Black Friday/Cyber Monday for online retailers.
This post is more of a Honeycomb customer success story, but I’m including it because it illustrates, really well, a problem that is important and, thanks to Honeycomb, is being talked about a lot more: troubleshooting with high-cardinality data. I’m looking forward to more tools working on solving this problem better.
Something we haven’t seen much of around here is Azure and Windows-based infrastructure. The author takes us through an overview of Azure’s monitoring toolset and how it all fits together.
A Sensu community member wrote about a couple of options for collecting metrics via Sensu and sending them to a TSDB. I like this for its straightforward, quickstart-style explanation.
Full disclosure: My company, Aster Labs, is a Sensu Partner. I received no consideration, financial or otherwise, for including this post.
A much-needed effort at defining terminology and breaking down the overloaded “monitoring” term. The author goes into more detail, but I’ll include the TLDR here because it’s so good:
“Monitoring is the process of observing systems and testing whether they function correctly. Analytics is the process of turning data (usually behavioral data) into insights. Observability is the property of a system that supports analytics. Diagnostics is the process of determining what’s wrong with a system, and also relies on observability. Root cause analysis is corporate mumbo jumbo.” – Baron Schwartz
Most “monitoring the coffee machine” or “monitoring the keg” posts are pretty boring, but in this case, the folks at Hosted Graphite have opened up the coffee machine and busted out the soldering iron. Props for dedication.
James Turnbull is writing a new book, this time on Prometheus. If you’ve ever had the pleasure of reading one of his many books, you know how great it’s gonna be. Pre-orders are open now; I just bought mine.
See you next week!
— Mike (@mike_julian) Monitoring Weekly editor