Thanks for joining us for another issue of Monitoring Weekly!
Monitoring News, Articles, and Blog posts
This was a huge week for fans of Graphite, as it finally crossed the 1.0.0 milestone after years of forked development and months of release planning and performance fixes. As a long-time user and maintainer of the project, I’m thrilled to see the core team and community rally together to get this one out the door.
For anyone with a background in statistics, “correlation” means something very specific. For those of us who might have “skipped” the stats class (er, “slept through” might be more accurate…), correlation often means simply aligning one or more time series together. This article gives a few examples of why that understanding of correlation can bite you and how to think about it differently.
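To make that concrete, here is a minimal sketch of one common pitfall (not necessarily one the linked article covers): two series that have nothing to do with each other can look strongly correlated just because both trend upward. The series, slopes, and noise levels below are made up for illustration, and `pearson` is a hand-rolled helper rather than anything from a monitoring tool.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def diffs(xs):
    """Point-to-point changes, which removes a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Two hypothetical, unrelated metrics that both happen to grow over time.
series_a = [0.5 * i + rng.gauss(0, 1) for i in range(200)]
series_b = [0.3 * i + rng.gauss(0, 1) for i in range(200)]

corr_levels = pearson(series_a, series_b)
corr_changes = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of raw values:     {corr_levels:.3f}")
print(f"correlation of point changes:  {corr_changes:.3f}")
```

The raw values correlate almost perfectly because the shared upward trend dominates, while the point-to-point changes show little relationship. Lining up two graphs by eye measures the first thing, not the second.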
Nope, it’s not an April Fools’ Day joke: Paessler really has released an Alexa Skill to interact with PRTG. I think it’s high time for PagerDuty <> Alexa integration now…
The Honeycomb.io blog is chock full of monitoring gold and this post about logging is no exception. I think a great alternative title to this article could be “Logging Antipatterns” or perhaps “11 Ways You’ve Screwed Up Logging.”
VMware announced their intent to acquire Wavefront, a metrics and visualization SaaS offering. At face value, this acquisition reminds me of Rackspace’s purchase of Cloudkick back in 2010, an effort to bring sweeping telemetry improvements to their hosting service. It certainly sounds like VMware has similar plans for integrating Wavefront with their own Cloud services.
Stripe has launched a new thing, Increment, a long-form digital magazine about how teams build and operate software at scale. The first issue covers something near and dear to our hearts at Monitoring Weekly: on-call. There are six incredible articles in the first issue. Seriously, they’re all great.
People often underestimate the amount of work that goes into writing time series databases. This article by a Prometheus developer dives deep into the design and shortcomings of Prometheus’s current v2 TSDB and explains how v3’s TSDB was redesigned to address these issues while scaling for future demand.
Kibana now has heatmaps and horizontal bar charts (and a few other visualization goodies). <3
There’s a nasty data loss bug in v5.3.0, but the workaround is pretty straightforward. Make sure to check if you’re vulnerable and implement the fix ASAP.
The Synthesize project released its own update to support the newest Graphite release. Synthesize is an installer (and uninstaller) for Graphite, making it easy for new users to get started with time-series on their existing systems or with Vagrant.
Icinga released a new Logstash output plugin, making it possible to fire off Logstash actions to Icinga 2. Supported actions for this release include check results, custom notifications, and managing comments and downtime.
Monitoring absence-of-data things (such as a backup job that didn’t run) has always been a huge pain. This tool allows you to easily monitor those sorts of things using the “dead man’s switch” approach.
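For anyone who hasn’t run into the pattern, here is a rough sketch of the dead man’s switch idea itself. It is not the linked tool’s API: the `DeadMansSwitch` class, its parameters, and the timings are all hypothetical. The job checks in each time it succeeds, and the monitor alerts when a check-in is overdue rather than waiting for a failure message that may never arrive.

```python
import time


class DeadMansSwitch:
    """Flags a job as overdue when it stops checking in (hypothetical helper)."""

    def __init__(self, period_s, grace_s=0.0, now=None):
        self.period_s = period_s  # how often the job is expected to check in
        self.grace_s = grace_s    # extra slack before we call it overdue
        # Start the clock at creation so a job that never runs is also caught.
        self.last_ping = time.time() if now is None else now

    def ping(self, now=None):
        """Called by the job each time it completes successfully."""
        self.last_ping = time.time() if now is None else now

    def is_overdue(self, now=None):
        """True when no check-in has arrived within period + grace."""
        now = time.time() if now is None else now
        return now - self.last_ping > self.period_s + self.grace_s


# A nightly backup with an hour of slack; timestamps are plain seconds here.
DAY = 24 * 60 * 60
switch = DeadMansSwitch(period_s=DAY, grace_s=3600, now=0)

switch.ping(now=100)                  # the backup ran and checked in
print(switch.is_overdue(now=DAY))     # False: still inside the window
print(switch.is_overdue(now=2 * DAY)) # True: a run was missed, time to alert
```

The `now` arguments exist only so the example is repeatable. In real use the job would call `ping()` with no arguments and a scheduler would poll `is_overdue()`.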
Many of us (myself included) just assume that everyone in Ops should have full root access everywhere, but as the author points out, this isn’t strictly true and it’s probably a Bad Idea anyways. They’ve come up with a tool that programmatically gives administrator permissions to those on call and removes them after their rotation ends. Pretty nifty solution.
Thanks for joining us, folks! If you like what you’ve seen, invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them our way by emailing us (just reply to this email).
See you next week!