Thanks for joining us for another issue of Monitoring Weekly!
Monitoring News, Articles, and Blog posts
A wonderfully deep look at when you might want a metric versus a log, the role of unit tests versus monitoring, structured versus unstructured logging, whitebox versus blackbox metrics, and how all of this fits nicely under the umbrella of “observability.”
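To make the structured-versus-unstructured distinction concrete, here's a minimal sketch using Python's standard `logging` module. It isn't taken from the article; `JsonFormatter`, `make_logger`, and the `fields` key are hypothetical names chosen for illustration. The point is that the structured version emits the same facts as machine-parseable key/value pairs instead of burying them in free text.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via `extra={"fields": {...}}` (an assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, formatter: logging.Formatter) -> logging.Logger:
    """Build an isolated logger that writes to stderr with the given formatter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    # Unstructured: the interesting values are baked into a free-text string.
    plain = make_logger("plain", logging.Formatter("%(levelname)s %(message)s"))
    plain.info("request to /checkout took 231 ms (status 200)")

    # Structured: the same facts as separate, queryable fields.
    structured = make_logger("structured", JsonFormatter())
    structured.info(
        "request handled",
        extra={"fields": {"path": "/checkout", "duration_ms": 231, "status": 200}},
    )
```

With the structured form, a log pipeline can filter on `status` or aggregate `duration_ms` without regex parsing, which is much of what the "structured versus unstructured" debate comes down to.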
Stable, scalable microservices depend on good monitoring and reliability practices. Circuit breakers are a common tactic for improving microservice stability, and this article looks at how you might monitor their behavior using statsd and the gRPC framework.
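For readers who haven't seen the pattern, here's a minimal Python sketch of a circuit breaker that reports its behavior as statsd counters. It does not reproduce the article's code and leaves gRPC out entirely; the class names, metric names, failure threshold, and reset timeout are all assumptions for illustration. The only external convention it relies on is the plain statsd counter line format, `name:value|c`, sent over UDP.

```python
import socket
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when the breaker is open and rejects a call without trying it."""


class StatsdCounter:
    """Minimal statsd counter client: sends `name:value|c` lines over UDP."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = "") -> None:
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, name: str, value: int = 1) -> None:
        line = f"{self.prefix}{name}:{value}|c"
        self.sock.sendto(line.encode("ascii"), self.addr)


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures,
    allows one trial call (half-open) once `reset_after` seconds have passed."""

    def __init__(self, emit: Callable[[str], None], max_failures: int = 3,
                 reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.emit = emit            # e.g. StatsdCounter(...).incr
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def _transition(self, new_state: str) -> None:
        # State changes are counted too, so dashboards can show breaker flapping.
        self.state = new_state
        self.emit(f"breaker.state.{new_state}")

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_after:
                self._transition("half_open")
            else:
                self.emit("breaker.rejected")
                raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.emit("breaker.failure")
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()
                self._transition("open")
            raise
        self.emit("breaker.success")
        self.failures = 0
        if self.state == "half_open":
            self._transition("closed")
        return result
```

In use, you'd wire the two together with something like `CircuitBreaker(emit=StatsdCounter(prefix="checkout.").incr)` and wrap each outbound call in `breaker.call(...)`. Keeping `emit` as a plain callable means the breaker doesn't care whether metrics go to statsd, a test list, or some other client.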
Distributed tracing has become a hot topic this past year, but sadly, there aren’t many people talking about how they’re actually using it. This article gives a short demo of how to use Zipkin, an open-source distributed tracing tool, to instrument a Python app.
Another distributed tracing demo, this time using OpenTracing and a much more extensive application. This is a big article with a lot of really great stuff, so grab your coffee and hang on.
I think this serves as a periodic reminder that what we build isn’t just for boring, behind-the-scenes commercial use cases; it often has world-altering and life-changing uses too. I’d really love to see more of the technical implementation details behind this one.
Think you know what %CPU in `top` means? Prepare for a new perspective. It seems we may all have been tuning for the wrong constraint all along.
PagerDuty has been on a roll these past few months with their emphasis on promoting and improving incident management practices for the community, and this latest improvement is (to me, at least) a long time coming: a first-party JIRA-PagerDuty integration. Gone are the days of hacky scripts to bridge the two.
Speaking of PagerDuty, here are yet more interesting and useful features from them for improving your incident management process.
Ah, the age-old “push vs pull” monitoring argument. The folks at Influx recognize that there’s no clear-cut answer, and in response, they’ve added some useful pull-based monitoring capabilities to Kapacitor by integrating code from the Prometheus project.
Performance Monitoring Counters (PMCs) are now available on AWS EC2 instances (dedicated instances only). Truth be told, I had no idea what these were, since I haven’t done much performance analysis and tuning at such a low level. Even if you haven’t either, this is still a really interesting read.
Are Algorithms Better Than Humans?
A caution against over-using algorithms in data-driven decision-making. It’s a little bit meta for our purposes, but given that a lot of monitoring tools are headed toward automated anomaly detection using machine learning, I think the argument made here is well taken.
For something more on the fun side, this tool allows you to monitor...yourself! Best of all, it’s open-source and can be self-hosted.
Elastic Stack 5.4.0 Released
A whole bunch of new features and bugfixes from the folks at Elastic.
Elastic Stack 6.0.0-alpha1 Released
If you like staying on the cutting edge, Elastic just dropped the 6.0 alpha release. One of the more interesting bits is the new ability to upgrade from one major release to the next (5.x -> 6.x) without taking the entire Elasticsearch cluster down. That’s slick.
consul2dogstats, with Dimensional Tagging
A new tool to collect service health data from Consul and publish to Datadog.
Events & Meetups
(Do you have a monitoring-related meetup/event you want to announce here? Just email me!)
If you’re in San Francisco, be sure to drop by this month’s SF Metrics Meetup!
Thanks for subscribing to Monitoring Weekly, folks! If you like what you’ve seen, invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them our way by replying to this email.
See you next week!
- Mike (@mike_julian)
Monitoring Weekly editor