Latest on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
LightStep is one of the new breed of tools out there I’m excited about. Designed with modern, high-scale, high-traffic architectures in mind, LightStep makes it easy to spot, diagnose, and solve performance issues. Check it out here. (SPONSORED)
Probably the most common question I received when I told people I was writing a book about monitoring was, “Have you read James Turnbull’s book?” I’m putting that to rest with a delightful conversation with James Turnbull on a variety of topics, including which of his own books is his favorite, some not-so-subtle digs at Kubernetes, and why James thinks DevOps is dead.
A pretty in-depth, detailed guide on Elasticsearch, covering how it works, managing it, how to monitor it, and more.
tldr: Prometheus, Grafana, and some fun apropos jokes
Because everyone can generally do with a refresher course on the Golden Signals from time to time.
These are really great examples of good and bad visualizations with Grafana. Highly recommend you read this and take notes.
This is super fun and very well-designed. I would absolutely love to have more scenarios built out, especially ones that are trickier.
Someone remarked recently that monitoring SSL certificate expiration is probably the quintessential job of Nagios… and you know what? I didn’t even argue. So, happy to see doing this with Telegraf was so easy.
The folks at Raygun set out to learn why, interviewing the executive leadership at Xero, Pushpay, and Vend to find out what’s really going on and how they think about engineering effort and software quality. (SPONSORED)
I love good visualizations. I also love Python. This article gives me both. <3
This is a super neat tool for making Grafana annotations a lot easier to create.
I’ve linked to a few of these before, but there are some new ones I didn’t know about. They might give you some ideas for dashboard organization.
A tale of SLOs at SoundCloud. Very useful stuff in here.
I absolutely love this article for two big reasons: 1) I disagree with a few of the main points, and 2) the author is clearly way smarter than me. Highly recommended read.
A bit of a teaser: “Tracing might still remain something that, once deployed, doesn’t unlock enough value to be of any practical use in the most commonly used debugging scenarios.” 100% agreed – last time I suggested to a vendor that tracing wasn’t terribly valuable, I ended up in an hour-long debate. Glad I’m not the only one.
Following up on the conversation Cory and I had on Real World DevOps, here’s his article on the foundations of dashboard design for operations.
GitPrime’s new book draws together some of the most common software team dynamics, observed in working with hundreds of enterprise engineering organizations. Actionable insights to help you debug your development process with data. Get Your Copy. (SPONSORED)
From the article: “This post is about how “logs vs. metrics” is a false dichotomy, and how thinking in this binary prevents us from seeing simpler ways to monitor our systems.”
The title may come across as clickbait-y to some of you, but the state of most of the industry is exactly where the author is coming from. By virtue of being on this newsletter, you’ve self-selected into a group with a higher level of awareness and interest in monitoring, but while the state of monitoring has come a long way, we’ve still got a long way to go.
I won’t ruin the punchline for you, but you would be shocked how often this occurs even in multi-million dollar companies.
This article starts off with some musings about overly-complex software architecture, but it starts to get really good about halfway in when the stuff about Kafka shows up. Stick with it; I promise it’s worth the read.
The folks at Farfetch discuss their monitoring journey and current stack too. TL;DR: Grafana, Thanos, Prometheus, Alertmanager,
Following on the heels of HBO’s recent miniseries on Chernobyl, the author of this article relates the incident to software engineering. Also, lots of really interesting stuff about the incident that I didn’t know.
A new book on Prometheus is out and available for purchase.
Join DevOps expert and CEO of Blue Matador, Matthew Barlocker, for a CloudWatch Guided Tour Webinar on either July 25th or July 31st. You’ll learn about CloudWatch concepts, alarms, metrics, best practices, and more. Save Your Spot. (SPONSORED)
The folks at THRON discuss their monitoring and current stack. There’s some good stuff in here about instrumentation frameworks (RED, USE, Golden Signals).
Because everyone loves dashboards on big ass TVs.
Everyone loves a good framework and this is a super good one focused on alert design.
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor