Thanks for joining us for another issue of Monitoring Weekly!
Monitorama
Monitorama is one of the few conferences that livestreams the entire event for free, writing it directly to YouTube. While the video editors work their magic and break the streams into individual videos, the raw, unedited streams are available here. Fair warning: each one of these is 8+ hours long.
A major power outage in Portland’s downtown area took down the Monitorama venue, prompting incident response from the Monitorama organizers, complete with a status page and regular updates. Neat. (the incident response, not the outage--that part stunk)
Of course, no conference is complete without attendee recaps. I love the different perspectives and takeaways found in these recaps, so I’m linking all that I found--my apologies if I missed yours. These are submitted without further comment, as I think you’ll enjoy reading all of them.
Monitoring News, Articles, and Blog posts
The first in what looks like it will be a pretty awesome series on implementing open-source monitoring. This article is exactly as the title suggests: Prometheus and Grafana, running on Kubernetes. You won’t find a super deep-dive here, but you will find a configs-included starter approach.
I love stories about the monitoring journey teams go through and the lessons they learn about their apps, infrastructure, and themselves along the way. This one is from the folks at Swissquote and is largely Graphite-focused. Also, 1.1 million metrics per second is nothing to sneeze at (everyone thinking “Graphite doesn’t scale” should probably settle down now…)
It’s a little hard for me to summarize this one because there are just so many great points, but I’ll try: at the center of your infrastructure are humans, not servers. Build your infrastructure and apps with humans in mind or you’re gonna have a bad time (especially with on-call).
Detecting security threats in your infrastructure often comes down to knowing what to look for--signatures, as they’re called. The folks at Elastic walk us through setting up the WannaCry signature detection using the Elastic Stack.
I love finding parallels and inspiration in other fields. When it comes to improving on-call, there is perhaps no better industry to learn from than the medical field. The author, a Malaysian Medical Officer, makes a case for the various ways to improve the on-call experience in her industry. My favorite recommendation is the mandatory time off following an on-call shift.
Half the reason (maybe more?) any of us really care about monitoring is that it allows us to not only spot performance problems but also fix them and generally improve our situation. The folks at Heap Analytics ran into such a scenario and walk us through it all. Bonus points for real-world uses of flame graphs.
This looks like a slick tool. It creates a StatusPage.io-like page on your own infrastructure using AWS Lambda.
Thanks for subscribing to Monitoring Weekly, folks! If you like what you’ve seen, invite your friends and colleagues! As always, if you have interesting articles, news, events, or tools to share, send them our way by replying to this email.
See you next week!
- Mike (@mike_julian)
Monitoring Weekly editor