This issue is sponsored by:
Ship faster because you know more, not because you’re rushing. Get actionable insights from 7 million commits and 85,000+ software engineers to increase your team’s velocity. Free Guide
Latest Articles on monitoring.love
Ever thought hard about your company’s observability strategy and the challenges you’re facing? What if your company spanned 70 countries and 90,000+ employees, and you were a bank? My guest certainly thinks about this regularly. In this episode, I speak with Greg Parker, the head of the Enterprise Monitoring Services team at Standard Chartered Bank, about what it takes to design and implement a global monitoring strategy in a complex environment.
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
There’s a new O’Reilly ebook out, sponsored by the folks at Humio, about chaos engineering and observability.
For those of you working on high-volume backend systems, you’ll like this article from the folks at Segment.
A great read about exactly what it sounds like.
A look under the hood of some interesting Salesforce engineering.
There are some gems in here, but my personal favorite is this one: “Don’t litigate incident severity during the call. It’s a waste of time. By the time you’re done discussing whether it’s a SEV-1 or SEV-2, it will definitely have become a SEV-1. Best practice: If you can’t decide whether it’s a SEV-1 or SEV-2, always assume it’s the higher severity option and move on.”
The final part of a three-part series on Holt-Winters predictive functionality in InfluxDB.
From my good friend Thai’s Resilience Roundup: “In this study, a lot of employees said that the accidents happen anywhere from 0 to 5 times a year, but at the same time, almost everyone said that small accidents or incidents were happening all the time. The operators in this company had normalized risk to such a degree that things like getting burned or getting acid in their eyes counted to them as only a minor incident.”
The story of Knight Capital is an interesting one (which you can read about here), and this thread by John Allspaw points out some hypocrisy/hindsight bias among the peanut gallery as it relates to both Knight Capital’s story and the NY Stock Exchange halting in 2015 for similar reasons.
A whole bunch of web performance metrics (and what they mean), plus tools for collecting and analyzing them.
From the article, “UltraBrew Metrics can operate at millions of requests per second per JVM without measurably slowing the application down. We currently use the library to instrument multiple applications at Verizon Media, including one that uses this library 20+ million times per second on a single JVM.”
Someone had the great idea of setting up nodes in a bunch of AWS regions and measuring the latencies between them. Very cool.
This issue is sponsored by:
Distributed tracing helps you troubleshoot problems you don’t even know you have, and it’s only going to become more important as your software gets more complex.
The CFP is now open for Datadog’s DashCon.
See you next week!
– Mike (@mike_julian) Monitoring Weekly Editor