Tons of great logging and SLO content this week. Oh, and a fresh stack of new job postings. Enjoy this issue and have a great day! 😎
This issue is sponsored by:
Manage incidents directly from Slack
Rootly helps automate the tedious manual work like creating incident channels, searching for runbooks, documenting the postmortem timeline, and more. Teams of 20 to 2,000 manage hundreds of incidents daily and save thousands of engineering hours a year with Rootly. Get started in <5min or book a demo to learn more and get Starbucks ☕ on us!
Articles & News on monitoring.love
Come hang out with all your fellow Monitoring Weekly readers. I mean, I’m also there, but I’m sure everyone else is way cooler.
From The Community
If you haven’t heard about Prometheus remote write, this is a gentle introduction to the functionality. We used this at my last gig and it offers a lot of flexibility (compared to traditional Prometheus scraping) for increasingly diverse deployment scenarios.
A little heavy on the memes, but a fun Kubernetes debugging story nonetheless.
Data-driven negotiation with SLIs, SLOs and Error Budgets
An impressively thorough look at everything related to SLOs and error budgets. I would set aside a good half hour to read (and re-read) this two-part series.
If you can afford it, I have no doubt that Splunk-Connect is a hella useful integration for aggregating your Kubernetes container logs.
It takes a lot of work to build, adopt, and maintain a healthy Incident Management program, but it’s worth the investment. This article is a nice introduction to many of the considerations and open questions you should start thinking about when developing your own IM strategy.
This might be the first article that successfully explained to me what Loki is all about. Excellent summary here, pass it along to your peers.
I know it may seem hard to believe, but Prometheus isn’t the only metrics system out there today. This engineer believes they have a case for choosing Telegraf with InfluxDB (have they heard of remote write, I wonder…).
See ROI on cloud and cloud native monitoring in minutes, free.
Ready to see insights on your cloud infrastructure workloads in minutes? The OpsRamp free trial makes it easy. Set it up with no credit cards or commitments, onboard your resources with our wizard, and use out-of-the-box or custom dashboards to get the metrics that matter. We'll even supply the GCP resources if you just want to see how it works. Get started today. (SPONSORED)
In the first entry of a three-part series, Pinterest engineers walk through their transition to Apache Druid for analytics data. Although it’s not a monitoring or observability story in the strictest sense, I feel like the lines are beginning to blur between high-cardinality metrics and analytics systems.
Monitoring Weekly readers are probably not the intended audience here, but it wouldn’t hurt to bookmark this one for the next time you need to justify your paycheck to a pointy-haired boss.
Tons of useful SLIs and considerations in here for optimizing your own S3 usage.
Some nice performance improvements and bug fixes in this minor release. Make sure to read the release notes (duh); it looks like there are some deprecated block formats.
“Splunk Connect for Kubernetes provides a way to import and search your Kubernetes logging, object, and metrics data in your Splunk platform deployment.”
“PREVAIL is a unique follow-the-sun virtual event devoted to IT resilience, performance, security, quality testing and Site Reliability Engineering.”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year–more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor