This week marks a return to tracing and observability topics, along with a Grafana security update. Have a great day! 🌞
This issue is sponsored by:
Instant visibility into the health of your software
Modern monitoring tools for user-centric teams. Raygun gives you actionable, real-time insights into the quality and performance of your web and mobile apps so you can detect, diagnose, and resolve issues quickly. Their simple usage-based plans start from as little as $4 per month with unlimited apps and users. Try Raygun free for 14 days.
Articles & News on monitoring.love
It’s been amazing to see the community grow throughout 2021 and into 2022. We’d love to have you join us and share what you’ve been working on.
From The Community
Just when you thought Google Cloud could be trusted (lol), here comes a cautionary tale of debugging OTel spans in GCP.
How to think about Observability if your organization is stuck in an APM (or monitoring-only) mindset.
Following the previous theme, Razorpay shares the details of their own tracing and Observability journey. Really great stuff here.
If you enjoyed last week’s article about Delivery Hero’s Reliability Manifesto, you’ll love this follow-up from one of their engineers reviewing their use of golden signals for monitoring production services.
You might be surprised to learn that many companies employ their own dedicated Incident Management team(s). In my experience, a well-functioning IM team won’t guarantee higher reliability, but it can be very effective at getting process and systems buy-in across the organization.
If you’re a Prometheus or Kubernetes administrator, you’re probably already familiar with the sidecar concept. For everyone else, this article is a super quick primer (though I personally still love Sensu’s whitepaper on the topic).
This story really resonates with me. As an ex-SRE and Observability Engineer, it’s almost unavoidable to get pulled into any incident where engineers need help deciphering wonky behavior in a graph or metric. This is unsustainable, and the sooner you can identify and break the cycle, the better it is for everyone involved.
The SQL-powered observability backend
Analyze Prometheus metrics and OpenTelemetry traces together using Promscale + the power of SQL. Promscale is open source and built on top of PostgreSQL/TimescaleDB. Get the system insights you need with the technology you’re familiar with. Learn more. (SPONSORED)
More security fixes (medium severity) for everyone’s favorite open source dashboarding tool.
If you run a dedicated storage array, you might already be familiar with the Delfin project within the SODA Foundation. This article takes a look at deploying Delfin on Kubernetes and determining whether it’s also a good fit for monitoring native K8s storage.
A surprisingly thorough look at four of the more popular command-line utilities for debugging and troubleshooting Linux systems.
From Grafana Labs’ October 2021 EMEA meetup, a talk about how The Factory adapted their Grafana and Prometheus stack for a high-availability multi-datacenter configuration.
“delfin… is an open source project to provide unified, intelligent and scalable resource management, alert and performance monitoring.”
Negotiating your AWS contract? Let us help. At The Duckbill Group, we’re on your side and we see dozens of these a year, more than most AWS account managers! We’ve helped negotiate everything from $3mm contracts to $650mm contracts and a whole slew in between. Check out our AWS contract negotiation services. (SPONSORED)
See you next week!
– Jason (@obfuscurity) Monitoring Weekly Editor