Happy holidays, folks. Monitoring Weekly is going on its annual holiday break for the next two weeks. I’ll see you all again in 2019!
This issue is sponsored by:
Distributed tracing helps you troubleshoot problems you don’t even know you have, and it’s only going to become more important as your software gets more complex.
Latest Articles on monitoring.love
This advice comes from a fantastic list of resources on Twitter, and I wanted to put all of it in one convenient location. Enjoy. (Got something you think needs to be in this list? Send it over!)
From The Community
Got Tomcat? Here’s some great stuff about what you need to know when it comes to monitoring it.
I feel like people tend to forget that there are real networks and real datacenters underneath our Kubernetes clusters and AWS/GCP/Azure/Heroku/etc. This article is interesting because it explores how SLAs work within a telco environment, touching on Carrier Ethernet technologies and, everyone’s favorite-to-mock protocol, SNMP.
What do you do when you’re beyond what ELK can reasonably handle? Well, you either fork over a mind-bogglingly large sum of money to Splunk, or you build your own solution. You can guess which option the folks at GO-JEK opted for.
I really like the depth and breadth of this article on instrumenting Golang and pushing it to InfluxDB. What caught me off-guard is the last section, though: TorfluxDB is a project that pipes metrics through Tor before going to the InfluxDB server, in an effort to anonymize the metrics. A typical use case might be a service you run where the maintainers also want usage data to improve the software.
Because the best teaser for this article is really the closing line, I’ll just leave this here: We need to put “metrics, logs, and tracing” back in their place: as implementation details of a larger strategy – they are the fuel, not the car. We need a new scorecard …
Sort of an MVP of tracing in Node.js.
As I’ve said before, everyone loves a good index. The folks at PagerDuty, armed with a data scientist and a mountain of raw data, have come up with an index to express the on-call health of a given on-call responder. I really love this idea. Sadly, I can’t seem to find any data on what the 16 factors they identified actually are.
I’m with the author: on-call duties should be paid for separately from, and in addition to, base pay. There are other good points in the article too, of course, but I like poking bears with the “f*ck you, pay me” stick.
A huge congratulations to the team at Sensu for the incredible work in getting this shipped. Sensu Go is a massive improvement over the one we’ve all come to know and love, featuring some really awesome (and long-awaited) stuff: no more Redis or RabbitMQ (thank the gods), no hard requirement for config management, proper multi-tenancy, a versioned API, and much more. If you’re a Sensu user, you should check the release out and be sure to look at the upgrade notes.
It’s like if you had jq but without the headache of trying to remember the complex query format. Very neat.
As a Pythonista myself, this is awesome and very welcome. The built-in Python library is…meh. This one is decidedly not.
Got a neat product you think doesn’t get enough attention? An event you think everyone should know about? Something else entirely? Sponsorships are open to all product types and industries–not just those with a monitoring product.
Want your job listed here? Why not submit a post to the job board? It’s only $99/ad for 30 days.
See you folks in January 2019!
– Mike (@mike_julian) Monitoring Weekly Editor