
SRE Monthly Roundup — June 2022

By António Araújo • Issue #4
Happy Friday 👋
Welcome to the 4th edition of the SRE Monthly Roundup. Our guest this month is Chris Evans, Co-founder & Chief Product Officer at incident.io.

Chris started building what would become incident.io while working at Monzo as Technical Director for Platform and Reliability. At the challenger bank, the CTO asked Chris to take ownership of on-call processes. Tune in to hear about:
The Roundup
As usual after an incident, Cloudflare published a detailed post-mortem in record time. It was a big one, affecting 50% of HTTP requests globally.
As a newcomer, I appreciate this kind of article. It walks through the principles and key questions to consider when setting up your monitoring.
Another month, another article from the Mercari SRE team. Bonus points for the SLO story, which includes real use cases:
We began taking error budgets into consideration in making release decisions. We can check our error budget prior to release and make decisions accordingly (for example, we can decide not to make a release with major changes if a large chunk of our error budget has already been used up)
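The Mercari post describes the policy in prose only, so the following is a minimal sketch of what an error-budget release gate could look like, assuming a request-based SLO and a hypothetical rule that a major release needs at least half of the error budget remaining. `SLOWindow`, `error_budget_remaining`, `allow_release`, and the 50% threshold are illustrative names and values, not Mercari's.

```python
from dataclasses import dataclass


@dataclass
class SLOWindow:
    """Request counts over one SLO window (hypothetical structure)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int


def error_budget_remaining(w: SLOWindow) -> float:
    """Return the fraction of the error budget still unspent, clamped to [0, 1]."""
    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = w.total_requests * (1 - w.slo_target)
    if allowed_failures <= 0:
        return 0.0
    consumed = w.failed_requests / allowed_failures
    return max(0.0, 1.0 - consumed)


def allow_release(w: SLOWindow, major_change: bool, min_remaining: float = 0.5) -> bool:
    """Hypothetical gate: hold major releases once too much budget has been spent."""
    remaining = error_budget_remaining(w)
    if major_change:
        return remaining >= min_remaining
    # Smaller changes may ship as long as any budget is left.
    return remaining > 0.0


if __name__ == "__main__":
    # 700 failures against a budget of ~1,000 leaves about 30% of the budget.
    window = SLOWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=700)
    print(round(error_budget_remaining(window), 2))
    print(allow_release(window, major_change=True))
    print(allow_release(window, major_change=False))
```

With 70% of the budget already consumed, this gate holds the major release but lets a minor one through, which mirrors the example in the quote. Real policies often also weigh burn rate, not just the remaining fraction.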
Bonus: 
What’s coming
HugOps
This month we’re sharing #HugOps with:
Google Cloud — June 14th
Meta — June 7th
Block — June 1st
Zapier — June 12th
Amazon — June 13th
SoundCloud — June 13th
Reddit — June 17th
Who’s hiring
_____________
And that’s it for this month! What have I missed? Tell me on Twitter or by email. See you next month!
António Araújo

A monthly newsletter to help you keep up with everything going on in the site reliability world, covering topics such as performance, scalability, security, DevOps, observability, engineering leadership, and more.
