Welcome to the 4th edition of the SRE Monthly Roundup. Our guest this month is Chris Evans, Co-founder & Chief Product Officer at incident.io.
Chris started building what would become incident.io while working at Monzo, as the Technical Director for Platform and Reliability. At the challenger bank, Chris was asked by the CTO to start owning on-call processes. Tune in to hear about:
How the on-call project at Monzo led Chris and his co-founders to start a company
As usual after an incident, Cloudflare released a detailed post-mortem analysis in record time. It was a big one, impacting 50% of HTTP requests globally.
Another month, another article from the Mercari SRE team. Extra points for the SLO story with real use cases:
We began taking error budgets into consideration in making release decisions. We can check our error budget prior to release and make decisions accordingly (for example, we can decide not to make a release with major changes if a large chunk of our error budget has already been used up)
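To make the idea in that quote concrete, here is a minimal sketch of an error-budget check that could run before a release. It is not Mercari's implementation: the `SloWindow` type, the function names, and the 50% threshold for major changes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical structure)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent; 0.0 or below means the budget is exhausted.
    """
    allowed_failures = (1 - window.slo_target) * window.total_requests
    if allowed_failures <= 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0
    return 1 - window.failed_requests / allowed_failures


def can_release(window: SloWindow, major_change: bool,
                min_budget_for_major: float = 0.5) -> bool:
    """Decide whether a release should go ahead.

    The 0.5 default is an assumed policy, not a recommendation from the article.
    """
    remaining = error_budget_remaining(window)
    if remaining <= 0:
        return False  # budget exhausted: hold all releases
    if major_change and remaining < min_budget_for_major:
        return False  # too little budget left to risk a major change
    return True


# 1,000,000 requests at a 99.9% target allow about 1,000 failures.
window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=700)
print(can_release(window, major_change=True))   # major release is held back
print(can_release(window, major_change=False))  # minor release can proceed
```

In this example roughly 70% of the budget is already spent, so a major release is held back while a minor one can still go out. A real policy would likely also weigh how much of the window has elapsed and how quickly the budget is burning.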
A monthly newsletter to help you keep up with everything going on in the site reliability world, covering topics such as performance, scalability, security, DevOps, observability, engineering leadership, and more