SRE Monthly Roundup — April 2022

By António Araújo • Issue #2
Hi friends 👋
Welcome to the second edition of the SRE Monthly Roundup. Our guest this month is Pedro Torres, Senior Director of Engineering at Salsify.

Pedro is a unicorn hunter: for the third time, he has joined a pre-unicorn startup that went on to reach a one-billion-dollar valuation. Currently, he’s at Salsify, helping brands such as Coca-Cola and Unilever navigate the complexity of retail systems worldwide.
As an active community builder and public speaker, Pedro can often be found delivering talks on engineering leadership (check this one on DevDays) or organizing events for CTO Portugal.
This month, I decided to record the conversation with our guest and share it here “podcast style”. It becomes obvious pretty quickly that Pedro is an expert at these things and I’m still a bit of a newbie. But hey, it can only get better from here!
Some of the topics we discussed: 
  • Being part of three startups that became unicorns: Farfetch, Talkdesk and Salsify
  • Remote work and the rise of the Portuguese tech scene
  • Pedro’s main goal of being an enabler and unlocker of people’s potential
  • KPIs as an engineering leader
  • Operating without a dedicated QA function
  • Salsify’s stack: Ruby, AWS, Kubernetes, Kafka, and more
  • Managed services and the build vs. buy dilemma
  • CTO Portugal community
The Roundup
/ a collection of interesting blogs & articles, news and stories
The best piece of content I’ve read in April. @acartine takes you through the implementation of SLOs at Klarna, the cultural and technical challenges they faced along the way, and much more. I feel like I’ll be sharing this one with colleagues and customers for a while.
Two SLO links in a row, but trust me, it’s worth listening to the full episode of Grafana’s podcast. So many interesting stories about creating an SLO culture and the benefits that come with it. My favorite nugget, from Tom Wilkie: “I’ve noticed that once you start giving people an interface where they can see performance of their SLOs, then within the organization, teams wanna be in that interface. They wanna see themselves in there.” By @matryer and @tom_wilkie with guests @beorn7 and @MetalMaze.
Incident.io know a thing or two about incident management and although I’m not fluent in Go, this blog by @paprikati_eng was a great read. If you’re looking for practical tips to reduce alert noise — without SLOs! — make sure you skim through it. 
Founding Uber SRE
I know I’m late to the party, but I only discovered @lethain’s blog last week, and after reading this one I’ve already bookmarked many other posts to go through over the weekend. This one has some incredible stories and learnings from scaling Uber’s team and infrastructure at remarkable speed.
Even after a rough month with several disruptions to their services, GitHub kept their promise and published their usual monthly availability report. Sharing these helps companies earn trust and helps the community/industry implement some of the learnings included in the report. Hats off to you, GitHub!
Bonus: 
What’s coming
/ shortlist of events, meet ups, product launches
HugOps
Who’s hiring
From the detech.ai blog
António Araújo

A monthly newsletter to help you keep up with everything going on in the site reliability world, covering topics such as performance, scalability, security, DevOps, observability, engineering leadership, and more
