Busra is an SRE at Datadog, where she works on technical and organizational challenges to make the platform more resilient. Her career spans more than eight years in cloud infrastructure, platform engineering, and reliability at various companies. She is especially passionate about incident management, platform automation, and engineering culture. Outside work, she likes telling stories, taking photos, and advocating for mental health.
Incidents are an inevitable part of our daily work, but they are often viewed as unpleasant disruptions to our routine. Many definitions of an incident include the words "unplanned" and "disruptive", which doesn't exactly make them sound like a barrel of laughs. But what if we could change our perspective and see incidents as exciting opportunities for building resilience?
Let's explore how incidents can help us not only improve our systems but also build more resilient organizations and become better engineers. How can we build resilience through outages? How can we support our teams in responding to incidents? How can we ensure that the learnings stick around in our organization for the long haul? My hope is to provide you with a fresh perspective on these questions and give you a renewed sense of excitement for the opportunities these events present.