Abhinav Dahiya
I'm a Staff Software Engineer at Lyft in NYC, where I lead a compute infrastructure team managing Kubernetes on AWS and supporting ~1000 services across infrastructure, data platform, and product. Previously, I was a Principal Software Engineer at Red Hat, where I built Kubernetes control-plane components and installers used to manage fleets of OpenShift clusters across cloud and on-prem environments.
Session
We migrated our production Kubernetes clusters from Cluster Autoscaler to Karpenter. With thousands of nodes and hundreds of services, unknown problems were inevitable. This talk covers how we built a rollout strategy where "pause" and "rollback" were as routine as "continue": explicit go/no-go criteria, staged learning with hand-picked canary batches, and a tiered rollout gated by alarms. The result: double-digit percentage cost reductions, and a pattern we've reused for many more high-risk migrations.