2025-08-27, Potter Auditorium - Kenneth C Rowe Management Building
The Kubernetes API is awesome and so tempting to use, especially when building observability solutions. Nobody wants to see only raw IP addresses and ports in their network or request telemetry; pod and service metadata is far more useful. Better still, information about all the nodes in your cluster can help you produce amazing service graphs.
This talk is a story of how we took down the Kubernetes API in our biggest production cluster at Grafana, by deploying observability tools which make heavy use of the Kubernetes API.
As observability tools proliferate, so does the use of the Kubernetes API in your clusters. Each tool gathers information from the Kubernetes node or cluster to connect with the telemetry it exports, and together these requests can cause a flurry of activity that challenges even the best-designed Kubernetes API auto-scalers. Knowing what to expect and how to overcome these challenges can be a “life-saver”, especially since some of these problems only show up when you deploy these information-thirsty tools on the “real thing”, i.e. your production cluster.
We’ll show you the techniques we used to avoid repeating our mistakes: configuration changes and services we built to shield the Kubernetes API from information-thirsty observability tools while keeping their functionality intact. We'll walk through various practical aspects of building your services in a way that avoids overwhelming the Kubernetes API.
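To make the idea of shielding the API concrete, here is a minimal Python sketch of one possible approach: many consumers look up pod metadata through a shared cache, and only the cache talks to the upstream API. This is an illustration only, not the service described in the talk. `SharedMetadataCache`, `FakeKubeAPI`, and the sample pod data are hypothetical names and values invented for the example, and a real service would more likely rely on long-lived watches than on periodic full lists.

```python
import threading
import time
from typing import Callable, Dict, Optional


class SharedMetadataCache:
    """Hypothetical cache: one upstream list call per TTL window, shared by all callers."""

    def __init__(
        self,
        fetch_all: Callable[[], Dict[str, dict]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_all = fetch_all      # the only path to the upstream API
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the TTL can be tested
        self._lock = threading.Lock()
        self._snapshot: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def lookup(self, ip: str) -> Optional[dict]:
        """Return metadata for a pod IP, refreshing the snapshot only when it is stale."""
        with self._lock:
            now = self._clock()
            stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
            if stale:
                self._snapshot = self._fetch_all()
                self._fetched_at = now
            return self._snapshot.get(ip)


class FakeKubeAPI:
    """Stand-in for the real API server; counts how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def list_pods(self) -> Dict[str, dict]:
        self.calls += 1
        return {"10.0.0.7": {"pod": "checkout-5d9f", "namespace": "shop"}}


api = FakeKubeAPI()
cache = SharedMetadataCache(api.list_pods, ttl_seconds=30.0)

# A thousand lookups from "observability tools" reach the upstream API once.
for _ in range(1000):
    cache.lookup("10.0.0.7")

print(api.calls)
```

The design choice illustrated here is that load on the upstream API depends on the refresh interval rather than on the number of consumers or lookups.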
I've worked as a software engineer for more than 20 years, mostly in the field of compilers, managed runtimes and performance optimization. Most recently I've been working on low-level application instrumentation with eBPF at Grafana Labs, building the OSS tool Beyla.