DevOpsDays Kerala

Kueue-ing Up Security for Multi-Tenant Cloud Infra at Scale
2024-09-28, AI/SRE

Security is not a one-and-done task; it has to be maintained continuously. Plenty of open source tools help with the security assessment of our infrastructure, but managing and orchestrating these tools at scale is a major pain point. Scheduling regular scans to maintain cloud security posture helps us achieve continuous compliance.

Kubernetes is, at its core, a scheduler and orchestrator, and Kubernetes Jobs are a good way to schedule these security scans. However, when you try to operate Kubernetes Jobs at scale on your own, the limitations of this approach start popping up: etcd gets overloaded, the API server slows down, the status of jobs is difficult to track, and jobs execute in random order. We also realised that we could not control usage or maximise the utilisation of our cluster resources.

Enter Kueue: a k8s-native job queueing system designed specifically to address these challenges. Working seamlessly with the default Kubernetes scheduler, the job controller, and the cluster-autoscaler, Kueue provides a comprehensive batch system that helps us manage Kubernetes Jobs efficiently.

This session dives deep into the challenges of native Kubernetes Jobs and the job scheduler, how Kueue helps orchestrate jobs while solving these challenges, and finally how AccuKnox "kueue"s up security for multiple tenants at scale.


Batch processing on Kubernetes comes with many challenges, from flaky pod semantics to unique scheduling constraints to scalability concerns. Despite these challenges, hundreds of thousands of batch jobs run on our Kubernetes clusters at AccuKnox every day. In this talk, we will dive into why we adopted Kubernetes for batch processing, its benefits, and how we were able to do so at a large scale.

Batch processing has been widely discussed in AI/ML scenarios because of the need to process large datasets for training and inference. But with the highly dynamic nature of workloads in this era, the volume of security metadata generated is large, and we need to understand how it can be handled efficiently. Many design considerations go into maximising resource utilisation and handling multi-tenancy securely and efficiently. This session is told from the end-user perspective of AccuKnox implementing Kueue-based job management, and covers in detail how we configure it for our use case.


Barun likes hacking on low-level stuff and fiddling with developer tooling. He is currently a maintainer of KubeArmor, a CNCF Sandbox project, where he leads the development efforts, and works as a Software Engineer at AccuKnox. He loves speaking at conferences about Open Source, Cloud Native and Security. He is a proud CNCF Ambassador. He has been associated with, and actively mentors in, programs like Google Summer of Code and LFX Mentorship.

Website: https://barun.cc/
Past Talks: https://barun.cc/talks/
Github: https://github.com/daemon1024
LinkedIn: https://linkedin.com/in/barun-acharya/
Twitter: https://twitter.com/daemon1024/

Rudraksh is a generalist software engineer who likes digging into all things related to operating systems, networks and cloud. He has maintained a couple of open source projects in the CNCF landscape and currently maintains KubeArmor as a software engineer at AccuKnox. Other than computers, he likes to talk about all kinds of music.