Value All the Way Down: Applying DevOps Where CI/CD Assumptions Break
2026-05-05, Inspiration A/B

High-Performance Compute (HPC) environments break many assumptions behind modern DevOps and CI/CD. Simulation jobs are long-running, stateful, and expensive to fail. Feedback can arrive hours or days too late to be useful.

This talk demonstrates how applying DevOps principles in an enterprise HPC and simulation environment saves valuable time by shifting focus away from deployments and toward feedback loops: from troubleshooting errors after the fact to preventing failures up front. Instead of forcing standard CI/CD patterns, we adapted them to fit the constraints and delivered measurable value through reduced wasted compute, faster feedback, and better outcomes for engineers.


High-Performance Compute (HPC) environments are expensive to operate, in part because a sizeable share of engineer and compute time is "wasted" on failed jobs. Engineers submit long-running jobs, wait hours or days, and only then discover failures or unusable results. By that point, the data is stale, decisions are delayed, and both compute and engineer time are lost.

This talk is a real enterprise case study in applying a DevOps mindset to HPC by identifying and exploiting the true constraint: a reactive, rather than proactive, feedback loop. We flipped the model from post-mortem analysis to pre-submit validation and near real-time alerts, so engineers could act immediately on timely, relevant data.

The result was a measurable decrease in wasted compute and faster recovery: hundreds of thousands of compute hours preserved, tens of thousands of engineer hours saved, and millions of dollars avoided annually in wasted reruns and idle decision time.

This talk is about how applying DevOps principles to the actual constraint, not the visible symptoms, allowed value to flow all the way down to the customer in an environment where DevOps may not even be part of the conversation.

Boris is a software engineer based in sunny Austin, TX, working on large-scale engineering platforms that span High-Performance Compute (HPC), container orchestration platforms, and cloud systems. He cares about practical DevOps, improving feedback loops, and making complex systems less frustrating for the people who use them.