DevOpsDays Philadelphia 2025
Cybersecurity and AI are colliding in ways that DevOps teams cannot ignore. This talk examines how AI expands attack surfaces with threats like model poisoning and prompt injection, while enabling defenses via predictive threat intelligence and automated incident response. We address the critical skills gap between cybersecurity and AI engineering by providing a practical upskilling roadmap covering LLM architectures, adversarial ML, and secure AI CI/CD pipelines. Attendees will gain a structured view of emerging hybrid roles at the intersection of DevOps, security, and AI, and actionable steps to future-proof their careers in this dynamic landscape.
Not every team needs a full-blown internal developer platform—but every team needs something to reduce friction. In this talk, we’ll walk through three levels of self-service maturity: the IaC factory (where platform teams write most of the infrastructure), the template/module model (where platform teams publish reusable components), and the full IDP model (where app teams provision resources through UI or CLI with guardrails). We’ll also break down what actually goes into a platform—including foundational infrastructure, shared services, workloads, and developer experience—and how to right-size your approach without overbuilding. Whether you’re just starting out or trying to improve adoption, this talk will help you map where you are and where you should (or shouldn’t) go next.
Security boils down to trust. Trusting that the code will do what is expected and is free from vulnerabilities. Trusting that the entities interacting with our data and resources have the right to access those resources. Our current approach to both human and non-human access uses the same basic flawed pattern: long-lived credentials.
This approach to trusted access does not take into account who or what is requesting that resource. These secrets, which leak all too often, are an attacker's best friend, and credentials are exactly how attackers think about getting into and moving throughout your system.
What if, instead of simply asking for a security key or credential to gain access, our applications, workloads, and resources asked "Who are you, and how can you prove that?" Humans can move towards leveraging our unchanging characteristics, like biometrics. But what about machines, especially in a world where pods and workloads last for only hours or days?
Attend this session to:
- Better communicate why we must do things differently, and soon
- Learn how the open-source software community has looked at addressing the identity problem
- Understand what commercial options are available
- Map a path away from the world of long-lived credentials
The future of identity and access management is the future of security, IT, and, ultimately, business resiliency.
- Why Git clone performance is critical
  - The default full-history clone is very heavy
  - Yet many workflows do not require the full history
  - It drives cost along the entire chain, right back to the centralized remote Git services
    - Endpoint sizing
    - Network sizing
  - Git cloning requires a programmatic response at the server level, so large workload requests push server performance hard
  - Garbage collection and maintenance on Git clients and servers
- When is Git clone performance critical? (things that are on the rise)
  - Special use cases such as scaled CI
  - Remote dev environments
  - Scaled monorepos and polyrepos
  - Binaries in repos
  - GitOps "lightweight" manifest / config retrievals
- Why is automated benchmarking especially helpful? (see the sketch after this list)
  - It works across many different environments
  - It lets you test optimizations and confirm that builds and other processes remain compatible
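To make the benchmarking idea concrete, here is a minimal Python sketch, assuming a local git binary and a placeholder repository URL (not from the talk), that times a full clone against a shallow clone and a blob-less partial clone and reports elapsed time and on-disk size:

```python
import pathlib
import shutil
import subprocess
import tempfile
import time

# Placeholder repository URL; point this at one of your own large repos.
REPO_URL = "https://example.com/acme/big-monorepo.git"

# Clone strategies to compare: full history, shallow, and blob-less partial clone.
STRATEGIES = {
    "full": [],
    "shallow (--depth 1)": ["--depth", "1"],
    "partial (--filter=blob:none)": ["--filter=blob:none"],
}

def time_clone(extra_args):
    """Clone into a temp dir, returning elapsed seconds and on-disk size in MB."""
    dest = tempfile.mkdtemp(prefix="clone-bench-")
    start = time.monotonic()
    subprocess.run(["git", "clone", *extra_args, REPO_URL, dest],
                   check=True, capture_output=True)
    elapsed = time.monotonic() - start
    size_mb = sum(p.stat().st_size
                  for p in pathlib.Path(dest).rglob("*") if p.is_file()) / 1e6
    shutil.rmtree(dest)
    return elapsed, size_mb

if __name__ == "__main__":
    for name, args in STRATEGIES.items():
        elapsed, size_mb = time_clone(args)
        print(f"{name:32s} {elapsed:7.1f}s  {size_mb:9.1f} MB")
```

Running the same script inside CI runners, remote dev environments, and developer laptops is usually enough to show whether a lighter clone strategy still satisfies the builds that depend on it.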
Earlier this year I led TCG's technical team in a competitive real-time development challenge, vying for a $40 million contract with the Department of the Treasury. What began as a seemingly simple "one-day code challenge" rapidly devolved into a month-long race to prepare the Release One build needed just to begin challenge day. Our final solution featured a full DevOps pipeline, Terraform deployment, multi-region failover Kubernetes infrastructure, and a comprehensive web application with AI image processing, all delivered under immense pressure, on a one-month schedule, with limited customer access.
This isn't theoretical; it's a raw, honest look at real-world challenges. We'll delve into the critical, sometimes painful lessons learned about DevOps principles and Agile anti-patterns that surfaced under fire. I believe in-person live coding and technical assessments like this will become increasingly common in contract competitions, especially as AI blurs the lines of expertise in written proposals.
An alert came in, waking you from a dream. What to do? Is it a vulnerability that needs immediate attention? Or just a flaky script? Alerts come in all sorts of frustrating shapes and sizes, but sadly not enough of them are worthy of your attention. There are many ways to solve this: how much AI do you want here? Lots! Great, we'll do that. Let's explore ways to make alerting more helpful, more useful, and more deserving of your precious time and attention.
Ablative Resilience
Developers working in fast-moving agile environments often write and ship code quickly, postponing secret management for later. A developer might temporarily hard-code secrets, intending to remove them before merging into the main branch and switch to a more secure alternative, such as retrieving the secret at runtime from a secure store. Regrettably, people err: those secrets are frequently forgotten, buried in the code, missed during code review, and ultimately merged into the main branch. The most obvious place to start scanning for secrets is the code itself. Securing the code and automating the scan could be the right solution for taking human error out of the equation.
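As a hedged illustration of what that automation might look like, here is a minimal Python sketch (the patterns and file selection are illustrative assumptions, not a complete scanner) that could run as a pre-commit hook or CI step and fail the build when likely secrets show up in tracked files:

```python
import pathlib
import re
import subprocess
import sys

# Illustrative patterns only; dedicated scanners ship far richer rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private keys
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{12,}['\"]"),  # hard-coded assignments
]

def tracked_files():
    """List files tracked by git so untracked scratch files are ignored."""
    out = subprocess.run(["git", "ls-files"], check=True, capture_output=True, text=True)
    return [pathlib.Path(line) for line in out.stdout.splitlines()]

def scan():
    findings = []
    for path in tracked_files():
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in SECRET_PATTERNS):
                findings.append(f"{path}:{lineno}: possible secret")
    return findings

if __name__ == "__main__":
    hits = scan()
    print("\n".join(hits))
    sys.exit(1 if hits else 0)  # non-zero exit fails the pre-commit hook or CI job
```

Purpose-built tools such as gitleaks or trufflehog go much further with entropy checks and history scanning; the point is that the check runs on every change rather than relying on human vigilance.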
In the race to deliver software, bigger often feels better—but it comes at a cost. This talk champions Small Batch Delivery, a practice that streamlines development by shrinking the size of changes we ship. You’ll discover how small pull requests reduce risk, improve code quality, and keep teams in a state of flow. We'll dive into the ripple effects of bloated PRs, the psychology behind fast reviews, and why this isn't just a dev tactic—it's a cultural mindset shift. If you're ready to ship faster with less stress, it's time to think small.
DevOps has a notoriously steep learning curve. Getting started in the field can feel like being dropped in a foreign country without the ability to understand anything about the language.
A language is more than just the syntax and semantic rules of the words themselves. It also encompasses the shared culture of the speakers. With the proliferation of programming languages as well as the deeply held cultural beliefs of the community, it's easy to see that learning DevOps is like trying to learn a foreign language.
I will review five foundational hypotheses from the field of Second Language Acquisition and relate these hypotheses back to the world of DevOps. DevOps practitioners, trainers, tool builders, and learners should all come away with useful insights to apply to their practice.
What are the forward-looking career paths in DevOps, and how do we successfully navigate them? If you and your organization aren’t gearing up in these critical areas, you need to start planning now. In this talk, we’ll take a look at three critical areas in DevOps, the emerging trends in each, and a way for you and your team to stay meaningfully engaged as they develop.
Eric Snyder is a Senior IT Manager at the University of Pennsylvania. His 30-year tech career spans the gamut from programming to broadband network installation to leading teams and projects building CI/CD pipelines for on-prem, cloud-based, and serverless execution environments. He currently manages the team supporting communication and collaboration services at Penn, including video production, streaming video services, and enterprise-scale SaaS solutions -- which now include OpenAI ChatGPT and Microsoft Copilot Chat.
Writing SQL slows everyone down. Non-technical users can’t, data teams won’t, and leadership waits. While commercial AI-powered tools promise a solution, most are pricey, opaque, and allergic to your reporting requirements. This Ignite talk presents an open-source Text-to-SQL chatbot that prioritizes transparency and user control. It combines advanced prompt engineering & guardrails to reduce hallucinations and ensure generation of reliable SQL queries. It uses an evaluation framework to assess performance by checking syntax accuracy, schema awareness and robustness to ambiguous user inputs. You’ll walk away knowing what works, what breaks, and why building your own AI assistant might just be your smartest move. Query load is not a career path. Offload it to the bot.
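As a rough sketch of the guardrail idea, here is a minimal Python example (the schema, table names, and checks are illustrative assumptions, not the project's actual implementation) that validates model-generated SQL before it reaches the database, allowing only read-only statements against tables the schema actually contains:

```python
import re

# Illustrative schema; in practice this would be introspected from the database.
KNOWN_TABLES = {"orders", "customers", "products"}

FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)

def validate_sql(sql: str) -> list[str]:
    """Return a list of guardrail violations; an empty list means the query may run."""
    problems = []
    if not sql.strip().lower().startswith("select"):
        problems.append("only SELECT statements are allowed")
    if FORBIDDEN.search(sql):
        problems.append("write/DDL keywords are not allowed")
    unknown = {t.lower() for t in TABLE_REF.findall(sql)} - KNOWN_TABLES
    if unknown:
        problems.append(f"unknown tables referenced: {sorted(unknown)}")
    return problems

if __name__ == "__main__":
    generated = "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    issues = validate_sql(generated)
    print("OK to execute" if not issues else issues)
```

The same checks can double as part of an evaluation harness: run them over a benchmark set of questions and count how many generated queries pass.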
We all know investing in developer experience is a good call...but how do you really know if those investments are working? Traditional DevOps metrics? Sure, they help. But now AI is everywhere, promising to save the day. So how do you measure if AI is actually doing anything besides producing the internet's finest sh*tposts?
In this light-hearted talk, I’ll break down real ways to measure AI’s impact—beyond the memes. We’ll look at metrics for individual contributors, teams, and departments, exploring whether AI is a true game-changer or just another shiny buzzword.
YouTube Search is about understanding intent across billions of queries while managing complex metadata at scale and delivering real-time analytics.
This session bridges my experience as a broadcast engineer at YouTube Space LA to the developer and open source community. We'll explore practical lessons from YouTube's search infrastructure and show how to tackle these challenges - from ambiguous queries to recommendation systems.
You'll learn:
1) Observability at Scale: What YouTube’s metadata means for your AIOps and observability stack
2) Platform Engineering for Search: Building developer-friendly search infrastructure that your teams will actually want to use
3) Real-time Analytics: Building pipelines that power recommendation engines and feed both dashboards and ML-driven insights
As developer advocates, we know that effective discovery isn't just about finding content - it's about connecting developers with the knowledge they need to grow so that the next generation can build on what we've learned.
Focus on intent: what people are trying to do with the system. While product analytics might give a broad sense of “what happened,” making sense of telemetry that points to “what went wrong” is key to improving the system. To users, the specific issue doesn't matter: from their perspective, the software simply doesn't work!
AI agents are transforming the way we manage cloud infrastructure — bringing automation, context awareness, and natural language control to everyday DevOps and SRE tasks.
In this hands-on workshop, we’ll build an intelligent AI agent to interact with AWS services.
Attendees will learn how to use the Strands Agents framework with Python, the fundamentals of MCP, and how to leverage the AWS AgentCore service to improve authentication, scaling, and observability.
Requirement:
Familiarity with Python and basic AWS services. AWS account with Bedrock access or any LLM API key.
Target Audience: Cloud engineers, AI/ML practitioners, early-career tech professionals.
Back in 1976, the Makefile made it easy to compile C programs. In 2025, it’s used for automating just about everything, from shell scripts to build workflows to CI/CD pipelines.
Over 49 years, a lot has obviously changed about the way we design and build software. However, the fundamentals (especially design patterns, algorithms, and architectures) have stayed more or less the same. We can learn a lot from the Makefile, from its design to how it has remained relevant in such a quickly evolving space. In this Ignite talk, Benjie will talk about the patterns that made the Makefile a mainstay in software development and deployment. He'll also cover why good software is timeless.
Scaling GitOps for large-scale deployments can be challenging with a single repository or controller. This talk explores sharding as a strategy to optimize performance, improve reliability, and manage complexity in GitOps workflows for multi-environment or multi-tenant setups.
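Purely as a hedged sketch of what sharding can mean at this level (none of the names or numbers come from the talk), the core idea is a deterministic assignment of clusters, repositories, or tenants to controller shards:

```python
import hashlib

# Illustrative inventory: the clusters (or repos/tenants) a GitOps installation manages.
CLUSTERS = ["prod-us-east", "prod-eu-west", "staging", "dev", "tenant-a", "tenant-b"]
SHARD_COUNT = 3  # e.g. the number of reconciliation-controller replicas

def shard_for(name: str, shards: int = SHARD_COUNT) -> int:
    """Deterministically map a cluster or repo name to a controller shard."""
    digest = hashlib.sha256(name.encode()).hexdigest()
    return int(digest, 16) % shards

if __name__ == "__main__":
    for cluster in CLUSTERS:
        print(f"{cluster:15s} -> shard {shard_for(cluster)}")
```

Real GitOps controllers typically handle shard assignment themselves, but the practical trade-offs, such as rebalancing when shard counts change and unevenly sized tenants, show up even in this toy version.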