My AI Agent Bypassed Staging 28 Times. So I Built a Pipeline.

I've been letting Claude Code write and push code to my repos for months. It's genuinely productive. But over a few weeks, my AI agent skipped staging and pushed straight to production 28 times. Here's the full pipeline I built to make that impossible.

I've been letting Claude Code write and push code to my repos for a few months now. It's productive. Genuinely productive. It writes clean code, solves problems fast, and moves on to the next task without hesitation.

That last part is the problem.

Over the course of a few weeks, my AI agent skipped staging and pushed straight to production 28 times. I didn't notice for the first 6. The code was correct every time. The deploy path was not.

Production broke on a Saturday because the agent decided staging was unnecessary. A database migration that worked in development didn't work against production data. A config variable that existed locally didn't exist in the production environment. Classic deployment bugs, the kind staging exists to catch.

The agent didn't care. It had already moved on to its next task.

What I tried first

The obvious move was to tell the agent to follow the process. I updated the project rules. I filled .md files with instructions. "Always push to a feature branch." "Never merge to main without tests." "Wait for staging verification before promoting."

It worked for a while. Then the agent optimized the rules away. Not maliciously. It just decided the fastest path to completing its task was skipping the parts that looked like friction. Staging looked like friction.

I tried making the instructions more explicit. More detailed. More files. More rules. Same result. I'd find PRs merged directly to main with commit messages like "fix: update config," as if that explained anything.

The realization was simple: you can't solve an infrastructure problem with instructions. If the mistake is possible, the agent will eventually make it. You have to make the mistake impossible.

The architecture

Here's the full pipeline flow:

    AI agent pushes a feature branch
        ↓
    PR to staging branch
        ↓
    GitHub Actions runs CI tests
        Pass? ──No──> PR blocked, agent can't merge
        ↓ Yes
    Auto-merge to staging → deploy to staging environment
        ↓
    Health check: verify commit hash on staging
        Match? ──No──> pipeline stops, alert sent
        ↓ Yes
    CueAPI cue created: "test-staging"
        ↓
    AI test agent claims the cue → integration tests run against live staging
        Pass? ──No──> GitHub issue + email + blocked
        ↓ Yes
    Agent opens PR: staging → main → auto-approve + auto-merge
        ↓
    Deploy to production → CueAPI cue: "verify-production"
        ↓
    Agent verifies production health
        Healthy? ──No──> GitHub issue + email
        ↓ Yes
    Done

Every arrow is enforced by infrastructure. The agent can't skip steps, because the next step doesn't trigger until the previous one completes.

The stack

GitHub is the foundation. Branch protection rules on main are what make the whole thing work. Without branch protection, the agent can push to main regardless of what your workflows say. With branch protection, the only path to main is through a PR that passes checks.

GitHub Actions runs four workflows that chain together. Each triggers on a specific event and hands off to the next.

Railway (or whatever hosting platform you use) handles deployments. The pipeline is hosting-agnostic: Vercel, Fly.io, Render, anything that deploys from a branch and exposes a health endpoint works.

CueAPI is the handoff layer between the CI workflows and the AI test agent. When staging deploys successfully, the workflow creates a cue. The agent picks it up, runs integration tests, and reports results. Without this layer, you're trusting the agent to notice the deploy finished. It won't. It's already doing something else.

CueAPI is the scheduling and execution accountability API I built. Full disclosure: I'm the author. It's open source and self-hostable, or you can use the hosted version at cueapi.ai. The free tier covers everything this pipeline needs.

Resend handles failure notification emails. Python runs the agent scripts, using the requests library.

The four workflows
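Before walking through each workflow, here's roughly what the agent side of the CueAPI handoff looks like. This is a sketch, not CueAPI's real client interface: the endpoint paths, payload fields, `CUEAPI_BASE`, and `CUEAPI_TOKEN` are all assumptions for illustration.

```python
import requests

# Assumed values -- point these at your hosted or self-hosted instance.
CUEAPI_BASE = "https://cueapi.ai/api"
CUEAPI_TOKEN = "your-api-token"


def _headers():
    return {"Authorization": f"Bearer {CUEAPI_TOKEN}"}


def result_payload(passed, detail=""):
    """Shape of the result the agent reports back (assumed schema)."""
    return {"passed": bool(passed), "detail": detail}


def claim_next_cue(name):
    """Claim the next pending cue with this name, or None if nothing is waiting.

    The /cues/<name>/claim path is an assumption, not CueAPI's documented route.
    """
    resp = requests.post(f"{CUEAPI_BASE}/cues/{name}/claim",
                         headers=_headers(), timeout=10)
    if resp.status_code == 404:
        return None  # no cue pending
    resp.raise_for_status()
    return resp.json()


def report_result(cue_id, passed, detail=""):
    """Report pass/fail so the pipeline can advance or alert."""
    resp = requests.post(f"{CUEAPI_BASE}/cues/{cue_id}/result",
                         headers=_headers(),
                         json=result_payload(passed, detail), timeout=10)
    resp.raise_for_status()


def run_once(run_tests):
    """One poll cycle: claim a cue, run the supplied test callable, report back."""
    cue = claim_next_cue("test-staging")
    if cue is None:
        return False
    report_result(cue["id"], run_tests())
    return True
```

The real agent wraps `run_once` in a loop with a sleep between polls; the point is that the agent never has to notice a deploy finished on its own. The cue is the trigger.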
1. feature-to-staging.yml

Triggers when a PR opens against the staging branch. Runs CI tests and auto-merges if they pass. Uses a DEPLOY_TOKEN PAT instead of the default GITHUB_TOKEN, because GitHub Actions won't trigger downstream workflows with the default token. If tests fail, the PR stays open and blocked.

2. staging-deploy.yml

Triggers when code lands on staging. Deploys, then runs a commit hash health check, not just "is the server responding." I added this because staging was returning 200 OK while running old code. The agent saw the 200, ran tests against stale code, and promoted it to production. After confirming the right code is live, the workflow creates a CueAPI cue: test-staging.

3. auto-approve-merge.yml

Triggers when a PR opens against main. Checks whether the PR author is the bot account. If yes, it auto-approves and auto-merges. Non-bot PRs still go through normal review.

4. production-verify.yml

Triggers when code lands on main. Creates a CueAPI cue: verify-production. The AI agent picks it up and runs health checks across all production URLs.

The agent scripts

config.py: staging URL, production URL, repo name, bot username, notification email. Update once at setup.

...
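The "right code is live" check from workflow 2 can be sketched like this. The `/version` endpoint and its JSON shape are assumptions; your app needs to expose its build SHA somehow (baking GITHUB_SHA into the build at deploy time is a common approach).

```python
import requests


def deployed_commit(base_url):
    """Ask the running app which commit it was built from.

    Assumes the app exposes a /version endpoint returning {"commit": "<sha>"}.
    Both the endpoint name and the schema are assumptions -- wire up whatever
    your app actually serves.
    """
    resp = requests.get(f"{base_url}/version", timeout=10)
    resp.raise_for_status()
    return resp.json()["commit"]


def code_is_live(expected_sha, reported_sha):
    """True only if the running code matches the commit that was just pushed.

    A 200 OK is not enough: staging can respond while serving stale code.
    Accepts a short SHA on either side (prefix match), case-insensitive.
    """
    a = expected_sha.strip().lower()
    b = reported_sha.strip().lower()
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


# In the workflow step, compare against the commit that triggered the run
# (GITHUB_SHA) and stop the pipeline on mismatch, e.g.:
#   if not code_is_live(os.environ["GITHUB_SHA"], deployed_commit(STAGING_URL)):
#       raise SystemExit("staging is serving stale code -- halting promotion")
```

Only after this check passes does the workflow create the test-staging cue, so the test agent can never run against stale code.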