When AI Ships the Bug: Who Pays?
The test suite said ship it. The AI said it was safe. The bug cost $2 million. Now what?
In August 2012, Knight Capital Group lost $440 million in 45 minutes due to a software deployment error. The bug? A repurposed feature flag reactivated dormant code on a server that had missed the update. Standard testing should have caught it.
That was before AI testing tools became standard practice.
Now imagine the same scenario in 2025, except the tests were AI-generated. The AI said the code was safe. Every test passed. Your team shipped it based on the AI’s recommendation.
When the losses pile up and the lawsuits start, who’s responsible? The engineer who wrote the code? The QA lead who trusted the AI? The company that mandated AI adoption to ship faster?
Nobody has a good answer. But courts are starting to provide one — and it’s not the answer AI vendors want to hear.
The Accountability Gap
Here’s the problem in plain terms: traditional software liability is built on human decisions. An engineer writes code. A tester reviews it. A manager approves deployment. When something breaks, you can trace the chain of decisions back to people who made choices.
But what happens when AI is making those choices?
In July 2024, the Mobley v. Workday ruling reshaped AI liability. Judge Rita Lin held that Workday's automated screening tools could make the company an "agent" of the employers using them, allowing discrimination claims to proceed directly against the AI vendor. It was the first time a federal court applied agency theory to an AI vendor.
In May 2025, the court preliminarily certified a nationwide collective action. The legal reasoning: when AI systems perform functions traditionally handled by employees — like screening job applicants or, theoretically, approving code for deployment — the vendor isn’t just providing software. It’s acting as an agent making decisions.
If this logic extends to testing tools, AI vendors might be liable when their tools approve broken code. But their contracts say otherwise.
When AI Misses the Bug That Matters
The liability question gets messier when AI doesn’t approve bad code — it just misses the bug entirely.
An e-commerce platform used AI to generate tests for a new checkout flow. The AI created 600 tests. All passed. They shipped.
The bug? A tax calculation error that only appeared for orders over $10,000 shipped to five specific states. The AI never tested that scenario because it wasn’t common in the training data.
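For context on what "the AI never tested that scenario" means in practice: a human writing boundary tests would enumerate the $10,000 threshold explicitly, state by state. This is a hypothetical sketch of that missing coverage — `calc_tax`, the state list, and the rates are invented stand-ins, not the platform's real code.

```python
# Hypothetical sketch of the coverage the AI never generated.
# calc_tax and RATES are invented stand-ins for the real tax logic.

RATES = {"CA": 0.0725, "NY": 0.04, "TX": 0.0625, "WA": 0.065, "IL": 0.0625}

def calc_tax(amount, state):
    # Toy implementation: flat per-state rates.
    return round(amount * RATES[state], 2)

def test_boundary_above_10k_per_state():
    # The scenario the AI skipped: orders just over $10,000,
    # checked explicitly for every state, not just the common ones.
    assert calc_tax(10_000.01, "CA") == 725.0
    assert calc_tax(10_000.01, "NY") == 400.0
    assert calc_tax(10_000.01, "TX") == 625.0
    assert calc_tax(10_000.01, "WA") == 650.0
    assert calc_tax(10_000.01, "IL") == 625.0

test_boundary_above_10k_per_state()
```

Six hundred generated tests sound thorough until you notice none of them sit on the boundary that actually matters.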
Cost to the company: $400,000 in incorrect tax filings before they caught it. Plus penalties. Plus an IRS audit.
Their legal department wanted to sue the AI vendor. The vendor’s terms of service said: “Our tool assists with testing. Final responsibility for code quality remains with the user.”
Translation: you’re on your own.
The QA lead got fired. The team was told to “be more careful” with AI tools. Nobody learned anything except that getting blamed sucks.
The Insurance Problem
Some companies are trying to solve this with insurance. Cyber liability policies, errors and omissions coverage, product liability insurance.
But insurance companies are scrambling to figure out how to price risk when AI is involved.
A broker I spoke with said: “How do I underwrite a policy when the policyholder says ‘our AI does 70% of our code review’? What’s the failure rate? What’s the blast radius when it’s wrong? Nobody has actuarial tables for this yet.”
Early policies are either prohibitively expensive or have massive carve-outs that exclude AI-related failures. One company’s policy explicitly stated: “Claims arising from automated testing or AI-assisted quality assurance are not covered.”
Great. So you’re incentivized not to use AI, or to lie about using it.
The Vendor Shield
AI tool vendors have gotten smart about liability. Their contracts are airtight.
Standard language includes:
“Tool is provided as-is with no warranty”
“User is solely responsible for validation of outputs”
“Vendor is not liable for consequential damages”
“Maximum liability is limited to fees paid in the last 12 months”
One contract I reviewed had a liability cap of $50,000. For a tool being used to test software that processes millions of dollars in transactions daily.
When I asked the vendor about it, their response was: “We provide a tool. How customers use it is up to them. We can’t be responsible for their deployment decisions.”
Legally, they’re probably right. Practically, it means companies are taking on massive risk with zero recourse when things go wrong.
When There’s No Vendor to Blame
But what about teams using ChatGPT, Claude, or Copilot directly? No specialized testing vendor. Just engineers asking LLMs to write tests or review code.
This is messier. Because now there’s no contract to point to. No vendor to deflect blame toward. Just an engineer who asked an AI a question and trusted the answer.
A backend engineer at a logistics company told me what happened on their team: “Our tech lead asked GPT-4 to generate integration tests for a shipping cost calculator. GPT wrote 30 tests. They all passed. We shipped. The calculator was charging customers 10x the correct rate for international shipping because GPT’s test data only used domestic examples.”
Who’s responsible? OpenAI’s terms of service say their models aren’t suitable for high-stakes decisions without human review. So legally, it’s on the engineer.
But here’s the twist: the company had no policy about using LLMs for testing. No guidelines. No review process. Management knew engineers were using AI tools — they just didn’t formalize anything.
When the bug hit, they fired the engineer for “poor judgment.” The engineer’s argument: “Everyone on the team uses ChatGPT for tests. Why am I the only one getting punished?”
The answer: because your bug made it to production.
This is the in-house AI problem. When you’re using general-purpose AI tools instead of specialized testing vendors:
There’s no one to sue — OpenAI’s liability cap is basically zero for free/standard API usage. You used their tool. Your problem now.
There’s no audit trail — Unlike dedicated QA tools, your ChatGPT conversation isn’t logged in your company’s systems. Can’t prove what the AI suggested or what you changed.
There’s no shared responsibility — With a vendor tool, multiple people configure it, approve it, oversee it. With an LLM, it’s often one person asking questions in a chat window.
There’s no policy framework — Most companies have procurement processes for tools. But if an engineer is just using ChatGPT Plus on their personal account? That’s not IT-managed. It’s shadow AI.
The liability lands entirely on the individual. And companies are fine with this because they’re getting AI productivity gains without the vendor contracts or compliance overhead.
The Shadow AI Liability Trap
Here’s what’s happening in 2025: engineers and QA folks are using AI everywhere. Asking Claude to review pull requests. Using ChatGPT to generate test cases. Having Copilot autocomplete entire test suites.
Management tacitly encourages this. “Work smarter,” they say. “Use the tools available.”
But when something breaks, suddenly it’s: “You should have known better than to trust AI without verification.”
One QA lead described it as “plausible deniability management” — leadership benefits from AI productivity but maintains they never officially authorized its use, so they’re not liable when it fails.
A contractor told me: “I used Claude to help write tests for a critical payment flow. Saved me two days of work. One test had a logical error that let through invalid transactions. Cost the client $200k. They terminated my contract and refused to pay my last invoice. Their position: I used unauthorized AI tools. My position: everyone on the team does this, and you knew it. Didn’t matter. No contract, no recourse.”
The wild part? If he’d used an official AI testing vendor that his company had purchased, there would have been shared accountability. Process. Documentation.
But because he used a general LLM directly? He was the single point of failure.
Real Cases, Real Consequences
Case 1: The Trading Platform
An AI approved a change to a high-frequency trading algorithm. The change had a race condition that caused the system to execute duplicate trades. Loss: $8 million in 15 minutes.
Outcome: The engineering manager was fired. The company tried to sue the AI vendor and lost. The contract was clear. No recovery.
Case 2: The Medical Device
An AI-generated test suite missed a boundary condition in a diabetes monitoring app. The app gave incorrect insulin dosing recommendations.
Outcome: Class action lawsuit. FDA investigation. The company settled for $12 million. The individual developers were named in the suit. Their E&O insurance didn’t cover AI-assisted development.
Case 3: The SaaS Platform
An AI code review tool approved a change that inadvertently disabled audit logging for 72 hours. During that window, a security breach occurred. No logs to investigate.
Outcome: Customer sued for breach of contract. Company tried to claim force majeure (the AI failure was unforeseeable). Judge didn’t buy it. Company paid $3 million in damages and lost the customer.
The Developer’s Dilemma
If you’re an engineer or QA lead using AI tools, you’re in an impossible position.
Use AI: ship faster, hit deadlines, make management happy. But if AI screws up, you’re the one who gets blamed.
Don’t use AI: fall behind competitors, miss velocity targets, look like a dinosaur. And you still get blamed when manual testing misses bugs.
One senior SDET told me: “I used to sign off on releases with confidence. Now I sign off knowing that half my test coverage came from an AI I don’t fully trust, and if something breaks, I’m the scapegoat. It’s exhausting.”
Another engineer described it as “liability laundering” — management pushes AI adoption to hit metrics, but when it fails, they hold individuals accountable as if they’d made every decision manually.
What the Law Actually Says (Spoiler: Not Much)
There’s almost no legal precedent for AI liability in software testing. A few relevant principles:
Negligence: If you use a tool recklessly or ignore obvious red flags, you’re liable. But what counts as “reckless” when the industry standard is to use AI?
Product Liability: If the AI tool itself is defective, the vendor might be liable. But good luck proving the AI was “defective” vs. just “wrong this time.”
Contract Law: Your employment contract probably says you’re responsible for code quality. AI being involved doesn’t change that.
Professional Standards: There are no professional standards yet for AI-assisted development. So courts fall back on “what would a reasonable engineer do?” — which is circular when everyone’s figuring this out in real time.
One lawyer specializing in tech litigation told me: “We’re going to see a wave of cases in the next 2–3 years that establish precedent. Right now it’s the Wild West. Judges don’t understand the technology. Juries don’t understand the workflow. Everyone’s guessing.”
The Paper Trail Problem
Here’s where it gets really uncomfortable: when you use AI for testing or code review, the decision trail evaporates.
Traditional process leaves artifacts:
“Jane reviewed this PR and approved it”
“The test plan covered scenarios A, B, and C”
“We decided not to test scenario D because of X, Y, Z”
AI process looks like:
“The AI generated tests”
“The tests passed”
“We shipped”
When the lawsuit comes, there’s no documentation of human judgment. No record of risk assessment. No trail of decisions that shows you were acting reasonably.
One company’s lawyers advised them to start keeping logs of every AI-generated test, including which ones they accepted, rejected, or modified — and why. The overhead nearly eliminated the productivity gains from using AI in the first place.
Who Should Actually Pay?
Let’s think through the scenarios:
Scenario 1: AI misses a bug due to training data gaps
The vendor built a tool based on common patterns. Your bug was uncommon. Who’s at fault? Arguably you, for not validating the AI’s work thoroughly enough.
Scenario 2: AI generates a test that passes but validates the wrong thing
The AI wrote a test that checks if a function runs, not if it returns correct results. Your reviewer missed it. Who’s at fault? Probably your reviewer, for trusting AI output blindly.
Scenario 3: AI approves code with a known vulnerability pattern
The vulnerability is documented. The AI should have caught it. Who’s at fault? Maybe the vendor — but their contract says otherwise.
Scenario 4: AI is right, but your production environment is different from your test environment
The code works in testing. It fails in production due to infrastructure differences the AI couldn’t know about. Who’s at fault? You, for environmental inconsistency.
See the pattern? Almost every scenario traces back to the user, not the vendor.
What Companies Are Actually Doing
The smart ones are adapting:
1. Dual Sign-Off: AI approves, then a human senior engineer must also approve. Both are on the hook.
2. AI Audit Logs: Every AI decision gets logged with the reasoning. When bugs slip through, they can trace what the AI saw and why it missed the issue.
3. Hybrid Liability Insurance: Some insurers are offering experimental policies that cover “AI-augmented development” but require specific safety practices.
4. Kill Switches: AI can suggest and generate, but only humans can promote to production. Makes the process slower but clearer for liability.
5. Vendor Negotiations: Larger companies are demanding better liability terms. Some vendors are offering indemnification for enterprise customers — at 3–5x the price.
The small companies and indie developers? They’re just hoping nothing breaks.
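A minimal version of the audit log in point 2 doesn't require vendor tooling — an append-only record of each AI suggestion and the human response is enough to reconstruct a paper trail later. This is a sketch under assumptions: the field names and the example values are invented, not any vendor's schema.

```python
import datetime
import json

def log_ai_decision(path, *, tool, suggestion, action, reviewer, reason):
    """Append one AI decision to a JSON Lines audit log.
    'action' records what the human did: accepted | rejected | modified."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tool": tool,              # which AI produced the output
        "suggestion": suggestion,  # what it proposed, summarized
        "action": action,          # the human decision
        "reviewer": reviewer,      # the person on the hook
        "reason": reason,          # why — the part lawyers care about
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical usage: recording that an engineer modified AI output.
log_ai_decision(
    "ai_audit.jsonl",
    tool="gpt-4",
    suggestion="30 integration tests for shipping cost calculator",
    action="modified",
    reviewer="jane@example.com",
    reason="added international shipping cases missing from AI output",
)
```

The point isn't the code; it's that each entry captures human judgment at the moment it was exercised, which is exactly what's missing when the lawsuit arrives.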
The Individual Developer’s Defense
If you’re using AI tools and worried about personal liability:
Document everything. Keep notes on which AI suggestions you accepted, rejected, or modified — and why.
Challenge the AI. If an AI says code is safe but you’re not sure, escalate. Have a human confirm. Get it in writing.
Know your contracts. Does your employment agreement make you personally liable for code defects? Do you have E&O insurance? Should you?
Build your case preemptively. If you’re following industry best practices, documenting decisions, and acting reasonably, you have a defense. But only if you can prove it.
Get it in writing from management. If your boss pushes you to ship AI-approved code without human review, email them expressing concerns. When the lawsuit comes, that email matters.
Treat every AI interaction like it’ll be read aloud in court someday. It’s paranoid. It’s also probably smart.
The Uncomfortable Truth
Here’s what nobody wants to say out loud: we’re using AI to diffuse responsibility while concentrating liability.
When a team of humans reviews code, responsibility is shared. When AI does most of the work and one human signs off, that one human is holding all the risk.
Companies love this because it lets them ship faster while keeping their legal exposure contained to individual employees who are easy to fire.
Vendors love this because they can sell tools without liability.
The only people who don’t love it? The engineers and QA folks who are realizing they’re the designated scapegoats in a system that’s designed to fail safely — for everyone except them.
So when AI approves broken code, who’s responsible?
Legally, probably you.
Fairly? That’s a different question.
One we’re going to spend the next decade fighting about.