AI Operations (AIOPS)

Diagnosing why AI initiatives stall —
and how leaders can move decisions forward


This is an ongoing series examining how organisations move AI from pilot to production. Each section focuses on a specific decision gate where work gets stuck, the failure patterns behind it, and the artefacts that unblock progress.


Diagnostic Series

A weekly series diagnosing why AI initiatives stall — and how leaders unblock decisions by clearly bounding risk.


The Washington Post’s AI Podcast Moment,
and the avoided decision that sinks most AI projects

The Washington Post launched a new feature in its mobile app called Your Personal Podcast. It was pitched as a personalised audio briefing: two AI hosts, your preferred topics, your reading history, and a tidy little “catch me up” experience.

Then reality hit. Journalists inside the Post (and people watching from the outside) started flagging issues that aren’t “minor errors” in a newsroom context: misattributed quotes, invented quotes, and commentary creeping into what should be straightforward reporting. Reporting suggested the Post’s own internal testing had found a large share of scripts failing its standards, but it launched anyway, aiming to ‘iterate through the remaining issues’.

This is not a story of AI models getting it wrong. It’s a story of a use-case being mis-operationalised.

This is not an AI problem
I am not suggesting that AI models are foolproof, or that model owners bear no responsibility. Rather, if you decide to use one, there are things you can do to ‘bound the risk’.
An AI pilot can seem like magic. The production environment is way less forgiving. This is the essence of the pilot → production trap.

In the case of the Post, the question isn’t “why did the AI make mistakes?” The question is: why did a major institution put an AI voice in front of the public, wearing the Post’s credibility, when the output behaved like an intern with confidence and no supervision?

The decision gate
A newsroom is basically a trust factory. So if the trust standard isn’t explicit, you don’t have a product problem; you have a credibility incident waiting to happen. In this case, it looks like the trust standard wasn’t just unclear; it was bypassed.
Human Validation & Trust is one of the many “decision gates” in operationalising AI (more on decision gates another time). In plain terms, this gate is where teams get stuck (or should get stuck) on one hard question: “What do we trust this thing to do?”

The failure pattern
Here’s the failure pattern I see in the facts reported:

  1. “Beta” was treated as a safety blanket: calling something a beta can be fine in product development. But in journalism, “beta” doesn’t reduce the reputational radius. The output still sounds like the Post.

  2. The system was allowed to make the wrong type of mistake: Invented quotes and misattributions are not “quality issues.” They are truth violations.

  3. It crossed the “institutional voice” boundary without an operating contract: Once an AI host is narrating your reporting, it’s no longer a back-office productivity tool. It’s performing the organisation in public.

The Key Lesson: Why AI Fails at the Decision Gate
A pilot is permission to learn. Production is permission to operate.
My work focuses on one recurring problem I see across organisations: AI initiatives rarely fail because the models are weak (tools are what they are at any point in history); they fail because the deterministic decisions required for production are delayed, blurred, or avoided altogether. Over time, I’ve learned that the fastest way to unblock these initiatives is not better prompts or more pilots, but surfacing the specific decision that is stuck, diagnosing why it’s blocked, and converting judgement into a signable operating artefact.

The Washington Post story is a public version of what happens inside enterprises every day:

  • Who owns the output?

  • What happens when it’s wrong?

  • Who has stop-the-line authority?

The best-practice move of the 5%
McKinsey’s 2025 State of AI work makes a useful point about high performers in AI deployment: they are more likely to have defined processes for how and when model outputs need human validation.
Human-in-the-loop can’t mean “a Washington Post editor listens to every personalised episode.” This product is explicitly designed as a self-serve, “audience of one” experience: users pick topics, hosts, and length, and the system stitches stories together based on reading and listening history.

So what do the decisions of the 5% look like when “review everything” is impossible? Three very ordinary, very human decisions bound the risk for them:

  • What is this AI allowed to do, and what is it not allowed to do? (The “job boundary.”)

  • What level of exposure are we comfortable with while we learn? (The “blast-radius boundary.”)

  • What is our validation standard when things go wrong? (The “supervision boundary.”)

The diagnostic artefacts
This is where pilots quietly deceive teams. A pilot can feel “successful” because the room is friendly, the scope is narrow, and everyone treats mistakes as learning.
Production is different. Production needs artefacts, because artefacts are how you bound risk and make the supervision decision legible.

If I diagnose this through the Human Validation & Trust gate, the “missing artefacts” are the ones that force the organisation to answer, in writing: “What is our trust standard for this class of output?” The key artefact is the Human Validation Standard, which asks (see the sketch after this list):

  • What can the AI do without review?

  • What is the “acceptable accuracy” for each content type?

  • What disables publishing immediately?
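
To make this concrete, here is one way a Human Validation Standard could be made legible rather than left as a slide: each class of output gets an explicit rule for review, accuracy, and the errors that stop publishing outright. This is a minimal sketch, not the Post’s standard; the content types, thresholds, and blocking conditions below are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ValidationRule:
    """Trust standard for one class of AI output (all values illustrative)."""
    content_type: str      # e.g. "topic summary", "direct quote"
    review_required: bool  # can this class ship without a human look?
    min_accuracy: float    # accuracy below which the class is paused
    blocking_errors: list[str] = field(default_factory=list)  # errors that disable publishing immediately

# A hypothetical standard for a personalised news-audio product
STANDARD = [
    ValidationRule("topic summary", review_required=False, min_accuracy=0.98,
                   blocking_errors=["invented quote", "misattribution"]),
    ValidationRule("direct quote", review_required=True, min_accuracy=1.0,
                   blocking_errors=["any deviation from the source text"]),
    ValidationRule("editorial framing", review_required=True, min_accuracy=1.0,
                   blocking_errors=["opinion presented as reporting"]),
]

def can_publish(content_type: str, measured_accuracy: float, errors: list[str]) -> bool:
    """Apply the written standard instead of ad-hoc judgement."""
    rule = next(r for r in STANDARD if r.content_type == content_type)
    if any(e in rule.blocking_errors for e in errors):
        return False  # a truth violation disables publishing outright
    if rule.review_required:
        return False  # route to a human; this class never auto-publishes
    return measured_accuracy >= rule.min_accuracy
```

The point is not the code; it is that every threshold and blocking condition has to be decided by a person, in writing, before anything ships.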

These artefacts don’t “solve” the problem. They make the real decision unavoidable before customers and journalists force it on you.

Operationalisation – Move work forward
The 5% treat trust like an operating contract, something that can be signed. The diagnosis is turned into a contract: The Brief.
The Brief is how you convert a probabilistic pilot into a deterministic business decision. It isn’t a bigger model, or a better prompt. The Brief locks in the decisions you’re otherwise avoiding (sketched after this list):

  • What the AI is for (and what it is not for).

  • What autonomy class it’s in (assist vs speak/represent).

  • What the validation standard is when you cannot review every output.

  • Who owns the standard day-to-day.

  • What evidence you will track (so trust doesn’t become opinion).

  • What triggers a rollback or pause.
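
For illustration only, here is one way The Brief could be represented so that “signable” is literal: each decision above becomes a named field, and nothing operates until every field is filled in and someone has signed. The field names and the readiness check are assumptions, not a prescribed template.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Brief:
    """A signable operating contract for one AI use-case (fields are illustrative)."""
    purpose: str                     # what the AI is for
    out_of_scope: list[str]          # what it is explicitly not for
    autonomy_class: str              # e.g. "assist" vs "speak/represent"
    validation_standard: str         # reference to the written trust standard
    standard_owner: str              # who owns that standard day-to-day
    evidence_tracked: list[str] = field(default_factory=list)   # metrics that keep trust from becoming opinion
    rollback_triggers: list[str] = field(default_factory=list)  # what pauses or rolls the system back
    signed_by: Optional[str] = None  # unsigned means not ready for production

def ready_for_production(brief: Brief) -> bool:
    """You can operate only if every decision is made and someone has signed it."""
    return all([brief.purpose, brief.autonomy_class, brief.validation_standard,
                brief.standard_owner, brief.evidence_tracked, brief.rollback_triggers,
                brief.signed_by])
```

An unfillable field is useful information in itself: it names the decision that is stuck.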

That is the institutionalisation step: if you can sign that brief, you can operate. If you can’t sign it, you’ve learned something essential: you’re not ready for production, not because AI is “bad,” but because the organisation hasn’t made the deterministic decisions that production demands.

And that’s the core lesson the Washington Post story makes visible.

About the author

Vinesh Prasad works with leaders and boards to unblock stalled AI initiatives by turning judgement into signable approval artefacts. He is an entrepreneur, teaches strategy and consulting at a university business school, and advises organisations navigating the shift from AI pilots to production.
This article is part of an ongoing series on operationalising AI beyond pilots. If this problem sounds familiar, you can fill out the contact form or DM him on LinkedIn.


Get essays, public diagnostics, and invitations to small, focused sessions on operationalising AI straight to your mailbox.

