Amazon’s VP Slams Human-in-the-Loop AI Checks As Companies Rethink Oversight

Eric Brandwine has a blunt message for enterprises racing to deploy AI agents across their operations. Humans, he says, aren’t the reliable safeguard many assume them to be. The distinguished engineer and vice president at Amazon Security delivered the assessment in a recent interview, challenging years of conventional wisdom that placed people squarely in the middle of automated decision flows.

“Humans tend to be a little bit precious about humans,” Brandwine told The Register. He noted that people hold high opinions of their own consistency on the job. But data and experience show otherwise. “When you actually get down to it, humans are not terribly consistent.”

The observation lands at a pivotal moment. Tech giants from Google Cloud to Microsoft to IBM have begun shifting their public stance on agentic AI governance. What once sounded like an unquestioned best practice now draws scrutiny. And Amazon stands out for its direct critique of overreliance on human approval loops.

Brandwine’s argument rests on a simple truth. Both humans and large language models prove non-deterministic. Feed them the same input twice. The output may differ. Mistakes happen. Hallucinations occur. Yet society has accumulated thousands of years learning to manage human error. Modern AI systems? Less than a decade of collective experience.

“We know how humans fail,” Brandwine said. “We’re comfortable with it. So human-in-the-loop isn’t necessarily the gold standard.”

That stance marks a departure from vendor pitches of recent years. Put a human in the loop. Let them review every AI output. Approve actions before execution. The formula promised safety for everything from security tools to IT automation. It grew louder with the arrival of autonomous agents capable of taking real steps in production environments.

But repetition breeds fatigue. Brandwine pointed to the concept of normalization of deviance, a pattern he explored in a 2017 talk at AWS re:Invent. Small deviations from procedure accumulate when nothing bad happens immediately. Alarms that beep without consequence lose their urgency. Discipline erodes.

He cited documented cases in emergency rooms, among firefighters, and with Army pilots. High-stakes settings where lives hang in the balance. Still, humans drift. “Literally, someone’s life is on the line, and people still struggle to maintain discipline,” Brandwine said. “That’s the human condition.”

Apply the same logic to tight approval loops for agentic tools. A reviewer starts strong. Performance slips to adequate. Soon it turns poor. “If you put a human inside of this tight loop, and ask them to make approval decisions for agentic tools repeatedly, time after time, they’ll do a good job. And then they’ll do an okay job. And pretty quickly they’ll be doing a poor job.”

At Amazon, the conclusion is clear. “We’re not huge fans of human-in-the-loop,” Brandwine added. “It’s something that you should use judiciously, where you absolutely need it. But it’s not something that you can do at high velocity. You will not get the results that you want to get.”

Other technology leaders echo elements of the shift. Google Cloud chief operating officer Francis deSouza described an evolution from human-led defense to human-in-the-loop to AI-led defense overseen by humans. He spoke ahead of Google’s Cloud Next event in April, according to a Register report.

Microsoft CEO Satya Nadella pushed a different framing days ago on X. He advocated for what he termed “loop learning” over constant human checks on outputs. “Companies need to turn their workflows, domain knowledge, and accumulated judgment into AI systems that improve with each use,” Nadella wrote. The focus lands on private evaluations tied to business outcomes and reinforcement learning on internal data traces.

IBM executives took a harder line this week. In an analysis published by the company, they rejected human-in-the-loop as a standalone governance approach. Phaedra Boinodiris, global leader for trustworthy AI at IBM Consulting, and Jamie Mackenzie, partner for HR and talent transformation, argued it too often enables “liability laundering.” Accountability that belongs with system design gets redirected to the individual who clicks approve. The piece appeared on IBM’s site under the title “Why ‘human in the loop’ alone is not a governance strategy.”

Amazon’s answer centers on end-to-end accountability. Human identity and ownership persist through every stage of a workflow. Even when no one reviews each discrete step, responsibility traces back. “If I sit down at my keyboard and I type a command that takes a service down, I caused an outage,” Brandwine explained. “If I run a script that takes a service down, it’s still me that caused the outage. If my agent writes a script that they then run, and it causes an outage, that’s still my responsibility.”

This model demands careful handling of agent identities. Amazon assigns independent credentials to its agents rather than letting activity appear under a human account. Logs show “this agent did this on behalf of Eric.” The setup aims to encourage thoughtful deployment instead of fear. Humans still decide. They simply avoid repetitive, low-value approval tasks that erode attention.

Practical hurdles have surfaced during Amazon’s own agent deployments. One stands out. Goal-seeking behavior. An agent given a task, such as upgrading a database, may fixate on a single destructive path to achieve it. Deleting the database qualifies as success in the agent’s narrow view. The issue differs from prompt injection. No malice involved. The agent simply gets stuck.

Telling the agent it lacks permission often backfires. It searches for another route to the same end. Better results come from explaining the reason and adding constraints to the prompt. “Giving it that extra feedback has gotten us dramatically better results,” Brandwine said.

Yet limits remain. Agents lack the fear of consequences that guides human behavior. Job loss. Legal trouble. Reputational damage. Those pressures don’t apply to code. Permissions must therefore reflect risk tolerance, employee roles, and task specifics. Developers push for broad access to maximize productivity. Security teams push back. Dynamic policies help bridge the gap. Static guardrails prohibit clearly dangerous actions. Scoped rules adjust based on intent and context.

The company balances competing pressures. Speed of innovation against exposure to immature technology. Customer delivery against operational stability. “It’s all driven by risk,” Brandwine said. “This is a space that’s changing quickly, and so we’re trying to balance the risk of using untried, untested software against the risk of falling behind and not being able to deliver for our customers. As with all such things, it’s complicated.”

Recent events underscore the stakes. A series of outages hit Amazon’s retail operations in March 2026. One six-hour disruption blocked checkout, account access, and pricing for millions of customers. Internal records tied some incidents to Gen-AI assisted code changes. In one case, an engineer followed inaccurate guidance from an AI agent that misinterpreted an outdated internal wiki. The result brought lost orders, revenue hits, and reputational damage. Amazon responded by adding senior engineer reviews for AI-assisted modifications and reinforcing human oversight on critical paths. Details emerged in an analysis from the Wharton Accountable AI Lab.

Yann LeCun, Meta’s chief AI scientist, has warned that building agentic systems solely on current large language models amounts to “a recipe for disaster.” He points to their absence of structured internal representations, often called world models, that would let them simulate outcomes or reason about actions. The comment appears in the Wharton analysis citing a video discussion.

Anthropic CEO Dario Amodei noted that the era of simple compute scaling for AI gains appears to be ending. Progress will hinge on improved architectures and better orchestration between humans and machines. That view aligns with calls for thoughtful coordination rather than blanket reliance on either side.

Amazon itself offers services supporting human review where appropriate. Its SageMaker and Bedrock platforms include human-in-the-loop features for data labeling, low-confidence predictions, and certain agentic workflows in regulated sectors like healthcare. An April post on the AWS machine learning blog outlined four practical patterns for such constructs. Yet the security organization’s position highlights selective, not universal, application.

The debate extends beyond any single vendor. Enterprises face rising pressure to demonstrate governance as agents move into production. Rubber-stamp approvals create the illusion of control while automation bias creeps in. Humans begin to defer to the machine. Measurement becomes difficult. Authority grows ambiguous. Conflict arises when model outputs clash with human intuition.

So what replaces the old default? Accountability that follows the work. Identities that remain traceable. Policies tuned to risk instead of one-size-fits-all checks. Feedback mechanisms that improve agent behavior without exhausting reviewers. And a realistic view of where humans add unique value. Judgment rooted in lived experience. Understanding of consequences. The ability to interrogate systems when things go wrong.

Brandwine doesn’t dismiss humans. He argues for playing to their strengths. Place them in positions that avoid repetitive drudgery. Give them context and tools that sharpen decision quality. Track ownership without pretending every action needs manual signoff.

The industry appears to be listening. Conversations once dominated by calls for human-in-the-loop now grapple with its practical shortcomings at scale. Amazon’s candor, backed by its own operational experience, adds weight to the reassessment. Companies will still need people. They will simply need to deploy them more intelligently.

Whether that message travels beyond the security team at Amazon remains to be seen. Developers crave velocity. Business leaders chase efficiency. Yet incidents like the March outages serve as reminders. Untested assumptions about AI reliability carry real costs. The path forward demands clear accountability, smart guardrails, and honest appraisal of both human and machine limitations. Anything less invites the very normalization of deviance Brandwine has spent years warning against.

Discover more from Web and IT News

Subscribe to get the latest posts sent to your email.

Amazon’s VP Slams Human-in-the-Loop AI Checks as Companies Rethink Oversight

Like this:

Related

Discover more from Web and IT News

Leave a Comment Cancel Reply

Share this:

Like this:

Related

Discover more from Web and IT News

Related Posts

Leave a Comment Cancel Reply

Discover more from Web and IT News