Concepts6 min read

AI agent governance vs guardrails

Guardrails ask if content is dangerous. Governance asks if the actor is authorized. An agent can pass every check and still move money it should never touch.

A guardrail and a governance control look like the same thing in a slide deck. Both promise to stop an AI agent from doing harm. But they answer two different questions, and a regulated team that buys one believing it has the other is exposed in a way no demo will reveal.

A guardrail asks: is this content dangerous? It inspects what the model reads and writes — prompts, completions, tool inputs — and blocks the toxic, the leaked, the jailbroken, the off-policy. A governance control asks a question the content never answers: is this actor authorized? Not "does this output look safe," but "is this agent, in this role, permitted to take this action at all — and who said so."

Those are not two flavors of the same product. They sit on different axes. And the gap between them is exactly where a regulated agent gets you in trouble.

The clean output that should never have been sent

Picture an agent in a bank's treasury operations. It reconciles cash positions overnight and proposes settlement instructions. A guardrail layer — Galileo, Cisco's agent controls, any of the strong tools in that category — watches its inputs and outputs. No prompt injection. No leaked account numbers. No abusive language. Every check passes.

The agent then issues a payment to an external account.

The instruction was clean. Well-formed. Perfectly polite. It contained nothing a content filter is built to catch — because the problem was never the content. The problem was authority. This agent was scoped to reconcile and propose. It was never authorized to release funds. No guardrail asks that question, because a guardrail reads the message, not the mandate.

This is the failure mode that matters: not the agent that says something offensive, but the agent that does something correct-looking and entirely outside its remit. The output passes every test and is still a control breach.

Two axes, not two tiers

It helps to stop thinking of these as "basic" versus "advanced" safety. They guard different things.

Guardrails Governance
Question Is this content dangerous? Is this actor authorized?
Watches Prompts, outputs, tool inputs Identity, role, action, approval
Catches Toxicity, leaks, injection, off-policy text Self-approval, out-of-scope actions, missing sign-off
Evidence it leaves Flagged content A signed, replayable record of who was allowed to do what
Regulator's question "Did it say something harmful?" "Prove what it was allowed to do."

A guardrail can be flawless and a governance failure can still occur, because the agent was never asking a dangerous question — it was taking an unauthorized action. The reverse is also true: a perfectly authorized agent can still be fed a poisoned prompt that a guardrail catches. Each covers the other's blind spot. They are complementary, and a serious deployment runs both.

Why "is the actor authorized?" is the harder problem

Content is in front of you. You can read it, score it, filter it. Authority is not in the content at all — it is a fact about the world that has to be established, enforced, and recorded outside the model.

To answer "is this actor authorized," you need machinery a content filter does not have:

  • Identity. The agent has to be a named principal holding exactly one role, so the action can be attributed to someone, not to an anonymous process.
  • A grant. The role has to have been explicitly given the capability to take this action — and everything not granted has to be denied by default, not permitted by silence. We make the case for that posture in deny-by-default permissions.
  • Segregation of duties. The agent that prepared a piece of work cannot be the one that approves it — not as a guideline, but enforced inside the run so it is structurally impossible.
  • An approval gate. For one-way doors — releasing funds, filing a report, pushing to production — the action parks and waits for a named human signature, and the requester is barred from signing their own request.
  • A record. Every action, grant, and refusal lands in a tamper-evident ledger that a third party can verify, so "it was authorized" is a provable claim and not a recollection.

None of that lives in the text stream. A guardrail reads the message; governance governs the actor. This is the work a control plane does, and the reason it has to sit beside the agent rather than inside its prompt — the boundary has to hold even when the model is swapped, re-prompted, or jailbroken.

What the rules actually demand

The distinction is not academic. The control standards regulated industries already enforce are governance standards, not content standards.

The Wolfsberg Group names maker-checker — the four-eye principle — as the control for sensitive financial actions. That is a statement about authority: the person who initiates cannot be the person who approves. In pharma, 21 CFR §211.22 makes the quality unit's segregation of duties a structural requirement. Filing a Suspicious Activity Report under the Bank Secrecy Act is a mandated human decision, and NYDFS Part 504 requires an institution to certify its transaction-monitoring program annually. None of these ask whether a message was toxic. All of them ask who was allowed to act, and whether a person signed.

A guardrail cannot satisfy any of them, because they are not questions about content. They are questions about who held the authority and whether the record proves it.

How they compose

The right mental model is layers, not rivals. Run a guardrail to keep dangerous content out of the model's inputs and outputs. Run governance to keep unauthorized actions out of the world. A poisoned prompt is the guardrail's job. A self-approved batch release is governance's job. Neither covers the other, and shipping with only one is shipping with a known gap.

When teams move agents out of pilot and into production, this is the layer that is usually missing — the pilot had guardrails, and the production deployment needed authority controls it never built. We walk through that transition in from pilot to production.

So when a vendor tells you their tool makes your agents "safe," ask which question it answers. If it reads prompts and outputs, it is a guardrail — a useful one, and you should keep it. If it can prove that an agent was barred from approving its own work, and produce a record an examiner can verify offline, it is governance. You need both, and they are not the same purchase.


See how it works, or book a demo to watch an agent pass every content check and still get blocked from approving its own work — live.

Where this goes to work

How MakerChecker works — the six primitives

Agents as employees, versioned grants, structural segregation of duties, approval gates, role limits, and a signed audit a regulator verifies offline.

See it for yourself

See an agent get stopped.

One command starts the demo: an agent stopped from signing off its own work, and the signed evidence file an inspector can check for themselves.

Designed against the rules your auditors already enforce.