A guardrail and a governance control look like the same thing in a slide deck. Both promise to stop an AI agent from doing harm. But they answer two different questions, and a regulated team that buys one believing it has the other is exposed in a way no demo will reveal.

A guardrail asks: is this content dangerous? It inspects what the model reads and writes (prompts, completions, tool inputs) and blocks the toxic, the leaked, the jailbroken, the off-policy. A governance control asks a question the content never answers: is this actor authorized? Not "does this output look safe," but "is this agent, in this role, permitted to take this action at all, and who said so?"

Those are not two flavors of the same product. They sit on different axes. And the gap between them is exactly where a regulated agent gets you in trouble.

The clean output that should never have been sent

Picture an agent in a GMP quality operation. It assesses a manufacturing deviation overnight and proposes a disposition for the affected batch. A guardrail layer (Galileo, Cisco's agent controls, or any of the strong tools in that category) watches its inputs and outputs. No prompt injection. No leaked batch records. No abusive language. Every check passes.

The agent then releases the batch to market.

The instruction was clean. Well-formed. Perfectly polite. It contained nothing a content filter is built to catch, because the problem was never the content. The problem was authority. This agent was scoped to assess and quarantine. It was never authorized to release a batch. No guardrail asks that question, because a guardrail reads the message, not the mandate.

This is the failure mode that matters: not the agent that says something offensive, but the agent that does something correct-looking and entirely outside its remit. The output passes every test and is still a control breach.

Two axes, not two tiers

It helps to stop thinking of these as "basic" versus "advanced" safety. They guard different things.

	Guardrails	Governance
Question	Is this content dangerous?	Is this actor authorized?
Watches	Prompts, outputs, tool inputs	Identity, role, action, approval
Catches	Toxicity, leaks, injection, off-policy text	Self-approval, out-of-scope actions, missing sign-off
Evidence it leaves	Flagged content	A signed, replayable record of who was allowed to do what
Regulator's question	"Did it say something harmful?"	"Prove what it was allowed to do."

A guardrail can be flawless and a governance failure can still occur, because the agent was never asking a dangerous question; it was taking an unauthorized action. The reverse is also true: a perfectly authorized agent can still be fed a poisoned prompt that a guardrail catches. Each covers the other's blind spot. They are complementary, and a serious deployment runs both.

Why "is the actor authorized?" is the harder problem

Content is in front of you. You can read it, score it, filter it. Authority is not in the content at all. It is a fact about the world that has to be established, enforced, and recorded outside the model.

To answer "is this actor authorized," you need machinery a content filter does not have:

Identity. The agent has to be a named principal holding exactly one role, so the action can be attributed to someone, not to an anonymous process.
A grant. The role has to have been explicitly given the capability to take this action, and everything not granted has to be denied by default, not permitted by silence. We make the case for that posture in deny-by-default permissions.
Segregation of duties. The agent that prepared a piece of work cannot be the one that approves it. This is not a guideline; it is enforced inside the run, so self-approval is structurally impossible.
An approval gate. For one-way doors (releasing a batch, filing a report, pushing to production), the action parks and waits for a named human signature, and the requester is barred from signing their own request.
A record. Every action, grant, and refusal lands in a tamper-evident ledger that a third party can verify, so "it was authorized" is a provable claim and not a recollection.
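
As a rough illustration, the first four items above (identity, grant, segregation of duties, approval gate) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any particular product: the role names, the action names, the `GRANTS` and `NEEDS_SIGNATURE` tables, and the `authorize` function are all hypothetical, and the signer's own authority to approve is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


class Denied(Exception):
    """Raised when the control layer refuses an action."""


# Hypothetical grants: role -> actions it may take. Anything absent is denied.
GRANTS = {
    "deviation_assessor": {"assess_deviation", "quarantine_batch"},
    "release_requester": {"release_batch"},
}

# Hypothetical one-way doors that park until a named human signs.
NEEDS_SIGNATURE = {"release_batch"}


@dataclass(frozen=True)
class Principal:
    name: str
    role: str  # exactly one role per principal
    is_human: bool


def authorize(actor: Principal, action: str,
              signer: Optional[Principal] = None) -> str:
    """Return "allowed" or "parked"; raise Denied otherwise."""
    # Deny by default: an unknown role or an ungranted action is refused.
    if action not in GRANTS.get(actor.role, set()):
        raise Denied(f"{actor.name} ({actor.role}) has no grant for {action}")

    if action in NEEDS_SIGNATURE:
        if signer is None:
            return "parked"  # wait for a signature instead of acting
        if not signer.is_human:
            raise Denied("approval requires a named human signer")
        # Segregation of duties: the requester cannot sign its own request.
        if signer.name == actor.name:
            raise Denied(f"{actor.name} cannot approve its own request")

    return "allowed"
```

With these assumed tables, a principal in the `deviation_assessor` role is allowed to call `authorize(agent, "quarantine_batch")`, while `authorize(agent, "release_batch")` raises `Denied` regardless of how clean the surrounding text is. The decision depends only on who is acting and what was granted, never on the content of the message.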

None of that lives in the text stream. A guardrail reads the message; governance governs the actor. This is the work a control plane does, and the reason it has to sit beside the agent rather than inside its prompt: the boundary has to hold even when the model is swapped, re-prompted, or jailbroken.
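
The tamper-evident record described above is commonly built as a hash chain, where each entry commits to the one before it. The following is a minimal sketch of that general technique using Python's standard `hashlib` and `json` modules; the entry layout and the `append` and `verify` helpers are assumptions made for illustration. A chain like this only detects edits if the latest hash is also signed or anchored somewhere the writer cannot alter, which is omitted here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev_hash, "event": event,
             "hash": _digest(prev_hash, event)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because `verify` needs only the ledger itself, a third party can run it offline. Changing a recorded refusal into an approval after the fact changes that entry's digest and makes `verify` return `False`.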

What the rules actually demand

The distinction is not academic. The control standards regulated industries already enforce are governance standards, not content standards.

EU GMP Annex 16 names the Qualified Person as the sole party who may certify a batch for release. That is a statement about authority: the person who prepares the assessment cannot be the person who certifies. In the same vein, 21 CFR §211.22 makes the quality unit's segregation of duties a structural requirement. Determining the seriousness of an adverse-event case under 21 CFR §314.80 is a mandated human decision, and 21 CFR §11.50 requires that an electronic signature carry the recorded meaning of the act it authorizes. None of these ask whether a message was toxic. All of them ask who was allowed to act, and whether a person signed.

A guardrail cannot satisfy any of them, because they are not questions about content. They are questions about who held the authority and whether the record proves it.

How they compose

The right mental model is layers, not rivals. Run a guardrail to keep dangerous content out of the model's inputs and outputs. Run governance to keep unauthorized actions out of the world. A poisoned prompt is the guardrail's job. A self-approved batch release is governance's job. Neither covers the other, and shipping with only one is shipping with a known gap.
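
The layering can be sketched as two independent checks run in sequence. This is a toy illustration only: `content_is_safe` stands in for a real guardrail (which would be a classifier or policy engine, not a substring match), and the phrase list, the `GRANTS` table, and the function names are all hypothetical.

```python
# Hypothetical stand-ins for the two layers.
BLOCKED_PHRASES = ("ignore previous instructions",)
GRANTS = {"deviation_assessor": {"assess_deviation", "quarantine_batch"}}


def content_is_safe(text: str) -> bool:
    """Guardrail layer: looks only at the content."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def actor_is_authorized(role: str, action: str) -> bool:
    """Governance layer: looks only at who is acting and what was granted."""
    return action in GRANTS.get(role, set())


def handle(role: str, action: str, text: str) -> str:
    # Each layer can refuse on its own; neither substitutes for the other.
    if not content_is_safe(text):
        return "blocked_by_guardrail"
    if not actor_is_authorized(role, action):
        return "blocked_by_governance"
    return "executed"
```

In this sketch a polite, well-formed request to release a batch from the `deviation_assessor` role passes `content_is_safe` and is still stopped by `actor_is_authorized`, while a granted action wrapped in an injected prompt is stopped by the first check.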

When teams move agents out of pilot and into production, this is the layer that is usually missing: the pilot had guardrails, and the production deployment needed authority controls it never built. We walk through that transition in from pilot to production.

So when a vendor tells you their tool makes your agents "safe," ask which question it answers. If it reads prompts and outputs, it is a guardrail, a useful one, and you should keep it. If it can prove that an agent was barred from approving its own work, and produce a record an examiner can verify offline, it is governance. You need both, and they are not the same purchase.

See how it works, or book a demo to watch an agent pass every content check and still get blocked from approving its own work, live.
