When an internal auditor or a regulator sits down to examine an AI agent, they do not ask the questions an engineer expects. They do not care how the model was trained, what its accuracy is, or how clever the prompt is. They ask an older, narrower set — the same questions they have asked of human employees for decades. What was this thing allowed to do? Who decided that? Who signed off on each consequential action? And can you prove none of this was edited after the fact?
Most teams cannot answer those questions about their agents — not because the answers don't exist, but because they were never kept as a record. They live in a deployment, a prompt, a Slack thread, a commit history. None of that is evidence. This is a practical guide to the four questions you will be asked, and the artefact you need ready for each.
Question 1: What was this agent permitted to do?
The first question is about authority, not behaviour. Before anyone looks at what the agent did, they want to know what it was allowed to do. Note the tense: the question is about the past. On the third of March, what could this agent touch?
This is where most audits stall. The team can show what the agent can do today — the current tool list, the current config — but the agent in question ran three months and a dozen deployments ago. "Let me check what we shipped that week" is an admission that permissions were never kept as a record.
What you need ready is a grant ledger: a permanent, time-stamped list of every capability the agent ever held, queryable for any past date. Deny-by-default — the agent starts with zero capability and gains each one only because someone explicitly opened that door — is what makes the ledger meaningful, because the list is exhaustive rather than a subset of some larger implicit grant. We unpack the ledger in deny-by-default permissions.
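To make "queryable for any past date" concrete, here is a minimal sketch of a grant ledger as an append-only list of grant and revoke events. The class names, capability names, and dates are hypothetical, chosen only for illustration; a real ledger would sit on durable, write-once storage rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GrantEvent:
    at: datetime
    capability: str  # hypothetical capability name, e.g. "release_batch"
    action: str      # "grant" or "revoke"


class GrantLedger:
    """Append-only event list. Nothing is edited or deleted in place."""

    def __init__(self) -> None:
        self._events: list[GrantEvent] = []

    def record(self, at: datetime, capability: str, action: str) -> None:
        if action not in ("grant", "revoke"):
            raise ValueError(f"unknown action: {action!r}")
        if self._events and at < self._events[-1].at:
            raise ValueError("events must be appended in time order")
        self._events.append(GrantEvent(at, capability, action))

    def permitted_on(self, when: datetime) -> frozenset:
        """Replay events up to `when`. Deny-by-default: start from nothing."""
        held = set()
        for event in self._events:
            if event.at > when:
                break
            if event.action == "grant":
                held.add(event.capability)
            else:
                held.discard(event.capability)
        return frozenset(held)

    def is_permitted(self, capability: str, when: datetime) -> bool:
        return capability in self.permitted_on(when)


utc = timezone.utc
ledger = GrantLedger()
ledger.record(datetime(2025, 1, 10, tzinfo=utc), "read_batch_records", "grant")
ledger.record(datetime(2025, 2, 1, tzinfo=utc), "release_batch", "grant")
ledger.record(datetime(2025, 3, 15, tzinfo=utc), "release_batch", "revoke")

third_of_march = datetime(2025, 3, 3, tzinfo=utc)
print(sorted(ledger.permitted_on(third_of_march)))
# ['read_batch_records', 'release_batch']
```

Because the query replays from an empty set, the answer for any date is the complete list of what the agent held on that date, with no implicit grants outside the ledger.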
Question 2: Who granted that authority?
The second question is about accountability for the boundary itself. It is not enough to show what the agent could do; the auditor wants to know who decided it could. In a regulated function, every capability traces back to a named human.
Here the common failure is that the people who set an agent's permissions are invisible. An engineer added a sixth tool in a pull request; another approved the merge; it shipped. None of that surfaces as "Person X granted this agent the ability to release batches on this date." It is buried in version control, legible only to someone who reads diffs — which an examiner does not.
The artefact you need is attribution baked into the grant. Every entry in the grant ledger names the human who approved it, the way a change-control record in a quality system names whoever authorised the change. Versioning does not just let you reconstruct the past; each version carries the signature of whoever created it. A boundary nobody is named for is a boundary nobody is accountable for.
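One way to picture attribution baked into the grant is a record type that cannot be constructed without a named approver. This is a sketch under assumptions: the field names, the approver names, and the `who_granted` helper are all hypothetical, and a non-empty string stands in for what would really be an authenticated identity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AttributedGrant:
    capability: str
    approved_by: str   # a named human (hypothetical names below)
    approved_on: date
    version: int       # version of the permission set this grant created

    def __post_init__(self) -> None:
        # Refuse to create a grant that nobody is named for.
        if not self.approved_by.strip():
            raise ValueError("every grant must name the human who approved it")


def who_granted(grants, capability: str, on: date) -> Optional[AttributedGrant]:
    """Most recent grant of `capability` approved on or before `on`, if any."""
    candidates = [
        g for g in grants
        if g.capability == capability and g.approved_on <= on
    ]
    return max(candidates, key=lambda g: g.approved_on, default=None)


grants = [
    AttributedGrant("read_batch_records", "M. Okafor", date(2025, 1, 10), 5),
    AttributedGrant("release_batch", "J. Alvarez", date(2025, 2, 1), 6),
]

entry = who_granted(grants, "release_batch", date(2025, 3, 3))
print(f"{entry.approved_by} granted {entry.capability} on {entry.approved_on}")
```

The point of the constructor check is that the examiner's question, "who granted this?", has an answer by construction rather than by archaeology in version control.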
Question 3: Who approved each consequential decision?
The first two questions are about standing authority. The third is about individual actions — the irreversible ones. Releasing a drug batch, filing a suspicious-activity report, pushing a configuration to live medical devices: these are one-way doors, and the auditor wants the name and reasoning behind every one.
Two things commonly go wrong. The first is that there was no human in the loop: the agent acted alone on a decision a regulation reserves for a person. The second is subtler: there was a human, but the record does not prove it. All that survives is a "true" in a database, an automated rubber-stamp, or an approval the agent could plausibly have generated itself. Under examination, an approval you cannot distinguish from a self-approval is worth nothing.
What you need is a record of each gated decision that captures four things: that the run stopped and waited, the name of the human who signed, that the signer was not the requester, and the signer's stated reason in their own words. The last point matters more than it looks. 21 CFR §11.50 requires that an electronic signature carry the meaning of the signing — approval, review, responsibility — not just the fact of a click. A signature without a reason is a timestamp; with one, an attestation.
Behind this sits segregation of duties: the agent that prepared the work must be provably incapable of approving it. Not discouraged — incapable. That is the difference between a control an auditor relies on and a policy taken on trust.
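The four fields and the requester-cannot-sign rule can be sketched as a gate record that refuses to exist unless they hold. Everything named here (`GateRecord`, `sign_gate`, the identities, the batch number) is hypothetical. Note the limit of the sketch: comparing two strings only illustrates the rule. Making the agent provably incapable of approving also requires that signer identities are authenticated and that the agent holds no signing credential.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class SelfApprovalError(Exception):
    """Raised when the requester tries to sign its own request."""


@dataclass(frozen=True)
class GateRecord:
    action: str
    requester: str       # who prepared the work (here, the agent)
    paused_at: datetime  # evidence that the run stopped and waited
    signer: str          # the named human who approved
    reason: str          # the signer's stated reason, in their own words
    signed_at: datetime


def sign_gate(action: str, requester: str, paused_at: datetime,
              signer: str, reason: str,
              signed_at: Optional[datetime] = None) -> GateRecord:
    if signer == requester:
        raise SelfApprovalError("the requester cannot approve its own request")
    if not reason.strip():
        raise ValueError("a signature must carry a stated reason")
    if signed_at is None:
        signed_at = datetime.now(timezone.utc)
    if signed_at < paused_at:
        raise ValueError("a signature cannot predate the pause")
    return GateRecord(action, requester, paused_at, signer, reason, signed_at)


paused = datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc)
record = sign_gate(
    action="release_batch:LOT-001",
    requester="agent:batch-prep",
    paused_at=paused,
    signer="J. Alvarez",
    reason="Reviewed the assay results against spec; approving release.",
    signed_at=datetime(2025, 3, 3, 9, 30, tzinfo=timezone.utc),
)
print(record.signer, "approved", record.action)
```

A call with `signer="agent:batch-prep"` raises `SelfApprovalError` before any record is created, so a self-approval never reaches the trail in the first place.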
Question 4: Can you prove the record is intact?
The fourth question quietly decides whether the first three mattered. Suppose you produce a clean grant ledger, full attribution, and a signed approval for every gated decision. The auditor asks: how do I know none of this was edited once you knew I was coming?
If your answer is "our logs are secure" or "only administrators can write to it," you have failed. An administrator with write access is exactly the threat the question probes. A standard log — even one feeding a SIEM, the log collector most organisations run — records what a system believes happened. It does not prove the record is unchanged. Those are different claims, and the auditor holds you to the second.
The artefact that answers this is a tamper-evident audit trail: each entry cryptographically chained to the one before it, so altering any record breaks the chain visibly, and the whole export signed so a forgery is detectable. Better still, an export a third party can verify offline, on their own machine — because verification you control is not verification an examiner accepts. We go deep on this in tamper-evident audit logs for AI agents.
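The chaining idea fits in a few lines. The sketch below uses SHA-256 from Python's standard library; the entry layout, the helper names, and the sample payloads are assumptions made for illustration. It covers only the hash chain, not the signature over the export.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev,
                  "hash": entry_hash(prev, payload)})


def verify(chain: list) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return i
        prev = entry["hash"]
    return -1


trail = []
append(trail, {"event": "grant", "capability": "release_batch", "by": "J. Alvarez"})
append(trail, {"event": "gate_signed", "signer": "J. Alvarez", "action": "release_batch:LOT-001"})
append(trail, {"event": "run_completed", "run": "run-0042"})
print(verify(trail))   # -1: intact

trail[1]["payload"]["signer"] = "someone else"  # edit a record after the fact
print(verify(trail))   # 1: the altered entry no longer matches its hash
```

On its own, a chain like this only detects edits made by someone who does not recompute every later hash. An administrator with full write access could rebuild the chain from the altered entry onward, which is why the head of the chain, or the whole export, also needs a signature or an external anchor that the administrator cannot rewrite.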
The audit checklist, in one place
The four questions map onto four artefacts. Produce all four for an arbitrary past run and you can answer an examination from the record. Fall short on any one and you are relying on the examiner's goodwill.
| What they ask | What you must produce |
|---|---|
| What was it permitted to do? | A grant ledger, queryable for any past date |
| Who granted that? | Attribution naming the approver on each grant |
| Who approved each decision? | Signed gate records, with reason, requester barred |
| Is the record intact? | A hash-chained, signed, offline-verifiable export |
Notice what is not on this list: model accuracy, prompt quality, benchmark scores. Those decide whether the agent is good; they say nothing about whether it is auditable. An accurate agent with no record is a liability; a modest agent with a complete record is defensible. And order matters: an auditor who finds clean answers to the first three but a broken chain on the fourth treats the first three as unproven. Integrity is the condition on which the rest are admissible.
Why this is harder than it sounds — and why now
The reason most teams cannot answer these questions is that they instrumented their agents for debugging, not for evidence. Traces and dashboards help engineers understand failures. They are not built to satisfy a hostile reader who assumes the record was doctored.
In April 2026, US supervisors scoped agentic AI out of the main model-risk guidance — there is no purpose-built supervisory template for it yet. No template does not mean no scrutiny; it means no safe harbour. The predicate rules that govern what a human in the seat must do — segregation of duties, recorded-meaning signatures, intact audit trails — never moved, and discovery does not wait for new guidance. We argue that in the case for why now.
None of these four artefacts is exotic. They are the same controls regulators have demanded of people for a century, produced as a byproduct of normal operation rather than scrambled together after the subpoena arrives. That is what an agent control plane exists to do: turn every run into a record that answers the four questions before they are asked.
MakerChecker produces all four artefacts automatically. See how it works, or book a demo to watch an agent get blocked from approving its own work — live.