Every model risk management (MRM) function in a US bank knows the four disciplines by heart: inventory the model, validate it before it goes live, monitor it while it runs, and control what it is allowed to drive. For fifteen years that practice had a template to grade itself against. As of April 2026, for agentic AI, it does not.
SR 26-2 retired SR 11-7 and scoped agentic AI — an autonomous agent that books entries, clears alerts, or assembles filings on a human's behalf — out of model-risk guidance. The MRM team's first reaction is often to file agents under "not our problem." That is the wrong lesson. The checklist went away; the discipline it described did not, and neither did the exposure it managed.
The harder truth: the four MRM disciplines map onto agents almost perfectly. What breaks is the evidence. A model is a static artifact you can pin down, version, and re-run on a frozen dataset. An agent is an actor that decides what to do next. You cannot validate an actor the way you validate a regression — you constrain it and record it. That shift, from validating a thing to proving control over an actor, is the whole story.
Inventory: an agent is a principal, not a row
The first MRM discipline is the model inventory. You cannot govern what you have not catalogued, and an unregistered model is the finding every examiner opens with.
Agents make this harder in a specific way. A model is a discrete object with a name, an owner, and a version. An agent is a moving target: it calls tools, spawns sub-tasks, swaps the model underneath it, and gets re-prompted on a Friday afternoon. A row in a spreadsheet captures none of that. By the time you audit it, the entry describes something that no longer exists.
The fix is to make every agent a named principal — an identity that acts as something, holds exactly one role at a time, and does nothing anonymously. The inventory stops being a list you maintain by hand and becomes a property of the system: if an agent acted, it acted as a registered identity, because the control layer refuses to run an unnamed actor. You cannot have a shadow agent any more than you can badge a shadow employee through the door.
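To make "the control layer refuses to run an unnamed actor" concrete, here is a minimal sketch. The names `Principal`, `PrincipalRegistry`, and `run_action` are hypothetical, invented for illustration; a real control layer would back the registry with durable storage and authenticated identities rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class UnregisteredAgentError(Exception):
    """Raised when an actor with no inventory entry tries to act."""


@dataclass(frozen=True)
class Principal:
    agent_id: str  # stable identity the agent acts as
    owner: str     # named human accountable for the agent
    role: str      # exactly one role at a time


class PrincipalRegistry:
    """Hypothetical in-memory inventory of registered agents."""

    def __init__(self) -> None:
        self._principals: Dict[str, Principal] = {}

    def register(self, principal: Principal) -> None:
        self._principals[principal.agent_id] = principal

    def resolve(self, agent_id: Optional[str]) -> Principal:
        # Anonymous or unknown actors are refused before any work happens.
        if agent_id is None or agent_id not in self._principals:
            raise UnregisteredAgentError(
                f"no registered principal for {agent_id!r}"
            )
        return self._principals[agent_id]


def run_action(registry: PrincipalRegistry,
               agent_id: Optional[str],
               action: str) -> str:
    # Every action is attributed to a resolved principal or does not run.
    principal = registry.resolve(agent_id)
    return f"{principal.agent_id} ({principal.role}) ran {action}"
```

Because `run_action` cannot proceed without a resolved principal, the set of agents that have acted is by construction a subset of the registry, which is the sense in which the inventory becomes a property of the system.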
Validation: you cannot pre-test an actor, so you bound it
Validation is where the analogy strains hardest. SR 11-7 validation meant independent review: check the conceptual soundness, test against held-out data, document the limitations, sign off before production. That works because a model's behaviour is, in principle, reproducible — feed it the same inputs and it returns the same output.
An agent is not reproducible in that sense. Its output depends on the state of the world when it runs, on tool responses, on a context window you did not write. You can red-team it, and you should, but you cannot enumerate the inputs it will face in production, so you cannot certify it the way you certify a credit model. A validation report that says "we tested it and it behaves" promises what the artifact cannot deliver.
So validation changes shape. Instead of proving the agent will always behave, you prove the boundary will always hold: whatever the agent attempts, it can act only within an explicitly granted set of capabilities, and it cannot take the consequential steps alone. That is deny-by-default permissions: the agent starts with zero capability, every door it can open was opened on the record by a named person, and every other door stays shut. You are not validating that the actor is trustworthy. You are validating that the cage is sound, and the cage, unlike the actor, is testable.
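A deny-by-default boundary can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: `CapabilityBoundary`, `Grant`, and the capability names used below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


class CapabilityDenied(Exception):
    """Raised when an agent attempts something it was never granted."""


@dataclass(frozen=True)
class Grant:
    agent_id: str
    capability: str
    granted_by: str  # the named person who opened this door


@dataclass
class CapabilityBoundary:
    # Empty by default: an agent with no grants can do nothing.
    _allowed: Set[Tuple[str, str]] = field(default_factory=set)
    grant_log: List[Grant] = field(default_factory=list)

    def grant(self, agent_id: str, capability: str, granted_by: str) -> None:
        if not granted_by:
            raise ValueError("a grant must name the person who issued it")
        self._allowed.add((agent_id, capability))
        self.grant_log.append(Grant(agent_id, capability, granted_by))

    def allows(self, agent_id: str, capability: str) -> bool:
        return (agent_id, capability) in self._allowed

    def enforce(self, agent_id: str, capability: str) -> None:
        # Anything not explicitly granted is refused.
        if not self.allows(agent_id, capability):
            raise CapabilityDenied(f"{agent_id} has no grant for {capability}")
```

The point for validation is that the boundary is a small, deterministic piece of code. A test can assert that every capability outside the grant list is refused, which is a claim you can actually certify, unlike a claim about the agent's future behaviour.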
Ongoing monitoring: the run is the record
The third discipline is ongoing monitoring. For a traditional model this means performance tracking, drift detection, threshold breaches. For an agent it means something more basic and more urgent: a continuous, per-decision account of what the actor did, against what it was permitted to do.
This is where most agent deployments are silently exposed. The agent runs, produces useful output, and leaves behind logs the team's own administrators can edit — or no usable record at all. Monitoring an agent is hard because the events worth monitoring are authority events — who acted, under which grant, with whose approval — and most agent stacks do not capture authority.
A control layer between the agent's intent and the real world turns every action, model call, and approval into a logged event at the moment it happens. The run is the monitoring record. And because that record is hash-chained and signed, it is not merely available, it is tamper-evident: change one entry and the chain visibly breaks, and a third party can verify the export offline without trusting the institution that produced it. Monitoring you cannot prove is intact is not monitoring an examiner will credit.
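The tamper-evidence property can be illustrated with a short sketch using only the Python standard library. Two assumptions are worth stating loudly. First, the function names and the entry layout here are hypothetical. Second, the "signature" is an HMAC with a shared key, used as a stand-in so the example stays self-contained; verification by a third party who does not trust the institution would need an asymmetric signature scheme, so that the verifier holds only a public key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def _sign(entry_hash: str, key: bytes) -> str:
    # Stand-in for a real signature (assumption: shared-key HMAC).
    return hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(chain: list, event: dict, key: bytes) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry_hash = _digest(prev_hash, event)
    chain.append({
        "event": event,
        "prev": prev_hash,
        "hash": entry_hash,
        "sig": _sign(entry_hash, key),
    })


def verify_chain(chain: list, key: bytes) -> bool:
    prev_hash = GENESIS
    for entry in chain:
        expected = _digest(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False  # an entry was edited, removed, or reordered
        if not hmac.compare_digest(entry["sig"], _sign(expected, key)):
            return False  # hash was recomputed without the signing key
        prev_hash = entry["hash"]
    return True
```

Each entry commits to the hash of the one before it, so editing any event changes its hash and invalidates every later link. Note that an event here carries the authority fields (who acted, under which grant, with whose approval), not just the output.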
Controls: segregation of duties, enforced not flagged
The fourth discipline is controls — the limits that stop a model from driving a decision it should not. In MRM this has always leaned on human governance: an analyst proposes, a reviewer approves, the two are different people. The control is segregation of duties, and it predates AI by centuries.
The mistake is to assume an agent inherits that control automatically. It does not. An agent that prepares a journal entry and then posts it has collapsed maker and checker into one actor. An agent that clears the alert it triaged has approved its own work. The separation has to be imposed from outside, structurally, so the same agent cannot be maker and checker on one run — not "should not," but provably cannot, refused at runtime with the refusal landing in the log.
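As a sketch of what "provably cannot" might look like, the run itself can remember who the maker was and refuse that identity as checker, writing the refusal to the log before raising. The `Run` class and its event shapes are hypothetical and exist only for illustration.

```python
class SegregationOfDutiesError(Exception):
    """Raised when the maker on a run tries to act as its checker."""


class Run:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.maker = None  # identity that prepared the work on this run
        self.log = []      # per-run event record

    def prepare(self, agent_id: str, entry: dict) -> None:
        self.maker = agent_id
        self.log.append({"event": "prepared", "agent": agent_id, "entry": entry})

    def post(self, agent_id: str) -> None:
        if self.maker is None:
            raise RuntimeError("nothing has been prepared on this run")
        if agent_id == self.maker:
            # The refusal is recorded first, so the attempt itself is evidence.
            self.log.append({
                "event": "refused",
                "agent": agent_id,
                "reason": "maker cannot act as checker",
            })
            raise SegregationOfDutiesError(
                f"{agent_id} prepared run {self.run_id} and cannot post it"
            )
        self.log.append({"event": "posted", "agent": agent_id})
```

The check lives in the control layer rather than in the agent's prompt or policy, so it holds regardless of what the agent decides to attempt.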
For one-way doors — moving money, filing a suspicious-activity report, certifying a program — the control is an approval gate: the run parks and demands a named human signature, a quorum where required, and the requester is barred from approving their own request. These are not novel inventions. They are the Wolfsberg four-eye standard and the human decision the Bank Secrecy Act reserves for a person — the control that already governed people, held against a machine.
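An approval gate of this kind can be sketched as follows. The `ApprovalGate` class, its `quorum` parameter, and the use of `PermissionError` are illustrative assumptions; in practice the approver identities would come from an authenticated session, not a bare string.

```python
class ApprovalGate:
    """Parks a consequential action until enough named humans approve."""

    def __init__(self, action: str, requester: str, quorum: int = 1) -> None:
        if quorum < 1:
            raise ValueError("quorum must be at least 1")
        self.action = action
        self.requester = requester
        self.quorum = quorum
        self.approvers = set()  # a set, so repeat approvals count once

    def approve(self, approver: str) -> None:
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approvers.add(approver)

    @property
    def released(self) -> bool:
        return len(self.approvers) >= self.quorum

    def execute(self, fn):
        # The action stays parked until the quorum of distinct approvers is met.
        if not self.released:
            raise PermissionError(
                f"{self.action} is parked: "
                f"{len(self.approvers)}/{self.quorum} approvals"
            )
        return fn()
```

Storing approvers in a set is a deliberate choice: one person approving twice does not satisfy a quorum of two.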
The defensible substitute for a missing template
Here is the position SR 26-2 actually leaves the MRM function in. There is no agreed standard for what "an agent under control" looks like, so you will not get to argue you met it. You will be judged after the fact, by an examiner improvising or by counsel in discovery, against what a reasonable institution should have done. We work through that exposure in "SR 26-2 and no safe harbor."
When no one will hand you a template, the defensible substitute is not a thicker policy document. It is a verifiable control paired with the evidence it held. The four MRM disciplines, translated for agents, produce exactly that:
| MRM discipline | For a model | For an agent |
|---|---|---|
| Inventory | A versioned artifact in a register | A named principal that cannot act anonymously |
| Validation | Independent review of behaviour | Proof the capability boundary holds |
| Monitoring | Performance and drift tracking | A tamper-evident, per-decision run record |
| Controls | Governance over how the model is used | Segregation of duties enforced at runtime |
Whatever supervisory template eventually arrives, it will ask the same question every version of MRM has always asked: can you prove what this actor was allowed to do, that a person owned the consequential decision, and that the record was not altered? You do not need to predict the future rule. You need to be holding the evidence any version of it would expect — which is also the evidence the predicate rules, indifferent to whether SR 26-2 calls your agent a model, demand right now.
The MRM team that waits for the agentic-AI template is not being prudent. It is running agents through an unmonitored window, gambling that nobody asks for the record first. The discipline you already practise tells you not to take that bet.