The front office gets the headlines, but the middle office is where banks actually lose sleep. It is the floor between the traders and the back-office plumbing — the people who reconcile breaks, work the alert queues, and clear the sanctions hits before money moves. The work is high-volume, rules-bound, and chronically understaffed. It is exactly the work an AI agent does well, which is why the pilots keep arriving here first.

It is also exactly the work an examiner reads line by line after something goes wrong. So the question is not whether agents can do middle-office work — they can. The question is whether you can put one in that seat and still answer, on demand, who was allowed to do what, and prove the record was not edited afterwards.

The three jobs that pull agents in

Three middle-office functions account for most of the early appetite, and they share a shape: a large queue, a documented procedure, and a decision that eventually needs a signature.

Reconciliation. Matching a custodian's records against the bank's ledger, chasing the breaks, and drafting the journal entries to clear them. An agent can churn through thousands of breaks a night and propose the fixes. What it must not do is post its own correcting entry without a second pair of eyes — the adjustment that resolves a break is also the adjustment that can hide one.

AML alert triage. A transaction-monitoring system throws alerts; most are noise. An agent can gather the context, summarise the customer's history, and recommend close or escalate. But the decision to file a suspicious-activity report, or to dismiss an alert and file nothing, is a mandated human decision under the Bank Secrecy Act. An agent that quietly clears alerts is not efficiency; it is an unmonitored AML program. We go deeper on this in AML alert triage with AI agents.

Sanctions screening. A name pings against a watchlist. An agent can resolve the obvious false positives — the common name, the transposed birth date — at speed. The true hit, the one that blocks a payment or freezes a relationship, is the one decision you cannot let a model make alone, because clearing a real sanctions match in error is a strict-liability problem.

In all three, the pattern is identical: let the agent do the volume, but keep the consequential action behind a control a human owns.

Why your existing controls do not cover the agent

A bank already has these controls — for people. The reconciliation analyst cannot approve her own journal entry. The investigator who recommends closing an alert is not the one who signs off the SAR decision. This is the four-eyes principle, and the Wolfsberg Group names maker-checker as the control standard for exactly these workflows. NYDFS Part 504 then requires a senior officer to certify, annually, that the institution's transaction-monitoring program works.

The problem is that none of this machinery knows what an agent is. Your role matrix, your approval workflow, your segregation-of-duties rules — they are built around named employees with logins. Drop an agent into the queue and it typically inherits a service account, a broad set of permissions, and no place in the org chart. The agent that recommends and the "agent" that approves are often the same process, separated by nothing but a prompt instruction. That is not a separation an examiner recognises. It is a separation that exists only as long as the model behaves.

What an examiner is actually looking for

When a regulator examines a middle-office process, they are not impressed by throughput. They ask a small, durable set of questions, and they ask them about agents the same way they ask them about people:

Who was authorised to take this action, and on what date?
Was the person who prepared the work barred from approving it?
Where is the record, and can you prove it was not altered after the fact?

These questions outlast any particular piece of guidance. In April 2026 the Federal Reserve issued SR 26-2, which replaced the old model-risk guidance and explicitly scoped agentic AI out. There is no supervisory template for agent controls yet, and no template means no safe harbor. The predicate rules underneath did not move: the BSA SAR obligation, the Part 504 certification, the four-eyes expectation. Examiners and litigation discovery still demand the evidence, with or without a rulebook written for machines. We unpack that gap in SR 26-2 and the no-safe-harbor problem.

Agents as accountable staff, not anonymous scripts

The way out is not to slow the agent down. It is to give the agent the same accountability structure you already impose on a human in that seat — a named identity, a defined role, hard limits, and a separation of duties enforced by the system rather than promised by a prompt.

That is the job of a control plane: a layer that sits between what an agent wants to do and what it is allowed to do. For a middle-office deployment it does four concrete things.

A named role, not a service account. The triage agent is the triage role — nothing more. It can read alerts, pull customer history, and write a recommendation. It cannot post a journal entry, file a SAR, or clear a true sanctions hit, because those doors were never granted to its role. Permissions are deny-by-default and versioned, so you can reconstruct exactly what the agent was allowed to do on any past date, with a record of who approved each grant.
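As a rough illustration of deny-by-default, versioned permissions, the sketch below keeps every published version of a role's grants and answers "what was this role allowed to do on date X, and who approved it?" The class names, action names, and approver labels (PolicyVersion, RoleRegistry, "head_of_fcc" and so on) are hypothetical and chosen for this example only; they do not describe any particular product's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PolicyVersion:
    """One published version of a role's permissions (illustrative model)."""
    role: str
    effective_from: date
    grants: Dict[str, str]  # action -> who approved that grant


class RoleRegistry:
    """Hypothetical in-memory registry; versions are appended, never edited."""

    def __init__(self) -> None:
        self._versions: List[PolicyVersion] = []

    def publish(self, version: PolicyVersion) -> None:
        self._versions.append(version)

    def version_on(self, role: str, on: date) -> Optional[PolicyVersion]:
        # The version in force is the latest one effective on or before the date.
        candidates = [
            v for v in self._versions
            if v.role == role and v.effective_from <= on
        ]
        return max(candidates, key=lambda v: v.effective_from, default=None)

    def is_allowed(self, role: str, action: str, on: date) -> bool:
        # Deny by default: no version, or no explicit grant, means "no".
        version = self.version_on(role, on)
        return version is not None and action in version.grants


registry = RoleRegistry()
registry.publish(PolicyVersion("triage", date(2026, 1, 1), {
    "read_alert": "head_of_fcc",
    "write_recommendation": "head_of_fcc",
}))
registry.publish(PolicyVersion("triage", date(2026, 3, 1), {
    "read_alert": "head_of_fcc",
    "pull_customer_history": "head_of_fcc",
    "write_recommendation": "head_of_fcc",
}))

print(registry.is_allowed("triage", "file_sar", date(2026, 4, 1)))               # False
print(registry.is_allowed("triage", "pull_customer_history", date(2026, 2, 1)))  # False
print(registry.is_allowed("triage", "pull_customer_history", date(2026, 4, 1)))  # True
```

Because old versions are kept rather than overwritten, the same lookup that gates the agent today also reconstructs what it could do on any past date.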

A structural maker-checker split. The agent that prepares the work provably cannot be the one that approves it: not "should not," but cannot, enforced inside the run. The requester is barred from signing off its own request. This is the segregation of duties your quality and AML functions already enforce for people, extended to the machine.
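A minimal sketch of what "cannot, not should not" might look like in code: the check lives in the approval function itself, so no prompt wording can route around it. The names here (WorkItem, approve, SelfApprovalError, the identity strings) are assumptions made for illustration, and the sketch takes identities as already authenticated, which a real deployment would have to establish separately.

```python
from dataclasses import dataclass
from typing import Optional


class SelfApprovalError(Exception):
    """Raised when the preparer of a work item tries to approve it."""


@dataclass
class WorkItem:
    item_id: str
    prepared_by: str                   # identity of the maker (human or agent)
    approved_by: Optional[str] = None  # identity of the checker, once approved


def approve(item: WorkItem, approver: str) -> WorkItem:
    # The maker is structurally barred from acting as the checker.
    if approver == item.prepared_by:
        raise SelfApprovalError(
            f"{approver} prepared {item.item_id} and cannot approve it"
        )
    if item.approved_by is not None:
        raise ValueError(f"{item.item_id} is already approved")
    item.approved_by = approver
    return item


entry = WorkItem(item_id="JE-1", prepared_by="recon_agent")

try:
    approve(entry, "recon_agent")
except SelfApprovalError as err:
    print("blocked:", err)

approve(entry, "analyst_2")
print(entry.approved_by)  # analyst_2
```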

An approval gate on the one-way doors. Filing a SAR, posting a correcting entry, clearing a real watchlist match — each parks the run and demands a human signature, and the gate can require a quorum of named approvers. The signer's reason is captured verbatim, so the sign-off carries its meaning, not just a timestamp.
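One way to picture such a gate, as a sketch under stated assumptions: the run stays parked until enough distinct named approvers have signed, each with a reason stored exactly as written. ApprovalGate, SignOff, and the approver names are hypothetical, and the quorum rule shown (N distinct signatures from a fixed list) is only one possible policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SignOff:
    approver: str
    reason: str  # stored verbatim, never summarised or truncated


class ApprovalGate:
    """Parks one consequential action until a quorum of named approvers signs."""

    def __init__(self, action: str, requester: str,
                 named_approvers: FrozenSet[str], quorum: int) -> None:
        if not 1 <= quorum <= len(named_approvers):
            raise ValueError("quorum must be between 1 and the number of approvers")
        self.action = action
        self.requester = requester
        self.named_approvers = named_approvers
        self.quorum = quorum
        self.sign_offs: List[SignOff] = []

    def sign(self, approver: str, reason: str) -> None:
        if approver == self.requester:
            raise PermissionError("the requester cannot sign its own request")
        if approver not in self.named_approvers:
            raise PermissionError(f"{approver} is not a named approver")
        if any(s.approver == approver for s in self.sign_offs):
            raise ValueError(f"{approver} has already signed")
        if not reason.strip():
            raise ValueError("a sign-off must carry a reason")
        self.sign_offs.append(SignOff(approver, reason))

    @property
    def released(self) -> bool:
        # The action proceeds only once the quorum is met.
        return len(self.sign_offs) >= self.quorum


gate = ApprovalGate(
    action="file_sar",
    requester="triage_agent",
    named_approvers=frozenset({"mlro", "deputy_mlro"}),
    quorum=2,
)
gate.sign("mlro", "Structuring pattern across three accounts; filing.")
print(gate.released)  # False
gate.sign("deputy_mlro", "Concur with MLRO after reviewing account history.")
print(gate.released)  # True
```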

An audit trail that survives discovery. Every recommendation, model call, and approval lands in an append-only, hash-chained, cryptographically signed ledger. Change one record and the chain visibly breaks. The export verifies offline, against a published spec, by someone who does not trust the vendor — which is the only kind of evidence worth having when the question arrives years later.
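The hash-chaining idea can be sketched with the standard library alone: each record's hash covers its own payload plus the previous record's hash, so editing any record invalidates it and breaks the links after it. This is a simplified illustration, not a published ledger spec. The record layout and function names are assumptions, and the cryptographic signing mentioned above is left out because it would need a key-management setup beyond this example.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: Dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash is reproducible.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: List[Dict], payload: Dict) -> None:
    """Append a record whose hash commits to the payload and the prior record."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "payload": dict(payload),  # copy so later caller edits do not leak in
        "prev": prev,
        "hash": _digest(prev, payload),
    })


def verify(ledger: List[Dict]) -> int:
    """Return the index of the first broken record, or -1 if the chain is intact."""
    prev = GENESIS
    for i, record in enumerate(ledger):
        if record["prev"] != prev or record["hash"] != _digest(prev, record["payload"]):
            return i
        prev = record["hash"]
    return -1


ledger: List[Dict] = []
append(ledger, {"event": "recommendation", "alert": "A-1", "value": "escalate"})
append(ledger, {"event": "approval", "alert": "A-1", "approver": "analyst_2"})
print(verify(ledger))  # -1

ledger[0]["payload"]["value"] = "close"  # simulate an after-the-fact edit
print(verify(ledger))  # 0
```

Verification needs only the exported records and the hashing rule, which is what lets a third party check the trail offline without relying on the system that produced it.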

Middle-office step	Agent does	Human still signs
Reconciliation	Match breaks, draft entries	Post the correcting entry
AML triage	Gather context, recommend	File or decline the SAR
Sanctions screening	Clear false positives	Clear a true watchlist hit

The wedge, not the whole bank

Nobody should hand an agent the trading book on day one. The middle office is the right wedge precisely because it is bounded: the procedures are written, the decisions that need a human are well known, and the volume is painful enough to justify the change. Get the control structure right here — roles, limits, maker-checker, a clean audit export — and the same pattern extends outward as trust is earned. That progression, from a single contained queue to production, is the subject of from pilot to production.

The bank that ships middle-office agents without this structure is not ahead. It is accumulating decisions no one can account for, in the one part of the institution an examiner is guaranteed to read. The bank that ships them with it gets the throughput and keeps the answer to the only question that matters when the regulator calls.

See how it works, or book a demo to watch an agent get blocked from approving its own work — live.
