Once a year, a senior officer at a New York-regulated bank signs a document saying the institution's transaction-monitoring program works. Not "we believe it works" — a certification, filed with the Department of Financial Services, that the program complies with NYDFS Part 504. The signer is personally accountable. There is no committee to hide behind and no footnote that says the software did it.

That signature was already a serious thing to give when the monitoring program was rules, analysts, and a case-management queue. Now an AI agent is reading the alerts, clearing the false positives, and escalating what looks real. The question Part 504 forces — but never anticipated — is blunt: when you sign the annual certification, what exactly are you certifying, and what evidence do you have that it is true?

What Part 504 actually demands

Part 504 (23 NYCRR 504) requires every covered institution to maintain a transaction-monitoring program reasonably designed to detect potential BSA/AML violations and to surface activity for suspicious-activity reporting, alongside a filtering program to interdict transactions prohibited by OFAC sanctions. The rule then adds the part that keeps compliance officers awake: an annual certification, adopted as a board resolution or a senior-officer compliance finding, attesting to that program.

The certification is not a checkbox. It is a statement that the program's design, testing, governance, and decisions hold up. If they do not — if the program was quietly degraded, or nobody can show how a given alert was dispositioned — the officer who signed is the one who answers for it. The whole regime is built to make sure a named human owns the monitoring program's integrity.

None of that language mentions AI, because the rule took effect in 2017, years before agentic AI arrived. That is exactly the problem. The obligation is date-proof; the technology underneath it is not.

Where the agent quietly absorbs the decision

Drop an AI agent into alert triage and the work changes shape fast. The agent pulls the alert, reviews the customer history, weighs the transaction pattern, and either closes it as a false positive or escalates it for human review. It works through the queue without tiring and without pausing.

Most of those dispositions are sound. That is not the risk. The risk is that every "close — no further action" is a decision the program made on the officer's behalf, and at certification time the officer has to stand behind all of them. A spreadsheet of outcomes is not enough. The supervisor — and, if it ever comes to it, a courtroom — will ask the harder questions:

Which agent disposed of this alert, acting under what authority on that date?
Could that same agent both clear the alert and sign off on its own clearance?
When a case crossed the line into a suspicious-activity-report filing, the kind of AML alert decision that agents now triage, did a named human actually make that call?
Has any record been altered since?

A monitoring stack that cannot answer those does not give the certifying officer support. It gives them exposure.

The control standard predates the agent

There is nothing new about the underlying control. The Wolfsberg Group, an association of global banks that publishes financial-crime guidance, names the maker-checker, or four-eyes, principle as the expected practice: the person who prepares a decision is not the person who approves it. BSA suspicious-activity-report filing is, by design, a mandated human decision. Part 504's certification is the annual proof that those controls are real.

An AI agent does not get a waiver from any of this. If anything, it raises the bar, because an agent can act faster, more uniformly, and more silently than a team of analysts. The control standard is the same; the actor changed. The honest way to govern that actor is to apply the same maker-checker discipline to the machine that you already apply to people — structurally, not on the honor system. We unpack the principle itself in the four-eyes principle for AI workflows.
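As a rough illustration of what "structurally, not on the honor system" could mean, the sketch below models an escalation request whose approval step rejects the requester itself and any non-human approver. The names `EscalationRequest`, `SegregationError`, and the actor ids are hypothetical and are not taken from any particular product; this is a minimal sketch, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Optional


class SegregationError(Exception):
    """Raised when an approval would violate maker-checker separation."""


@dataclass
class EscalationRequest:
    alert_id: str
    requested_by: str                      # actor id of the maker (agent or person)
    approved_by: Optional[str] = None
    approval_reason: Optional[str] = None

    def approve(self, approver_id: str, approver_is_human: bool, reason: str) -> None:
        # The maker can never be the checker on the same alert.
        if approver_id == self.requested_by:
            raise SegregationError("maker cannot act as checker on the same alert")
        # Assumption for this sketch: reportable calls need a named human.
        if not approver_is_human:
            raise SegregationError("reportable decisions need a named human approver")
        if not reason.strip():
            raise ValueError("an approval reason is required")
        self.approved_by = approver_id
        self.approval_reason = reason      # stored verbatim, never summarized
```

The point of the design is that the check lives in the only code path that can record an approval, so an agent cannot skip it by choosing not to call a separate validation step.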

It is also worth being clear about what does not close this gap. Content guardrails — the tools that ask "is this output toxic, biased, or leaking data?" — are useful and worth running. But they answer a different question. Part 504 is not asking whether the agent's text was safe. It is asking whether the actor was authorized, whether duties were segregated, and whether the decision is attributable. That is a governance question, not a content one.

What the certifying officer actually needs

Strip the certification down to its load-bearing parts and an AI-driven program has to produce four things on demand.

The officer must show	What backs it
The agent only did what it was permitted to	Deny-by-default, versioned authority for every action
It could not approve its own work	Segregation of duties enforced inside each run
A human owned every reportable call	An approval gate the requester cannot satisfy alone
The record is intact	A tamper-evident, independently verifiable audit trail

This is precisely what an agent control plane provides, and why it is not optional once agents touch a regulated decision. Authority is granted to a role, deny-by-default and versioned, so you can reconstruct exactly what an agent was allowed to do on any past date. Segregation of duties is structural: the same agent provably cannot be both maker and checker on a single alert. The escalation to a report filing is an approval gate that bars the requester from approving its own request and captures the signer's reason verbatim.
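To make "deny-by-default and versioned" concrete, here is a minimal sketch of a role-based authority table that can answer what a role was allowed to do on any past date. The class `RoleAuthority`, the function `is_permitted`, and the role and action names in the usage note are illustrative assumptions, not a description of any specific control plane.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RoleAuthority:
    """Versioned grants for one role; each version replaces the previous one."""
    effective_dates: list = field(default_factory=list)  # kept sorted
    permitted: list = field(default_factory=list)        # parallel to effective_dates

    def grant(self, effective: date, actions: set) -> None:
        # Insert the new version in date order; older versions are never edited.
        idx = bisect_right(self.effective_dates, effective)
        self.effective_dates.insert(idx, effective)
        self.permitted.insert(idx, frozenset(actions))

    def allowed_on(self, when: date) -> frozenset:
        # Pick the latest version whose effective date is on or before `when`.
        idx = bisect_right(self.effective_dates, when)
        # No version in force yet means nothing is allowed.
        return self.permitted[idx - 1] if idx else frozenset()


def is_permitted(policy: dict, role: str, action: str, when: date) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    authority = policy.get(role)
    if authority is None:
        return False
    return action in authority.allowed_on(when)
```

For example, if a hypothetical `triage-agent` role is granted `close_false_positive` and `escalate` from 1 January 2024 and narrowed to `escalate` only from 1 July 2024, then `is_permitted(policy, "triage-agent", "close_false_positive", date(2024, 3, 1))` is true while the same query for 1 August 2024 is false, and an action such as `file_sar` is refused on every date because it was never granted.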

And the whole sequence lands in a hash-chained, cryptographically signed audit log. Alter one record and the chain visibly breaks. The export can be verified offline, by someone who does not trust your vendor and has no access to your systems — which is the only kind of evidence a supervisor or opposing counsel respects. The mechanics are in tamper-evident audit logs for AI agents.
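The hash-chaining idea can be sketched in a few lines of standard-library Python. Each entry's hash covers the previous entry's hash, so changing any record invalidates everything after it. The function names and record fields here are hypothetical, and the HMAC signature is a stand-in chosen to keep the example dependency-free: it requires the verifier to hold the secret key, whereas verification by a party who does not trust the vendor would normally use an asymmetric signature so that only a public key is needed.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def _sign(key: bytes, entry_hash: str) -> str:
    return hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: list, payload: dict, key: bytes) -> dict:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry_hash = _digest(prev_hash, payload)
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash,
        "sig": _sign(key, entry_hash),
    }
    log.append(entry)
    return entry


def verify_chain(log: list, key: bytes) -> int:
    """Return -1 if the chain is intact, else the index of the first bad entry."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = _digest(prev_hash, entry["payload"])
        if (
            entry["prev_hash"] != prev_hash
            or entry["hash"] != expected
            or not hmac.compare_digest(entry["sig"], _sign(key, expected))
        ):
            return i
        prev_hash = entry["hash"]
    return -1
```

Because verification only needs the exported entries and the key material, it can run on a machine with no access to the system that produced the log.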

From a signature of faith to a signature of evidence

Here is the practical shift. Without this layer, the annual Part 504 certification becomes an act of faith: the officer signs because the program seems to be working and nobody has flagged a problem. With it, the same signature rests on an evidence export — a record of which agent did what, under what authority, with which human in the loop, provably unaltered.

That distinction matters most on the worst day, not the average one. When an examiner pulls a thread on a cleared alert that should have been filed, the question is never "did your AI mean well." It is "show me who was authorized to close this, show me a human owned the call where the rule required one, and show me the record has not been touched." A program built on a control plane can answer in minutes. A program built on prompts and good intentions cannot answer at all.

The certification was always meant to make a human personally answerable for the monitoring program. Agentic AI does not dissolve that accountability — it concentrates it. The officer who signs deserves a program that can prove its own work.

