A clinical trial database is not a spreadsheet. It is the evidence that will decide whether a drug is approved, and every value in it has a provenance: who entered it, who questioned it, who changed it, and why. When a sponsor files for approval, that database — and the audit trail behind it — is what an inspector opens first. A clinical data manager does not protect numbers. They protect the chain of custody on every number.
So when an AI agent enters that workflow, the question is not "can it read a case report form faster than a person?" It plainly can. The question is which of the data manager's actions an agent may perform, and which it may only propose — because in clinical data management, a surprising number of routine-looking actions actually change the trial record.
Clinical data management, or CDM, is the discipline of collecting, cleaning, and locking the data that comes off a trial. It runs under Good Clinical Practice (GCP) — the international standard, defined through the ICH guidelines, for how trials are conducted and how their data is handled. GCP has one non-negotiable demand of any system, automated or not: the data must be attributable, legible, contemporaneous, original, and accurate, and you must be able to prove it later.
What an agent should do: draft, code, reconcile
Most of CDM is high-volume comparison and assembly — the kind of work a model is genuinely good at, and a person is genuinely tired of.
Query generation is the clearest case. When data fails an edit check — a lab value outside range, a visit date before enrollment, a dose that contradicts the protocol — someone has to write a query back to the site asking them to confirm or correct it. An agent can read the discrepancy, draft a clear, protocol-aware query, and route it. That is preparation. The query is a question, not a change to the record, so a well-grounded agent drafting it is honest automation.
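As a minimal sketch of that split, the example below runs two edit checks and turns a failure into a draft query. All names here (`Discrepancy`, `check_visit_date`, `check_lab_range`, `draft_query`) and the query fields are hypothetical, not any EDC system's API. The point it illustrates is that the output is a question with a `draft` status; nothing in it writes to the data value.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    """A failed edit check (hypothetical structure for illustration)."""
    subject_id: str
    field: str
    message: str


def check_visit_date(subject_id: str, enrolled: date, visit: date) -> Optional[Discrepancy]:
    """Flag a visit dated before enrollment."""
    if visit < enrolled:
        return Discrepancy(
            subject_id,
            "visit_date",
            f"Visit date {visit.isoformat()} precedes enrollment date {enrolled.isoformat()}.",
        )
    return None


def check_lab_range(subject_id: str, test: str, value: float,
                    low: float, high: float) -> Optional[Discrepancy]:
    """Flag a lab value outside the expected range (range is an assumed input)."""
    if not (low <= value <= high):
        return Discrepancy(
            subject_id,
            test,
            f"{test} value {value} is outside the expected range {low} to {high}.",
        )
    return None


def draft_query(d: Discrepancy) -> dict:
    """Build a draft query for the site. It asks; it never edits the value."""
    return {
        "subject_id": d.subject_id,
        "field": d.field,
        "text": f"{d.message} Please confirm or correct.",
        "status": "draft",
        "drafted_by": "agent",
    }
```

A call such as `draft_query(check_visit_date("S-001", date(2024, 3, 1), date(2024, 2, 27)))` yields a routable question while the stored visit date stays exactly as the site entered it.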
Medical coding is the next. Verbatim terms reported by sites — "bad headache," "stomach was upset" — must be mapped to standardized dictionary terms so that events group correctly across the whole trial. An agent can propose the coded term and cite why. But coding is where judgement starts to bite: the same verbatim can map to materially different terms, and the choice affects how safety signals aggregate. The agent proposes the code. It does not get to be the actor who confirms it into the locked record.
Reconciliation rounds it out. Data from different sources (the clinical database, the central lab, the safety database) has to agree. Serious adverse events (SAEs) recorded in the trial database must match what the pharmacovigilance system holds, a cross-check we cover separately in "Pharmacovigilance and AI agents". An agent can find the mismatches, line them up, and propose the resolution far faster than a human scrolling two systems. Finding the gap is throughput. Resolving it changes the record.
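The matching half of reconciliation can be sketched as a plain comparison of two keyed exports. This is an illustration under assumptions: the key `(subject_id, event_term)`, the compared fields `onset_date` and `outcome`, and the function name `reconcile_sae` are all hypothetical. The function only reports differences; it has no way to decide which source is right.

```python
from typing import Dict, List, Tuple

SaeKey = Tuple[str, str]  # (subject_id, event_term), an assumed matching key


def reconcile_sae(
    clinical: Dict[SaeKey, dict],
    safety: Dict[SaeKey, dict],
    fields: Tuple[str, ...] = ("onset_date", "outcome"),
) -> List[dict]:
    """List mismatches between two sources without resolving any of them."""
    mismatches: List[dict] = []
    for key in sorted(set(clinical) | set(safety)):
        c, s = clinical.get(key), safety.get(key)
        if c is None or s is None:
            # The event exists in only one of the two systems.
            kind = "missing_in_clinical" if c is None else "missing_in_safety"
            mismatches.append({"key": key, "type": kind})
            continue
        for name in fields:
            if c.get(name) != s.get(name):
                mismatches.append({
                    "key": key,
                    "type": "field_mismatch",
                    "field": name,
                    "clinical": c.get(name),
                    "safety": s.get(name),
                })
    return mismatches
```

Each returned item is something for a data manager to resolve. The sketch deliberately offers no "fix" operation.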
The pattern is consistent. The agent drafts the query, proposes the code, surfaces the mismatch. It owns the volume. It does not own the decision.
What stays human: anything that changes the trial record
The line in CDM is sharper than it first looks, because the actions that alter the trial record are not always the dramatic ones.
- Accepting or applying a data change. A correction to a value, a confirmed query response written back to the database — these change what the record says.
- Confirming a medical code. The moment a coded term is locked in, it shapes the safety analysis. That confirmation is a judgement, made by a coder or medical monitor.
- Resolving a reconciliation discrepancy. Deciding which source is correct, or that a difference is acceptable, is a clinical and data-integrity call.
- Database lock. The point of no easy return. After lock, the data is frozen for analysis. No agent signs this.
Each of these is a decision a regulated person is expected to own. A medical monitor brings clinical context the data does not carry; a data manager carries accountability for the integrity of the database as a whole. Handing those calls to a model that cannot explain itself in an inspector's terms is not speed. It is an undocumented decision sitting in a dataset that will eventually be inspected.
The control that makes the split structural
"A human reviews the agent's work" is not a control. Anyone can glance at a proposed code and click accept; the click proves nothing about who actually judged it, or whether the same automated actor both proposed and confirmed it.
The control has to guarantee, structurally, that the agent which drafted a query or proposed a code cannot be the actor who applies the change to the trial record, and that the human who does apply it leaves a signature carrying its meaning. This is the maker-checker principle: the four-eyes separation that quality functions have run on for decades, enforced here against a machine. It mirrors the role 21 CFR 211.22 assigns the quality control unit in drug manufacturing, where approving or rejecting work is a separate responsibility from performing it.
| Step | Actor | Control |
|---|---|---|
| Discrepancy detection, query drafting | Agent | Deny-by-default, versioned skill grant |
| Coding proposal | Agent | Recorded, reversible, not a final entry |
| Reconciliation matching | Agent | Surfaces mismatches; cannot resolve them |
| Confirming a code or data change | Data manager / monitor | Approval gate; requester cannot self-approve |
| Database lock | Authorized human | Signed, hash-chained audit entry |
The structural part is what separates this from a policy memo. It is not enough to say the agent should stop at a proposal. The same agent must provably be unable to act as both maker of the proposed change and checker that applies it on a single run. When it tries, the attempt is refused — and the refusal lands in the log, which is frequently the exact evidence an inspector wants to see. That is the job of an AI agent control plane: the limit lives in enforced, recorded policy, not in the prompt.
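A minimal sketch of such a gate follows, assuming a hypothetical `ApprovalGate` with an explicit set of authorized human approvers. None of these names come from a real control-plane product, and the coded term in the usage note is illustrative only. It shows the two properties described above: the proposer cannot approve its own proposal, and the refused attempt is itself written to the log.

```python
from dataclasses import dataclass, field
from typing import List


class SelfApprovalError(PermissionError):
    """Raised when an actor is not allowed to approve a proposal."""


@dataclass
class Proposal:
    proposal_id: str
    proposed_by: str
    change: str
    status: str = "proposed"


@dataclass
class ApprovalGate:
    human_approvers: frozenset          # identities allowed to act as checker
    log: List[dict] = field(default_factory=list)

    def approve(self, proposal: Proposal, actor: str, meaning: str) -> None:
        """Confirm a proposal, refusing the maker and any unauthorized actor."""
        if actor == proposal.proposed_by or actor not in self.human_approvers:
            # The refusal is recorded before the error is raised.
            self.log.append({
                "event": "approval_refused",
                "proposal": proposal.proposal_id,
                "actor": actor,
            })
            raise SelfApprovalError(f"{actor} may not approve {proposal.proposal_id}")
        proposal.status = "confirmed"
        self.log.append({
            "event": "approved",
            "proposal": proposal.proposal_id,
            "actor": actor,
            "meaning": meaning,
        })
```

With `gate = ApprovalGate(frozenset({"dm.alice"}))` and a proposal made by `"agent-7"`, a call to `gate.approve(p, "agent-7", "approval")` raises and leaves the proposal in `proposed` status, while the same call with `"dm.alice"` confirms it. Both outcomes end up in `gate.log`.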
Why the record is the whole point
A clinical database is, in the end, a chain of decisions a sponsor will be asked to defend during inspection. Who proposed this code? Who confirmed it, on what date, under which version of the coding conventions? Who resolved this SAE mismatch, and on what reasoning? Was any value changed after the fact without a trace?
GCP and 21 CFR Part 11 turn those questions into hard requirements. §11.10(e) demands a secure, computer-generated, time-stamped audit trail in which changes do not obscure previously recorded information: every agent action and every human change captured, ordered, and immutable. §11.50 requires that a signed record show the signer's printed name, the date and time of signing, and the meaning of the signature (review, approval, responsibility, or authorship), not a green tick. §11.70 binds that signature to the specific record so it cannot be lifted and reused. A CDM workflow built on agents has to satisfy all three, for the agent's actions and the human's alike. We work through that obligation in "Part 11 and AI agents".
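To make the binding idea concrete, here is a small sketch in which a signature carries the signer's name, a timestamp, a meaning, and a digest of the exact record it was applied to. It is an illustration only: the names `Signature`, `sign`, and `applies_to` are hypothetical, and a content hash by itself is not a compliant electronic signature, since real systems add signer authentication and cryptographic signing on top.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Meanings that §11.50 gives as examples.
ALLOWED_MEANINGS = {"review", "approval", "responsibility", "authorship"}


def record_digest(record: dict) -> str:
    """Hash a canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Signature:
    signer_name: str
    signed_at: str      # ISO 8601 timestamp, UTC
    meaning: str
    record_hash: str    # ties the signature to one specific record state


def sign(record: dict, signer_name: str, meaning: str) -> Signature:
    """Create a signature entry; reject a signature with no recognized meaning."""
    if meaning not in ALLOWED_MEANINGS:
        raise ValueError(f"unrecognized signature meaning: {meaning!r}")
    signed_at = datetime.now(timezone.utc).isoformat()
    return Signature(signer_name, signed_at, meaning, record_digest(record))


def applies_to(sig: Signature, record: dict) -> bool:
    """True only if the record is byte-for-byte the one that was signed."""
    return sig.record_hash == record_digest(record)
```

Because the digest covers the record content, a signature copied onto a different or later-edited record fails `applies_to`, which is the property §11.70 is after.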
A control plane produces this evidence as a by-product of doing the work. Every model call, every grant, every gate, every signature lands in an append-only, hash-chained, cryptographically signed ledger. Change one entry and the chain visibly breaks. The export verifies offline, against an open spec, with no access to your systems. That is the form of proof an inspector trusts, and it is covered in detail in "Tamper-evident audit logs for AI agents".
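The hash-chaining part of that ledger can be sketched in a few lines. This is a generic illustration of the technique, not the format of any particular product or spec: the entry layout and the names `append_entry` and `verify` are assumptions, and the cryptographic signing layer is left out entirely.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger: List[dict], payload: dict) -> dict:
    """Append an entry whose hash depends on everything before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    ledger.append(entry)
    return entry


def verify(ledger: List[dict]) -> bool:
    """Recompute the chain from the start; any edited entry breaks it."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

`verify` needs only the exported entries, which is what makes offline checking possible. Editing the payload of an earlier entry changes its recomputed hash, so verification fails at that entry.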
The honest version of the pitch
AI agents will not replace the clinical data manager or the medical monitor, and any vendor implying otherwise is selling the part of the job that carries the liability. What agents replace is the grind — the hours spent drafting queries, hunting mismatches, and proposing codes before a qualified person ever has to decide anything.
That trade is worth taking only if the line between machine throughput and human judgement is enforced and recorded, not promised. Get that right and the agent comes out of the pilot it has been stuck in, into a workflow you can put in front of an inspector — because every decision that touched the trial record has a name on it, and a record that holds.