Protocol Builder: a first draft the IRB can actually use

The situation

Researchers preparing human subjects research protocols spend hours pulling information out of drug pamphlets, study plans, and prior-study reports to populate template fields. The process is slow. It's error-prone. Different researchers interpret the same source material differently and end up with different protocols for the same study. An IRB reviewing those protocols has to chase inconsistencies that came from the input stage, not the science.

Our client wanted a first draft that a researcher could start from, not a finished protocol that a researcher had to fight with.

What we built

A protocol auto-population API. The caller uploads a set of context documents and selects a protocol template. The system parses the documents, runs LLM extraction with constrained structured outputs, and returns a draft protocol with each field populated from the source material. Each field comes back with:

A confidence score.
A citation back to the specific source passage that supported the extraction.
A flag if the source material was insufficient and the field is a gap rather than a draft.

The extraction uses OpenAI's Structured Outputs feature with Pydantic schemas, so every field conforms to a typed contract. The model cannot invent a field shape. When information is missing, the schema forces an explicit gap marker rather than letting the model paper over the hole.
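To make the typed contract concrete, here is a minimal sketch of what a draft-or-gap schema could look like, written as plain JSON Schema dictionaries rather than the Pydantic models used in the project. The field names (`status`, `value`, `confidence`, `citation`, `reason`) and the helper `template_schema` are assumptions for illustration, not the delivered schema, and which schema keywords a given structured-output API accepts should be checked against its documentation.

```python
import json

# Hypothetical citation shape: where in the uploaded documents the text came from.
CITATION = {
    "type": "object",
    "properties": {
        "document": {"type": "string"},
        "page": {"type": "integer"},
        "passage": {"type": "string"},
    },
    "required": ["document", "page", "passage"],
    "additionalProperties": False,
}

# A drafted field must carry a value, a confidence score, and a citation.
DRAFT = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["draft"]},
        "value": {"type": "string"},
        "confidence": {"type": "number"},
        "citation": CITATION,
    },
    "required": ["status", "value", "confidence", "citation"],
    "additionalProperties": False,
}

# A gap carries only a reason; it has no place to put a value.
GAP = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["gap"]},
        "reason": {"type": "string"},
    },
    "required": ["status", "reason"],
    "additionalProperties": False,
}


def template_schema(field_names):
    """Build an object schema in which every template field is draft-or-gap."""
    return {
        "type": "object",
        "properties": {name: {"anyOf": [DRAFT, GAP]} for name in field_names},
        "required": list(field_names),
        "additionalProperties": False,
    }


schema = template_schema(["inclusion_criteria", "exclusion_criteria"])
print(json.dumps(schema["properties"]["inclusion_criteria"]["anyOf"][1], indent=2))
```

Because every template field is required and each one admits only the two shapes, an output that omits a field or adds an unplanned one does not match the schema.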

How it's defensible

Three things carry the defensibility load.

First, structured outputs with constrained decoding. Every extracted field is typed. Every field is either a draft with a confidence score and a source passage, or an explicit gap. There is no unstructured middle ground where the model hides an unsupported claim.
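The "draft or gap, nothing in between" rule can also be checked on the receiving side. The sketch below uses standard-library dataclasses as a stand-in for the project's Pydantic models. The names `Draft`, `Gap`, and `parse_field`, the key names, and the 0 to 1 confidence range are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Draft:
    value: str
    confidence: float      # assumed range: 0.0 to 1.0
    source_passage: str


@dataclass(frozen=True)
class Gap:
    reason: str


def _require_text(raw: dict, key: str) -> str:
    """Return raw[key] if it is a non-blank string, otherwise raise."""
    text = raw.get(key)
    if not isinstance(text, str) or not text.strip():
        raise ValueError(f"{key!r} must be a non-empty string")
    return text


def parse_field(raw: dict) -> Union[Draft, Gap]:
    """Accept only a fully supported draft or an explicit gap."""
    status = raw.get("status")
    if status == "draft":
        if set(raw) != {"status", "value", "confidence", "source_passage"}:
            raise ValueError("draft has missing or unexpected keys")
        confidence = raw["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError("confidence must be a number")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        return Draft(
            _require_text(raw, "value"),
            float(confidence),
            _require_text(raw, "source_passage"),
        )
    if status == "gap":
        if set(raw) != {"status", "reason"}:
            raise ValueError("gap has missing or unexpected keys")
        return Gap(_require_text(raw, "reason"))
    raise ValueError(f"unknown status: {status!r}")


print(parse_field({"status": "gap", "reason": "No dosing schedule in the uploaded documents."}))
```

A draft that arrives without a source passage raises an error instead of passing through as a weaker kind of draft, which is the behavior the paragraph above describes.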

Second, source attribution on every field. When a reviewer reads a drafted "inclusion criteria" field, they see the specific page and paragraph in the source documents that the draft came from. Accepting the draft is a judgment call made with all the evidence in front of them. Rejecting it is a one-click action, not an investigation.
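One way to keep a page-and-paragraph citation honest is to resolve it against the parsed documents and confirm the quoted passage is really there. The following is a sketch under stated assumptions: the `Corpus` layout (document name, then pages, then paragraphs), the `Citation` fields, and the functions `resolve` and `is_supported` are hypothetical, and the sample text is made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical parsed corpus: document name -> pages -> paragraphs.
Corpus = Dict[str, List[List[str]]]


@dataclass(frozen=True)
class Citation:
    document: str
    page: int        # 1-based page number
    paragraph: int   # 1-based paragraph number on that page
    quote: str       # passage the extraction claims to rest on


def _normalize(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not break matching."""
    return " ".join(text.split()).lower()


def resolve(corpus: Corpus, citation: Citation) -> str:
    """Return the cited paragraph, or raise LookupError if it does not exist."""
    pages = corpus.get(citation.document)
    if pages is None:
        raise LookupError(f"unknown document: {citation.document}")
    if not 1 <= citation.page <= len(pages):
        raise LookupError(f"no page {citation.page} in {citation.document}")
    paragraphs = pages[citation.page - 1]
    if not 1 <= citation.paragraph <= len(paragraphs):
        raise LookupError(f"no paragraph {citation.paragraph} on page {citation.page}")
    return paragraphs[citation.paragraph - 1]


def is_supported(corpus: Corpus, citation: Citation) -> bool:
    """True only if the quote appears in the paragraph the citation points at."""
    try:
        paragraph = resolve(corpus, citation)
    except LookupError:
        return False
    quote = _normalize(citation.quote)
    return bool(quote) and quote in _normalize(paragraph)


# Made-up sample data for illustration.
corpus = {
    "study_plan.pdf": [
        ["Overview of the study.", "Participants must be 18 to 65\nyears old."],
    ]
}
cite = Citation("study_plan.pdf", 1, 2, "participants must be 18 to 65 years old")
print(is_supported(corpus, cite))
```

A citation that fails this check points at a passage the reviewer would not find, so it is a candidate for demotion to a gap rather than display as evidence.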

Third, explicit gap surfacing. The system is designed so that "I don't have enough information to draft this field" is a normal output, not a failure. A reviewer would much rather see ten fields drafted and five marked as gaps than see fifteen fields all drafted with the same confidence, because the second case hides which fields they need to scrutinize.
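Gap surfacing pays off when the reviewer's queue is ordered by how much scrutiny each field needs. A minimal sketch, assuming each field result is a dictionary with a `status` key and, for drafts, a `confidence` key. The function name `review_queue` and the sample fields are hypothetical.

```python
from typing import Dict, List


def review_queue(fields: Dict[str, dict]) -> List[str]:
    """Order field names for review: gaps first, then drafts by rising confidence."""
    gaps = sorted(name for name, f in fields.items() if f["status"] == "gap")
    drafts = sorted(
        (name for name, f in fields.items() if f["status"] == "draft"),
        key=lambda name: (fields[name]["confidence"], name),
    )
    return gaps + drafts


# Made-up sample results for illustration.
fields = {
    "inclusion_criteria": {"status": "draft", "confidence": 0.92},
    "dosing_schedule": {"status": "gap"},
    "primary_endpoint": {"status": "draft", "confidence": 0.61},
}
print(review_queue(fields))
```

With this ordering the fields that need the most attention come first, and nothing about a gap is disguised as a low-confidence draft.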

The point of all three is that the human reviewer stays in charge. The system drafts. The human decides. A first draft a reviewer can trust beats a finished draft they can't.

What it replaced

Manual extraction and reformatting, a process that took a researcher hours per protocol and produced inconsistent outputs across researchers and studies.

What a similar engagement looks like

A Protocol Builder for a new template library takes 8 to 12 weeks to build. We need the templates, a reference set of completed protocols for schema design, and access to representative examples of the source documents. You get the deployed API, the template schemas, the extraction prompts tuned for the domain, and the confidence-scoring rubric.

The approach fits IRBs, compliance teams, grant offices, and any setting where a structured document has to be drafted from a set of source materials and a human reviewer needs to see exactly where every piece of the draft came from.

Making the case inside your organization?

More Work

Other systems we've shipped

Deepfield: a modular assessment platform

DARPA program analysis sites: evidence you can audit

The Marshall archive: making a hidden corpus navigable

