Deepfield
Deepfield takes a strategic question, runs it through intake, deep research, entity extraction, knowledge-graph construction, MCTS reasoning, asymmetry and assumption analysis, course-of-action generation, wargame simulation, and output assembly — with every conclusion traceable back to its source.
Most strategic assessment work ends in a slide deck that's stale within a quarter. A program office wants to know whether to fund a bet, how a competitor is likely to react, what the evidence actually says. What they get is a one-shot memo. When the world shifts, they pay again for a fresh one.
We built Deepfield because we wanted to stop selling memos.
Deepfield is a nine-stage assessment pipeline. You feed it a strategic question and it moves through nine modular packages, each with a defined input and output contract:

1. Intake
2. Deep research
3. Entity extraction
4. Knowledge-graph construction
5. MCTS reasoning
6. Asymmetry and assumption analysis
7. Course-of-action generation
8. Wargame simulation
9. Output assembly
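To make "a defined input and output contract" concrete, here is a minimal sketch of what a typed stage boundary could look like. The names Stage and chain are our illustration, not Deepfield's actual API; the Result shape follows the Result<T,E> convention described below.

```typescript
// Minimal sketch of a staged pipeline with typed input/output contracts.
// Stage, chain, and this Result shape are illustrative assumptions,
// not Deepfield's actual API.

type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface Stage<I, O> {
  readonly name: string;
  run(input: I): Promise<Result<O, Error>>;
}

// Compose two stages into one. A failure in either short-circuits the
// pipeline explicitly instead of falling back to a default.
function chain<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return {
    name: `${first.name} -> ${second.name}`,
    async run(input) {
      const mid = await first.run(input);
      if (!mid.ok) return mid; // propagate the error, never a template
      return second.run(mid.value);
    },
  };
}
```

Nine such stages chained together yield one composite stage, which is how a pipeline like this can run end-to-end from a single command.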
The whole thing runs end-to-end from a single command. Twenty-thousand-plus lines of production TypeScript. Over two thousand tests.
Pick any recommendation Deepfield produces. You can trace it backward.
The recommendation came from a ranked course of action. That course came from a decision pipeline that scored it across six dimensions and seven risk categories. Those scores came from a reasoning tree whose rollouts used evidence from a knowledge graph. Each node in that graph came from extracted entities with a confidence score. Each entity came from specific source passages the extraction agents evaluated. Each source was retrieved by a research iteration that ran a specific query because a prior iteration flagged a coverage gap.
That chain is carried as W3C PROV-compliant lineage through every stage. It's not a nice-to-have we add when someone asks to see the work. It's how the system is built. Every module returns a Result<T,E>. If something fails, it fails explicitly. We don't let the pipeline fall back to templates or cached defaults to hide a broken call.
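As a rough picture of what that looks like as data: each artifact can carry a record naming the activity that generated it and the artifacts it was derived from, loosely echoing PROV's wasGeneratedBy and wasDerivedFrom relations. The field names and example ids below are our assumption, not Deepfield's schema.

```typescript
// Illustrative lineage record in the spirit of W3C PROV. Field names
// (generatedBy, derivedFrom) are assumptions echoing PROV relations.
interface LineageRecord {
  id: string;            // artifact id, e.g. "coa:3" or "entity:42" (hypothetical)
  generatedBy: string;   // activity that produced it, e.g. "mcts-rollout"
  derivedFrom: string[]; // upstream artifact ids
  confidence?: number;   // present where a stage assigns one
}

// Auditing a conclusion is then a backward traversal over these records,
// from a ranked course of action down to the source passages.
function trace(id: string, records: Map<string, LineageRecord>): string[] {
  const rec = records.get(id);
  if (!rec) return [id]; // leaf: a raw source with no upstream record
  return [id, ...rec.derivedFrom.flatMap((d) => trace(d, records))];
}
```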
The default alternative is a consultancy engagement that delivers a PDF. Deepfield is not a PDF. It's the infrastructure that produces a fresh assessment when your inputs change, runs the same analysis repeatedly without re-paying for the analyst's learning curve, and lets you interrogate any conclusion down to the specific document that supports it.
Deepfield is a platform we embed. A typical engagement runs 12 to 16 weeks. We configure the pipeline for your domain, ingest your sources and credibility priors, tune the reasoning prompts and scoring weights, and run the first real assessment together with your team. At the end you keep the running system, the knowledge graph, the code, and your data.
It's a fit when you have a recurring assessment need, your own sources, and a requirement that every conclusion be traceable back to the evidence that produced it. If your decision is one-shot and never repeats, Deepfield is too much infrastructure. If it's ongoing and the stakes are high, a platform you run yourself costs less over the first year than two consulting engagements and produces work you can actually defend.
We've written a two-page business case for this engagement shape. Executive summary, problem statement, deliverables, risks, success metrics, investment range. Read it in the browser or print it to PDF and forward it.
Read the business case

More of our work:

DARPA program analysis sites: evidence you can audit. Per-program analytical sites where every quantitative claim is backed by a reproducible query and a confidence level.
A knowledge-graph research console that opens Andrew Marshall's Office of Net Assessment tradition to a new generation of strategists.
A deployed teaming platform where an AI orchestrator dispatches specialized agents, but phase gates and a delegation contract keep every real judgment in student hands.
Tell us about the decision you're trying to improve. We'll schedule a briefing with our principals to understand your environment and explore a potential fit.
Schedule a Briefing