Research Intelligence Tool
Andrew W. Marshall Foundation

The Marshall archive: making a hidden corpus navigable

For the Andrew W. Marshall Foundation, Syntheos built a research tool that turns decades of scattered documents, oral histories, and institutional memory into a navigable knowledge graph. Every node and edge is labeled as verified archive material or AI-generated inference, so researchers always know where knowledge ends and hypothesis begins.


The situation

Andrew Marshall ran the Office of Net Assessment for 42 years. The intellectual tradition around him (game theory, competitive strategies, the Revolution in Military Affairs debate, decades of thinking about long competitions) lives in scattered documents, oral histories, and the heads of people who worked with him. A graduate student writing about strategic assessment today has no way to ask the archive a question. A researcher building on Marshall-style net assessment has to know where to look before they can look.

The Andrew W. Marshall Foundation wanted to fix that.

What we built

A research console. The tool loads the archive as a knowledge graph. Documents, people, organizations, concepts, events, belief systems, and misconceptions all live as typed nodes with typed relationships between them. The graph is seeded from the archive itself via a Python ingestion pipeline that extracts entities and relations per document, computes publication years and temporal coverage ranges, builds DATED_BY provenance edges, detects communities, deduplicates on embeddings, and exports tiered Arrow files for the frontend.
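The deduplication step reduces to a similarity check over entity embeddings. Here is a minimal pure-Python sketch, assuming entities arrive as (name, embedding) pairs; the 0.9 threshold and the greedy first-match merge are illustrative placeholders, not the pipeline's actual settings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dedup_entities(entities, threshold=0.9):
    """Map each entity name to its canonical form.

    Greedy pass: an entity whose embedding is within `threshold`
    of an already-kept entity is merged into that earlier entity.
    """
    canonical = {}
    kept = []  # (name, embedding) pairs accepted as canonical
    for name, emb in entities:
        match = next((k for k, ke in kept if cosine(emb, ke) >= threshold), None)
        canonical[name] = match if match is not None else name
        if match is None:
            kept.append((name, emb))
    return canonical
```

A duplicate surface form ("A. W. Marshall" vs. "Andrew Marshall") collapses to one canonical node, while unrelated entities survive untouched.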

A researcher opens the console and asks a question in a chat panel. A hybrid retrieval pipeline runs vector similarity search, full-text search, and a two-hop graph expansion. The three result sets get blended with Reciprocal Rank Fusion (k=60). An LLM reranker orders the fused results. A community-context pass pulls in the relevant Louvain cluster summaries. A multi-hop gap-fill pass adds any entities the first pass missed. The final context goes to a streaming answer with numbered citations.
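The fusion step itself is compact. A minimal sketch of Reciprocal Rank Fusion over the three result sets, with placeholder document ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Blend ranked lists: each document scores sum of 1/(k + rank)
    across every list it appears in, then sort by fused score."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder ids standing in for vector, full-text, and graph-expansion hits.
vector_hits = ["d3", "d1", "d7"]
fulltext_hits = ["d1", "d3", "d9"]
graph_hits = ["d1", "d7", "d2"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, graph_hits])
```

A document that ranks moderately well in all three retrievers ("d1" above) outscores one that tops a single list, which is exactly why RRF suits hybrid retrieval.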

Alongside the chat, a right-hand panel shows evidence cards synced to the reader's scroll position in the answer. Click a citation badge in the response and a popover opens with the document title, the exact chunk of text that grounds the claim, and a "view passage" button that pins the full citation card. A timeline strip at the bottom filters the graph and chat by era: Early (1949-1973), ONA Founding (1973-1989), RMA (1990-2001), GWOT (2001-2015).
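The era filter reduces to a range check against the four periods above. A minimal sketch, assuming each graph node carries an optional year (the field name is an assumption):

```python
ERAS = {
    "Early": (1949, 1973),
    "ONA Founding": (1973, 1989),
    "RMA": (1990, 2001),
    "GWOT": (2001, 2015),
}

def filter_by_era(nodes, era):
    """Keep nodes whose year falls inside the selected era (inclusive bounds).
    Undated nodes are dropped rather than guessed at."""
    start, end = ERAS[era]
    return [n for n in nodes if n.get("year") is not None and start <= n["year"] <= end]
```

The same predicate can drive both the graph view and the chat context, so the two panels stay in sync when a researcher narrows to one era.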

Researchers can also save an investigative thread (graph state plus conversation plus applied filters) and return to it later.

How it's defensible

Every node and every edge in the graph carries a label. Verified means the content came from an archive source. Inferred means the AI generated it during reasoning. Verified nodes render as filled circles. Inferred nodes render as dashed outlines. The user always sees which is which.

The same discipline runs through the chat. Every assistant response uses a numbered citation format resolved through a CitationRegistry. Clicking a badge opens a popover with the source document and passage. Inferred claims get a separate visual tag ([Inference: ...]) so a careful reader can distinguish what the archive says from what the model extrapolated. We also run Marshall-style net assessment diagnostics (blind spot detection, asymmetry surfacing, second-order effect tracing) with Zod-validated structured outputs so the diagnostic results are themselves auditable.
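A registry like this can be sketched in a few lines. The real CitationRegistry is internal to the tool, so the class and method names below are assumptions about its shape, not its actual interface:

```python
class CitationRegistry:
    """Assign stable numbers to cited sources in first-citation order,
    and resolve a number back to its grounding document and passage."""

    def __init__(self):
        self._order = []      # doc ids, in first-citation order
        self._passages = {}   # citation number -> (doc_id, passage)

    def cite(self, doc_id, passage):
        """Return the badge text for a citation, e.g. "[1]".
        A document keeps the same number however often it is cited."""
        if doc_id not in self._order:
            self._order.append(doc_id)
        n = self._order.index(doc_id) + 1
        self._passages.setdefault(n, (doc_id, passage))
        return f"[{n}]"

    def resolve(self, n):
        """Look up the (doc_id, passage) pair behind badge number n."""
        return self._passages[n]
```

The invariant that matters is the round trip: every numbered badge in the streamed answer resolves to a concrete document and passage, which is what makes the popover-and-pin UI possible.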

None of this requires the user to trust the model. If the answer says Marshall developed his RMA thesis in working papers with Wohlstetter during the late 1970s, the user can click and read the working paper.

What it replaced

A box of PDFs and an institutional memory that lived in a few people's heads and was eroding.

What a similar engagement looks like

Eight to fourteen weeks. We need a domain corpus (documents, oral histories, structured metadata), enough subject-matter-expert time to review the entity schema and name the relationships that matter, and a hosting target. You get a deployed research console, the ingestion pipeline, the graph itself, and documentation for how to add new material over time.

It's a fit for foundations, research institutes, agencies, and university programs that hold a specialist corpus and want researchers to think with it rather than just read it. It's not a fit if your corpus is already well-indexed by mainstream search. That problem is solved elsewhere.

For internal champions

Making the case inside your organization?

We've written a two-page business case for this engagement shape: executive summary, problem statement, deliverables, risks, success metrics, and investment range. Read it in the browser, or print it to PDF and forward it.

Read the business case

Initiate Contact

Ready to transform your decision architecture?

Tell us about the decision you're trying to improve. We'll schedule a briefing with our principals to understand your environment and explore a potential fit.

Schedule a Briefing