AI Field Detection Pipeline

DocuTrust’s AI field detection runs through a multi-stage pipeline to produce templates that are consistent in field type, position, size, and key naming across documents. This guide explains the architecture, how to interpret the new response fields, and when to trust vs. review the output.

This applies to accounts on pipeline v2. Legacy (v1) responses do not include the pipeline metadata block or the extended preferences fields described below. Rollout is account-by-account; existing API consumers are unaffected by the additions.

Why this exists

The legacy detection path was a single LLM call that returned field types and positions as raw coordinates. Three recurring failure modes motivated v2:

Wrong sizes — the model occasionally returned w: 0, h: 0 and the backend silently applied a generic default, producing templates where every field was the same size regardless of the actual blank on the page.
Inconsistent keys — the same semantic field (e.g. a phone number) could come back as phone, tel, or contact_phone depending on the document, making downstream field mapping brittle.
No review loop — a single uncertain extraction was returned verbatim with no self-check, so low-confidence placements reached end signers.

Pipeline v2 addresses each of these with dedicated stages.

Pipeline stages

PDF (decrypt if needed)
  → PdfPrep       (text-layer extraction via pdftotext -bbox-layout)
  → Detector      (Gemini 3 Flash with discriminator-based prompt + strict schema)
  → Normalizer    (enforce size bounds, snake_case keys, overlap dedup)
  → Anchor        (snap fields to PDF text-layer underscore runs / labels)
  → Critic        (confidence-gated self-review, second LLM pass — runs only when uncertain)
  → Normalizer    (re-apply invariants after Critic patches)
  → Telemetry     (structured log line + per-request diagnostic row)
  → Response

Each stage is independently testable. With Gemini 3 Flash at its default temperature and a strict response schema, output is stable enough across runs to make a test harness against hand-labeled ground truth practical.

Response shape

Every smart_setup and smart_create call on a v2 account returns a pipeline metadata block and richer preferences on each field.

```json
{
  "document_summary": "Standard records release form.",
  "pages": [
    { "page_number": 1, "suggested_rotation": "0" }
  ],
  "fields": [
    {
      "uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "name": "Printed Name",
      "type": "text",
      "submitter_uuid": "s1a2b3c4-d5e6-7890-abcd-111111111111",
      "required": true,
      "readonly": false,
      "preferences": {
        "key": "full_name",
        "confidence": 0.98,
        "anchor_text": "Printed Name:",
        "anchor_method": "snapped_to_blank"
      },
      "reason": "Placed on underscore after 'Printed Name:' because it is the labeled text input.",
      "areas": [
        { "page": 1, "x": 28.56, "y": 19.58, "w": 33.63, "h": 3.32,
          "attachment_uuid": "doc-uuid-1111-2222-3333-444444444444" }
      ]
    }
  ],
  "field_count": 1,
  "pipeline": {
    "version": "v2",
    "request_id": "e2b7d6c0-9be9-49f7-9015-a1a8b0b2c3d4",
    "latency_ms": 4210,
    "anchor_hits": 1,
    "anchor_misses": 0,
    "corrections_applied": 0,
    "critic_ran": false
  }
}
```

Top-level `pipeline` metadata

| Field | Purpose |
| --- | --- |
| `version` | `"v2"` on the new pipeline; absent on legacy. |
| `request_id` | Stable UUID that correlates the response to DocuTrust’s internal diagnostic row for this run. Include it in support tickets if an extraction looks wrong. |
| `latency_ms` | Total pipeline wall-clock time in milliseconds (Gemini calls plus all stages). |
| `anchor_hits` | Number of fields whose positions were snapped to the PDF’s actual text layer (`snapped_to_blank` or `snapped_to_label`). |
| `anchor_misses` | Number of fields that fell back to pure-LLM placement (see `anchor_method: llm_only`). |
| `corrections_applied` | Total deterministic corrections from the Normalizer (size clamps, key dedup, overlap drops, etc.). A high number means the LLM output needed significant clean-up. |
| `critic_ran` | Whether the confidence-gated self-review pass fired on this request. |
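
As a rough illustration of how a consumer might read this block, the sketch below condenses it into a few health indicators. The function name `summarize_pipeline` and the derived `anchor_hit_rate` are hypothetical and not part of the API; using hits plus misses as the denominator is an assumption.

```python
from typing import Any, Optional


def summarize_pipeline(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Condense the pipeline block; returns None for a legacy (v1) response."""
    meta = response.get("pipeline")
    if meta is None:  # legacy responses carry no pipeline block
        return None
    hits = meta.get("anchor_hits", 0)
    misses = meta.get("anchor_misses", 0)
    counted = hits + misses
    return {
        "request_id": meta.get("request_id"),
        "latency_s": meta.get("latency_ms", 0) / 1000,
        # Assumption: hits + misses is a usable denominator for a hit rate.
        "anchor_hit_rate": hits / counted if counted else None,
        "needed_cleanup": meta.get("corrections_applied", 0) > 0,
        "critic_ran": bool(meta.get("critic_ran", False)),
    }


# Pipeline block copied from the sample response in this guide.
sample = {
    "pipeline": {
        "version": "v2",
        "request_id": "e2b7d6c0-9be9-49f7-9015-a1a8b0b2c3d4",
        "latency_ms": 4210,
        "anchor_hits": 1,
        "anchor_misses": 0,
        "corrections_applied": 0,
        "critic_ran": False,
    }
}
summary = summarize_pipeline(sample)
print(summary)
```

Logging the `request_id` alongside such a summary keeps the value at hand if a support ticket is needed later.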

Per-field `preferences`

On v2, each detected field gets four additional preferences entries:

key — Canonical snake_case identifier (max 50 chars, regex ^[a-z][a-z0-9_]{0,49}$). Pulled from DocuTrust’s field library when the label semantically matches a known field; minted from the label otherwise. Same semantic field in the same document always gets the same key, and duplicates across documents are deduped with _2, _3 suffixes.
confidence — Float 0–1, self-reported model certainty on BOTH the field type AND its placement. See the calibration table below.
anchor_text — The verbatim label text near the field as it appears on the PDF. Useful for building audit trails (“we detected a signature because we saw the label Client Signature:”).
anchor_method — How the field’s position was finalized:
- snapped_to_blank — aligned to an underscore run in the text layer. Highest precision.
- snapped_to_label — aligned to the preceding label (no underscore run found nearby).
- llm_only — kept the raw model coordinates (text layer present but no anchor matched).
- no_text_layer — scanned PDF, no text layer available — expect lower positional accuracy.
- page_missing — the requested page wasn’t present in the document.
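
One possible way to use `anchor_method` on the consumer side is to group fields by how their position was finalized and surface the ones that were not snapped to the text layer. This is a minimal sketch: the helper names and the `"unknown"` placeholder are hypothetical, and the sample fields are invented for illustration.

```python
from collections import defaultdict

# The two methods described above as text-layer snaps.
SNAPPED = {"snapped_to_blank", "snapped_to_label"}


def fields_by_anchor_method(fields):
    """Map each anchor_method to the list of preferences.key values using it."""
    groups = defaultdict(list)
    for field in fields:
        prefs = field.get("preferences") or {}
        # "unknown" is our own placeholder for fields without the v2 entries.
        method = prefs.get("anchor_method", "unknown")
        groups[method].append(prefs.get("key") or field.get("name"))
    return dict(groups)


def unsnapped_keys(fields):
    """Keys of fields whose position did not come from the text layer."""
    groups = fields_by_anchor_method(fields)
    return sorted(
        key
        for method, keys in groups.items()
        if method not in SNAPPED
        for key in keys
    )


# Invented sample data in the documented shape.
sample_fields = [
    {"name": "Printed Name", "preferences": {"key": "full_name", "anchor_method": "snapped_to_blank"}},
    {"name": "Date", "preferences": {"key": "date", "anchor_method": "snapped_to_label"}},
    {"name": "Phone", "preferences": {"key": "phone", "anchor_method": "llm_only"}},
]
print(fields_by_anchor_method(sample_fields))
print(unsnapped_keys(sample_fields))
```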

Confidence calibration

| Range | What it means |
| --- | --- |
| 0.9 – 1.0 | Clear label AND a visible underscore / box / checkbox, verified placement on writable space. Safe to accept without review. |
| 0.7 – 0.9 | Clear label but ambiguous blank (short line, no underscore). Reasonable auto-accept for most workflows. |
| 0.5 – 0.7 | Inferred from layout; no explicit label or no explicit blank. Worth flagging for human review. |
| Below 0.5 | Not returned. The model is instructed to omit these. |

The Critic stage fires a second LLM pass to audit and correct the output when any of the following holds: a field has confidence < 0.85, the Normalizer applied corrections, or a field has anchor_method: llm_only. Check pipeline.critic_ran to see whether this happened.
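
The gating rule can be written down as a small predicate. The sketch below is only an illustration of the three documented triggers, not DocuTrust’s implementation; the function name `critic_reasons` and the reason labels are hypothetical. It describes the state before the Critic runs, so applying it to a final response may not reproduce `critic_ran` if the Critic patched the fields.

```python
CRITIC_CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the rule above


def critic_reasons(fields, corrections_applied):
    """Return the triggers that apply; an empty list means the Critic is skipped."""
    prefs = [field.get("preferences") or {} for field in fields]
    reasons = []
    if any(p.get("confidence", 1.0) < CRITIC_CONFIDENCE_THRESHOLD for p in prefs):
        reasons.append("low_confidence")
    if corrections_applied > 0:
        reasons.append("normalizer_corrections")
    if any(p.get("anchor_method") == "llm_only" for p in prefs):
        reasons.append("llm_only_placement")
    return reasons


# A clean extraction: high confidence, snapped placement, no corrections.
clean = [{"preferences": {"confidence": 0.98, "anchor_method": "snapped_to_blank"}}]
# An uncertain extraction: one field is below threshold and unanchored.
uncertain = clean + [{"preferences": {"confidence": 0.62, "anchor_method": "llm_only"}}]

print(critic_reasons(clean, corrections_applied=0))
print(critic_reasons(uncertain, corrections_applied=0))
```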

Canonical keys

To keep field names stable across documents, the Detector prompt injects DocuTrust’s canonical key library (signatures, common PII fields, consent checkboxes, role-prefixed variants, etc.) and instructs the model to prefer an existing key when the label semantically matches. Examples:

| PDF label | Canonical key |
| --- | --- |
| “Printed Name” / “Full Name” / “Legal Name” | `full_name` |
| “Tel.” / “Telephone” / “Mobile” | `phone` |
| “Client Signature” | `signature` (or `client_signature` if multiple signatures exist in the same doc) |
| “Witness 2 Signature” | `witness_2_signature` |
| “Date of Birth” / “DOB” | `dob` |
| “I agree to the terms” | `checkbox_agree` |

For labels with no canonical match, a new snake_case key is minted from the label and enforced through the ^[a-z][a-z0-9_]{0,49}$ regex.
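
To make the key format concrete, here is a minimal sketch of minting a key from a label and checking it against the documented regex. The slug rules, the `field_` prefix for labels that do not start with a letter, and the suffixing helper are assumptions for illustration; the actual minting logic is not described in this guide.

```python
import re

KEY_PATTERN = re.compile(r"^[a-z][a-z0-9_]{0,49}$")  # regex from this guide


def mint_key(label: str) -> str:
    """Derive a snake_case key from a label (hypothetical minting rules)."""
    # Lowercase, then collapse every run of other characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    # Keys must start with a letter; the "field_" prefix is our own choice.
    if not slug or not slug[0].isalpha():
        slug = f"field_{slug}".rstrip("_")
    slug = slug[:50].rstrip("_")  # respect the 50-character limit
    if not KEY_PATTERN.fullmatch(slug):
        raise ValueError(f"cannot mint a valid key from {label!r}")
    return slug


def dedupe_keys(keys):
    """Append _2, _3, ... to repeated keys, in order of appearance."""
    seen = {}
    result = []
    for key in keys:
        seen[key] = seen.get(key, 0) + 1
        result.append(key if seen[key] == 1 else f"{key}_{seen[key]}")
    return result


print(mint_key("Emergency Contact (Phone #):"))
print(mint_key("2nd Witness"))
print(dedupe_keys(["phone", "phone", "dob", "phone"]))
```

This simple suffixing does not guard against a suffixed key colliding with an existing key or exceeding 50 characters; a production version would need to handle both.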

Text-layer anchoring

For PDFs with a text layer (most PDFs generated from Word, DocuSign, HelloSign, etc.), the Anchor stage locates the underscore run or labeled blank in the text layer and replaces the LLM’s coordinates with that position. This is the single biggest source of precision improvement in v2. Scanned PDFs without a text layer skip anchoring automatically (anchor_method: no_text_layer): the LLM output is kept as-is, so expect positions on those fields to be only as precise as the model’s raw placement.
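
The snapping idea can be sketched against word bounding boxes of the kind `pdftotext -bbox-layout` produces. Everything below is an illustration under assumptions: the fragment is hand-written and simplified (real output is namespaced XHTML with additional nesting), the three-underscore threshold and the function name `snap_to_blank` are hypothetical, and expressing areas as percentages of page size is a guess based on the sample response.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written fragment shaped like pdftotext -bbox-layout <word> elements
# (coordinates in PDF points). Not real tool output.
SAMPLE_PAGE = """
<page width="612.0" height="792.0">
  <line>
    <word xMin="72.0" yMin="150.0" xMax="110.0" yMax="162.0">Printed</word>
    <word xMin="114.0" yMin="150.0" xMax="150.0" yMax="162.0">Name:</word>
    <word xMin="156.0" yMin="150.0" xMax="360.0" yMax="162.0">____________________</word>
  </line>
</page>
"""

UNDERSCORE_RUN = re.compile(r"_{3,}")  # assumption: 3+ underscores form a blank


def snap_to_blank(page: ET.Element, label: str) -> Optional[dict]:
    """Find the first underscore run after `label` on the same line.

    Returns an area in percent of page size, or None when nothing matches
    (the case where a caller would keep the model's own coordinates).
    """
    page_w, page_h = float(page.get("width")), float(page.get("height"))
    wanted = label.split()
    for line in page.iter("line"):
        words = list(line.iter("word"))
        texts = [(w.text or "") for w in words]
        for start in range(len(texts) - len(wanted) + 1):
            if texts[start:start + len(wanted)] != wanted:
                continue
            # Label matched; scan the rest of the line for a blank.
            for word in words[start + len(wanted):]:
                if UNDERSCORE_RUN.fullmatch(word.text or ""):
                    x0, y0, x1, y1 = (
                        float(word.get(k)) for k in ("xMin", "yMin", "xMax", "yMax")
                    )
                    return {
                        "x": round(100 * x0 / page_w, 2),
                        "y": round(100 * y0 / page_h, 2),
                        "w": round(100 * (x1 - x0) / page_w, 2),
                        "h": round(100 * (y1 - y0) / page_h, 2),
                    }
    return None


page = ET.fromstring(SAMPLE_PAGE.strip())
area = snap_to_blank(page, "Printed Name:")
print(area)
```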

What consumers should do

New integrations — treat preferences.key as the stable identifier for cross-document field mapping. Do not rely on name (human-readable, may vary) or uuid (random, changes every detection).
Existing integrations — no changes required. The new response fields are additive; the shape of fields[i], areas[i], and the top-level response is preserved.
Error investigation — when an auto-mapped template looks wrong, send support the pipeline.request_id. It correlates to a diagnostic row with the raw model response, every correction applied, and the anchor method per field.
Confidence thresholds — if you auto-accept auto-mapped templates without human review, consider gating on minimum confidence (e.g. only accept templates where all fields have confidence >= 0.85).
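
A confidence gate along these lines might look like the following sketch. The function name `fields_needing_review` is hypothetical, and treating a missing confidence value (as on legacy responses) as needing review is a conservative assumption.

```python
def fields_needing_review(response, min_confidence=0.85):
    """Return identifiers of fields below the threshold; empty means auto-accept."""
    flagged = []
    for field in response.get("fields", []):
        prefs = field.get("preferences") or {}
        confidence = prefs.get("confidence")
        # Missing confidence is treated as "review" rather than "accept".
        if confidence is None or confidence < min_confidence:
            flagged.append(prefs.get("key") or field.get("name"))
    return flagged


# Invented sample data in the documented shape.
template = {
    "fields": [
        {"name": "Printed Name", "preferences": {"key": "full_name", "confidence": 0.98}},
        {"name": "Phone", "preferences": {"key": "phone", "confidence": 0.71}},
    ]
}

to_review = fields_needing_review(template)
if to_review:
    print("Send to human review:", to_review)
else:
    print("Auto-accept template")
```

Returning the flagged identifiers rather than a bare boolean lets a review UI highlight only the uncertain fields.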

Compatibility

Response fields are additive. No existing property was removed or changed in type.
Accounts not yet on v2 continue to receive the legacy response shape (fields[i].preferences = {}, no top-level pipeline block).
Both v1 and v2 paths share the same request format, HTTP status codes, and error payload shapes.
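
Because the two shapes differ only in additive fields, a single reader can tolerate both. The sketch below indexes fields by `preferences.key` and simply yields an empty index for legacy responses; the helper names are hypothetical and the sample responses are invented.

```python
def index_fields_by_key(response):
    """Map preferences.key to its field; legacy fields (preferences == {}) are skipped."""
    index = {}
    for field in response.get("fields", []):
        key = (field.get("preferences") or {}).get("key")
        if key:
            index[key] = field
    return index


def shared_keys(response_a, response_b):
    """Keys present in both responses, usable for cross-document mapping."""
    return sorted(index_fields_by_key(response_a).keys() & index_fields_by_key(response_b).keys())


# Invented sample data: two v2 responses and one legacy response.
intake_form = {
    "pipeline": {"version": "v2"},
    "fields": [
        {"name": "Full Name", "preferences": {"key": "full_name"}},
        {"name": "Tel.", "preferences": {"key": "phone"}},
    ],
}
release_form = {
    "pipeline": {"version": "v2"},
    "fields": [
        {"name": "Printed Name", "preferences": {"key": "full_name"}},
        {"name": "DOB", "preferences": {"key": "dob"}},
    ],
}
legacy_form = {"fields": [{"name": "Printed Name", "preferences": {}}]}

print(shared_keys(intake_form, release_form))
print(index_fields_by_key(legacy_form))
```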
