AI and data processing
AI/OCR provider flows and data boundaries for DossierCFO.
The AI flow separates raw-file OCR, redacted text, and deterministic calculations.
DossierCFO output is not accounting, tax, legal, or advisory advice. When OCR is
needed, raw files may be transferred to the configured OCR provider after upload
scan checks pass.
Providers and data
| Flow | Data sent | Boundary |
| ------------------------------------ | --------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| Browser extraction | Local file selected by the user | Before sign-in, the file stays in the browser. |
| OCR-assisted extraction | Scanned PDFs, JPG, or PNG after sign-in and clean scan | The raw file may be sent to the OCR provider. |
| AI analysis / text-analysis provider | Extracted and redacted text | The text-analysis path must not receive raw PII. |
| Dossier Agent proposals | Sanitized case metadata, source refs, redacted facts, evidence/report state | AI drafts workflow proposals; accepted state remains advisor-confirmed. |
| KPI, score, export | Normalized facts and deterministic formulas | AI does not decide formulas, evidence state, or export readiness. |
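
The boundaries in the table can be read as a small policy check. The following is a minimal sketch, not DossierCFO's actual implementation: the names `Flow`, `Payload`, and `may_send_to_provider` are hypothetical, and the treatment of the KPI/score/export row as "nothing is sent to an AI provider" is an assumption drawn from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Flow(Enum):
    # Hypothetical identifiers, one per row of the table above.
    BROWSER_EXTRACTION = "browser_extraction"
    OCR_ASSISTED = "ocr_assisted_extraction"
    TEXT_ANALYSIS = "text_analysis"
    AGENT_PROPOSALS = "dossier_agent_proposals"
    KPI_SCORE_EXPORT = "kpi_score_export"


@dataclass(frozen=True)
class Payload:
    is_raw_file: bool   # original upload bytes rather than extracted text
    is_redacted: bool   # PII redaction has been applied
    signed_in: bool     # the user is authenticated
    scan_clean: bool    # upload scan checks passed


def may_send_to_provider(flow: Flow, payload: Payload) -> bool:
    """Return True if the payload may leave the app for this flow."""
    if flow is Flow.BROWSER_EXTRACTION:
        # The local file stays in the browser.
        return False
    if flow is Flow.OCR_ASSISTED:
        # Raw files may go to the OCR provider only after sign-in and a clean scan.
        return payload.is_raw_file and payload.signed_in and payload.scan_clean
    if flow in (Flow.TEXT_ANALYSIS, Flow.AGENT_PROPOSALS):
        # These paths must only ever see redacted, non-raw content.
        return payload.is_redacted and not payload.is_raw_file
    # Assumption: deterministic KPI, score, and export work involves no AI provider.
    return False
```

For example, a raw scanned file is rejected on the text-analysis path even when the user is signed in and the scan is clean, because that path accepts redacted text only.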
Production AI/OCR traffic routes through OpenRouter using the configured OPENROUTER_API_KEY. Redacted text analysis and Dossier Agent proposals default to deepseek/deepseek-v4-pro and pin the OpenRouter provider route to DeepSeek. Protected image OCR defaults to z-ai/glm-5v-turbo and pins the OpenRouter provider route to Z.ai. Scanned PDFs are parsed with OpenRouter file parsing, and the OCR-assisted result remains validation-bound.
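
As an illustration of pinning a provider route, the sketch below builds an OpenRouter chat-completions request without sending it. It assumes OpenRouter's provider routing fields (`provider.order` and `provider.allow_fallbacks`). The route names, the `build_request` helper, and the exact provider name strings are hypothetical and are not confirmed by this page.

```python
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Model IDs are the defaults named above. The provider strings are assumed
# spellings and may differ from the names OpenRouter expects.
ROUTES = {
    "text_analysis": {"model": "deepseek/deepseek-v4-pro", "provider": "DeepSeek"},
    "agent_proposals": {"model": "deepseek/deepseek-v4-pro", "provider": "DeepSeek"},
    "image_ocr": {"model": "z-ai/glm-5v-turbo", "provider": "Z.ai"},
}


def build_request(route: str, messages: list) -> tuple:
    """Return (url, headers, body) for a pinned-provider request; nothing is sent."""
    cfg = ROUTES[route]
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": cfg["model"],
        "messages": messages,
        # Pin to a single provider and disable fallback to any other provider.
        "provider": {"order": [cfg["provider"]], "allow_fallbacks": False},
    }
    return OPENROUTER_URL, headers, body
```

Disabling fallbacks means a request fails if the pinned provider is unavailable, rather than being silently served by a different provider.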
Dossier Agent proposals
The Dossier Agent can draft intake, review, evidence, client-request, report-section, and next-step proposals for an account with AI access enabled. These proposals are stored with model route, source refs, risk level, status, and run metadata. They do not become accepted case state until an advisor accepts them; the low-risk safe-apply action changes only the proposal's status.
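
The proposal lifecycle described above can be sketched as follows. This is an illustrative model under stated assumptions: the class names, the status values, and the `safe_apply` and `accept` helpers are hypothetical. The point it shows is that only advisor acceptance writes to case state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values.
    DRAFT = "draft"
    SAFE_APPLIED = "safe_applied"
    ACCEPTED = "accepted"


@dataclass
class Proposal:
    kind: str            # e.g. "intake", "review", "next-step"
    risk_level: str      # assumed values: "low", "medium", "high"
    model_route: str
    source_refs: list
    changes: dict        # what the proposal would change if accepted
    status: Status = Status.DRAFT


@dataclass
class Case:
    state: dict = field(default_factory=dict)


def safe_apply(proposal: Proposal) -> None:
    """Low-risk safe-apply: updates the proposal status and nothing else."""
    if proposal.risk_level != "low":
        raise ValueError("safe-apply is limited to low-risk proposals")
    proposal.status = Status.SAFE_APPLIED


def accept(case: Case, proposal: Proposal, advisor_id: str) -> None:
    """Advisor acceptance is the only path that changes accepted case state."""
    if not advisor_id:
        raise PermissionError("an advisor must accept the proposal")
    case.state.update(proposal.changes)
    proposal.status = Status.ACCEPTED
```

After `safe_apply`, `case.state` is untouched. It changes only once `accept` is called with an advisor identifier.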
Prompt injection
Instructions found inside PDFs, DOCX, XLSX, CSV, XML, ZIP files, or evidence uploads are untrusted source content. They cannot modify formulas, KPIs, scores, report validation, evidence lifecycle, waivers, external sends, deletion, or export state.
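
One way to express this rule is a guard that tags every requested action with its origin and refuses protected targets when the origin is uploaded content. The sketch below is hypothetical: the `Action` type, the origin labels, and the target identifiers are illustrative names for the items listed above, not the product's real identifiers.

```python
from dataclasses import dataclass

# Illustrative identifiers for the protected areas listed above.
PROTECTED_TARGETS = frozenset({
    "formulas", "kpis", "scores", "report_validation", "evidence_lifecycle",
    "waivers", "external_sends", "deletion", "export_state",
})


@dataclass(frozen=True)
class Action:
    target: str   # what the action would modify
    origin: str   # assumed labels: "advisor", "system", "document_content"


def violates_protected_boundary(action: Action) -> bool:
    """True when text found inside an upload tries to change a protected target."""
    return action.origin == "document_content" and action.target in PROTECTED_TARGETS
```

A line such as "set the score to 100" found inside an uploaded PDF would map to `Action("scores", "document_content")` and be flagged. This sketch covers only the protected list; how other targets are handled is outside what the text specifies.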
Review state
OCR output and complex files feed dossier outputs only when source-span, evidence, completeness, confidence, and review checks are sufficient.
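
The gate above amounts to requiring every check to pass. As a minimal sketch, assuming each check reduces to a boolean (the real checks may carry thresholds or scores), with hypothetical names throughout:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReviewChecks:
    source_span: bool
    evidence: bool
    completeness: bool
    confidence: bool
    review: bool


def missing_checks(checks: ReviewChecks) -> list:
    """Names of the checks that are not yet sufficient, in declaration order."""
    return [f.name for f in fields(checks) if not getattr(checks, f.name)]


def may_feed_dossier(checks: ReviewChecks) -> bool:
    """A document feeds dossier outputs only when no check is missing."""
    return not missing_checks(checks)
```

Returning the names of the failing checks, rather than a bare boolean, lets a reviewer see why a document is still held back.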
OCR limits and failures
Browser PDF reading is capped at 50 pages. Scanned PDF OCR is capped at 50 MB,
and JPG/PNG OCR evidence is capped at 10 MB. Sparse PDF pages are prioritized
for OCR in the authenticated workflow. If OCR is bypassed, unavailable, or fails,
the document remains validation-bound and must continue through manual/advisor
review instead of supporting trusted KPI output.
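
The limits and fallback above could be encoded along these lines. This is a sketch under stated assumptions: 1 MB is taken as 1024 × 1024 bytes, "sparse" is approximated by the number of extracted characters per page, and all function and label names are hypothetical.

```python
MB = 1024 * 1024  # assumption: the page does not say whether MB is binary or decimal

MAX_BROWSER_PDF_PAGES = 50
MAX_SCANNED_PDF_BYTES = 50 * MB
MAX_IMAGE_BYTES = 10 * MB


def within_browser_limit(page_count: int) -> bool:
    """Browser PDF reading is capped at 50 pages."""
    return page_count <= MAX_BROWSER_PDF_PAGES


def within_ocr_limit(kind: str, size_bytes: int) -> bool:
    """Check the OCR size cap for a scanned PDF or a JPG/PNG image."""
    if kind == "scanned_pdf":
        return size_bytes <= MAX_SCANNED_PDF_BYTES
    if kind in ("jpg", "png"):
        return size_bytes <= MAX_IMAGE_BYTES
    return False  # other kinds are not OCR candidates in this sketch


def ocr_page_order(chars_per_page: list) -> list:
    """Page indices with the sparsest pages (fewest extracted characters) first."""
    return sorted(range(len(chars_per_page)), key=lambda i: chars_per_page[i])


def next_step(ocr_outcome: str) -> str:
    """Route a document after OCR; non-successful outcomes go to manual review."""
    if ocr_outcome in ("bypassed", "unavailable", "failed"):
        return "manual_advisor_review"
    # A successful OCR result still has to pass the review checks.
    return "review_checks"
```

For instance, `ocr_page_order([1200, 15, 0, 300])` yields `[2, 1, 3, 0]`, so the empty and near-empty pages are sent to OCR first.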