Complete Product Suite
Everything you need to manage, search, and understand your enterprise documents — powered by AI that stays within your infrastructure.
Your Documents, Supercharged
CordonData is a complete Document Management System — upload, organize, version, share, and annotate. Then layer on AI search, compliance scanning, and OCR to unlock everything inside your files.
Document Management
Upload up to 500MB per file. Organize with nested folders (50 levels, 100K+ files each). Full version history, granular sharing with VIEW/EDIT/DELETE permissions, and PDF annotation with permanent redaction.
AI-Powered Search
Ask questions in natural language across all your documents. Hybrid search combines semantic vectors with BM25 keywords. Every answer cites the exact source document and page.
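One common way to merge a semantic ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: this page does not state which fusion method CordonData uses, and the function name, the document IDs, and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a
    CordonData setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
semantic_hits = ["policy.pdf", "contract.docx", "memo.txt"]
bm25_hits = ["contract.docx", "invoice.pdf", "policy.pdf"]

fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
print(fused)
```

In this example `contract.docx` ranks first because it sits near the top of both lists, while documents found by only one retriever fall to the bottom.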
PII, NHI & Secret Detection
Auto-scan every document for sensitive data before it enters the AI pipeline. Detect SSNs, emails, medical records, API keys, and credentials. Auto-redact or flag for review.
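A minimal sketch of pattern-based detection and redaction, assuming regular-expression rules. The three patterns, the `scan` and `redact` helpers, and the placeholder format are hypothetical and far narrower than a production scanner would need; they are not CordonData's actual rule set.

```python
import re

# Illustrative patterns only; real scanners need much broader coverage.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan(text):
    """Return a sorted list of (kind, matched_text) findings."""
    findings = []
    for kind, pattern in PATTERNS.items():
        findings.extend((kind, m.group()) for m in pattern.finditer(text))
    return sorted(findings)


def redact(text):
    """Replace every finding with a [REDACTED:<kind>] placeholder."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text


sample = "Contact jane@example.com, SSN 123-45-6789, key AKIAABCDEFGHIJKLMNOP."
print(scan(sample))
print(redact(sample))
```

A document with a non-empty `scan` result could either be passed through `redact` or held back for review, matching the two options described above.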
OCR & Document Intelligence
Extract searchable text from scanned PDFs, images, and mixed-content documents. 100+ languages, RTL support, layout-aware parsing for multi-column and table-heavy files.
Full Audit Trail
Every query, retrieval, LLM prompt, and response is logged. Export deterministic audit traces showing exactly which document chunk produced each sentence.
Permission-Safe Retrieval
DMS permissions are the single source of truth. When you share or revoke access, the AI search index updates instantly. Users only see documents they're authorized to access — impossible to bypass.
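The idea of enforcing access before retrieval can be sketched as a filter that runs ahead of ranking. Everything here (`Chunk`, `AccessControl`, `retrieve`, and the in-memory permission map) is a hypothetical stand-in for illustration, not CordonData's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float  # relevance score from the search step


@dataclass
class AccessControl:
    # Maps doc_id -> set of user IDs holding at least VIEW permission.
    viewers: dict = field(default_factory=dict)

    def share(self, doc_id, user):
        self.viewers.setdefault(doc_id, set()).add(user)

    def revoke(self, doc_id, user):
        self.viewers.get(doc_id, set()).discard(user)

    def can_view(self, doc_id, user):
        return user in self.viewers.get(doc_id, set())


def retrieve(chunks, acl, user, top_k=3):
    """Drop unauthorized chunks before ranking, so they never reach the LLM."""
    allowed = [c for c in chunks if acl.can_view(c.doc_id, user)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:top_k]


acl = AccessControl()
acl.share("hr-plan.pdf", "alice")
acl.share("handbook.pdf", "alice")
acl.share("handbook.pdf", "bob")

chunks = [
    Chunk("hr-plan.pdf", "Reorganization timeline", 0.9),
    Chunk("handbook.pdf", "Leave policy", 0.7),
]

print([c.doc_id for c in retrieve(chunks, acl, "bob")])
acl.revoke("hr-plan.pdf", "alice")
print([c.doc_id for c in retrieve(chunks, acl, "alice")])
```

Because the permission check reads the live permission map on every query, a revoked share takes effect on the next retrieval without a separate index rebuild in this sketch.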
Connect External Sources
Already have documents elsewhere? Connect to SharePoint, Alfresco, S3, email, file servers, and REST APIs. Index in-place — no file duplication, no data migration.
Model-Agnostic LLM Gateway
Use any LLM — OpenAI, Azure, Anthropic, local Ollama models. Configure per-knowledge-base with automatic fallback across priority tiers.
Deploy Anywhere
On-premise, air-gapped, your own cloud (BYOC), or managed single-tenant SaaS. Docker Compose or Kubernetes. AES-256 encryption. Keycloak SSO.
How CordonData compares
Enterprise document management has two common failure modes: legacy DMS with no AI, and cloud AI with no control. CordonData solves both.
| Capability | Legacy DMS | Cloud AI Tool | CordonData |
|---|---|---|---|
| On-premise / air-gap deployment | |||
| AI search with grounded answers | |||
| Access enforced before AI retrieval | — | ||
| Automated PII / secret detection | Partial | ||
| Connect to existing SharePoint / Alfresco | Limited | ||
| Immutable per-user audit trail | Basic | Basic | |
| Use your own LLM (model-agnostic) | |||
| Version control + approval workflows | Basic | ||
| Document redaction (burn PII) | | | |
Comparison is generalised. Results vary by vendor and deployment configuration.
Every document format your teams use
Upload, version, index, and search across all major office and technical formats.
OCR applied automatically to scanned PDFs and images. Vision model descriptions available for complex visual content.
Purpose-specific AI model roles
Assign different models to different tasks. Use a large model for generation, a fast model for reranking, a vision model for images — all configurable with 3-tier failover.
Primary text generation — chat responses, document summaries, workflow decisions
Extended reasoning for complex queries. Outputs a visible reasoning trace before the final answer.
Converts document chunks to vectors. Changing this model triggers a full re-index.
Re-scores retrieval candidates by relevance after initial recall. Improves answer quality significantly.
Describes images and video frames for indexing. Used as an OCR alternative for complex visual layouts.
Condenser compresses long context before generation. Privacy filter adds AI-contextual PII detection on top of pattern matching.
All roles support 3-tier priority failover. Context window auto-detected per model.
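Priority failover across three tiers can be sketched as a loop that tries each configured model in order and moves on when one fails. The function name, tier labels, and stub backends below are hypothetical; a real gateway would call actual model clients and catch narrower error types.

```python
def call_with_failover(tiers, prompt):
    """Try each tier's model in priority order; return (tier_name, reply)."""
    errors = []
    for name, call_model in tiers:
        try:
            return name, call_model(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Stub backends standing in for real model clients (hypothetical).
def primary(prompt):
    raise TimeoutError("primary model timed out")


def secondary(prompt):
    return f"answer to: {prompt}"


def tertiary(prompt):
    return "fallback answer"


tiers = [("tier-1", primary), ("tier-2", secondary), ("tier-3", tertiary)]
used, reply = call_with_failover(tiers, "Summarize the contract")
print(used, reply)
```

Here the first tier times out, so the request is served by the second tier and the third is never called.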
Agent Builder & Admin Platform
Beyond search — CordonData includes a full agent-builder platform for creating custom AI assistants, configuring model pipelines, and managing enterprise knowledge at scale.
Custom AI Agents
Build purpose-specific AI agents with custom system prompts, tool configurations, and knowledge base assignments. Each agent can use different LLM models and retrieval strategies tailored to specific business functions.
Global Model Settings
Configure LLM, embedding, reranker, condenser, and vision models globally across all knowledge bases. Set priority tiers with automatic fallback — use OpenAI for primary, local Ollama models as backup.
Visual Workflow Editor
Design complex AI pipelines with a drag-and-drop workflow editor. Chain together data ingestion, text extraction, chunking, embedding, retrieval, and response generation nodes — no code required.
Knowledge Base Management
Create and manage multiple knowledge bases, each with independent data sources, chunking strategies, embedding models, and ACL policies. Monitor indexing status, document counts, and sync health from a unified dashboard.
Processing Pipeline Monitor
Real-time visibility into OCR, compliance scanning, chunking, embedding, and RAG indexing pipelines. Track per-document status, retry failed documents, and monitor throughput across all connected sources.
SSO & Identity Management
Integrated Keycloak SSO with support for Active Directory, LDAP, and OIDC/SAML identity providers. Role-based access control across admin console, chat interface, and API endpoints.
Everything you need to manage documents at scale
CordonData's DMS is a full enterprise document workbench — not a file drop. Upload, version, annotate, share, route for approval, place on legal hold, and ask AI questions, all from one place.
File Management
Drag-and-drop, bulk upload, nested folders (up to 50 levels), move & copy
Full version chain with comments; each version independently OCR/RAG indexed
Multi-value tags with autocomplete; save and replay search queries
Star documents and folders for quick access across sessions
Admin-defined metadata schemas applied per document type
Collaboration & Editing
Edit DOCX, XLSX, PPTX, ODF in-browser via Collabora — no download needed. Save creates a new version and re-indexes automatically.
Lock a document for offline editing; others see read-only status. Check in uploads a new version and releases the lock.
Threaded comments per document; visible inline in the document panel
Add highlights, shapes, and markup on PDFs; annotations are shared and persisted server-side
Watch any document or folder — receive real-time bell notifications and email alerts on changes
Sharing & Access Control
Per-file and per-folder access control synced from SSO or set manually
Password-protected links with expiry and per-IP rate limiting
Apply per-recipient PDF redaction overlays — the same document, different views per user
Submit documents for approval with role routing, ALL/ANY/QUORUM conditions, SLA escalation, delegation, and recall
Compare any two document versions: side-by-side, inline diff, visual diff, or AI-generated summary of key changes
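The ALL/ANY/QUORUM approval conditions mentioned in this list can be illustrated with a small evaluator. The semantics shown (every approver, at least one approver, at least N approvers) are an assumed reading of those labels, and `approval_met` is a hypothetical helper, not a CordonData API.

```python
def approval_met(condition, decisions, quorum=None):
    """Check whether an approval step is satisfied.

    decisions maps approver -> True (approved), False (rejected),
    or None (still pending).
    """
    approvals = sum(1 for d in decisions.values() if d is True)
    if condition == "ALL":
        return bool(decisions) and approvals == len(decisions)
    if condition == "ANY":
        return approvals >= 1
    if condition == "QUORUM":
        if quorum is None:
            raise ValueError("QUORUM requires a quorum count")
        return approvals >= quorum
    raise ValueError(f"unknown condition: {condition}")


decisions = {"legal": True, "finance": True, "security": None}
print(approval_met("ALL", decisions))
print(approval_met("ANY", decisions))
print(approval_met("QUORUM", decisions, quorum=2))
```

With one reviewer still pending, the ALL condition is not yet met, while ANY and a quorum of two are already satisfied.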
Records & Lifecycle
Convert any document to an immutable record; content-frozen records cannot be modified or deleted
Admin-defined retention periods; automatic disposition (destruction or transfer) on expiry
Place legal holds that block disposition; holds propagate to ancestor folders and are fully audited
Schedule and execute disposition workflows; admin-managed bulk disposition runs
Delegate records management capabilities to designated users — separate from admin
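How retention expiry and legal holds interact can be sketched as a simple eligibility check: a record is due for disposition only when its retention period has ended and no hold applies. The `Record` fields and `due_for_disposition` helper are hypothetical, and the sketch does not model folder propagation or auditing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    name: str
    retain_until: date  # end of the admin-defined retention period
    on_legal_hold: bool = False


def due_for_disposition(records, today):
    """Return names of records whose retention expired and that have no hold."""
    return [
        r.name
        for r in records
        if r.retain_until <= today and not r.on_legal_hold
    ]


records = [
    Record("2015-invoices", date(2022, 12, 31)),
    Record("2016-contracts", date(2023, 12, 31), on_legal_hold=True),
    Record("2024-payroll", date(2031, 12, 31)),
]

print(due_for_disposition(records, today=date(2025, 1, 1)))
```

Only the first record is returned: the second has expired but is protected by its hold, and the third is still inside its retention period.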
AI built into every document
Available from the document panel, with no separate AI view needed
Ask questions directly about any document; answers are grounded in that specific file
Compare two versions with AI-generated change summaries, risk flags, and action items
Every new version is OCR-processed and re-indexed for semantic search — automatically
PII, NHI, and secrets scanned on ingest; flagged documents quarantined from AI retrieval
Advanced OCR & Document Intelligence
CordonData extracts structured, searchable text from any document format — scanned PDFs, images, handwritten notes, and complex multi-column layouts — using state-of-the-art OCR and document understanding models.
Scanned PDF OCR
Convert image-based PDFs into fully searchable text. Supports multi-page documents, mixed content (text + images), and RTL languages including Arabic and Hebrew.
Image Text Extraction
Extract text from PNG, JPEG, TIFF, and other image formats. Handles low-resolution scans, skewed documents, and complex backgrounds with high accuracy.
Layout-Aware Parsing
Understands multi-column layouts, tables, headers, footnotes, and callout boxes. Preserves reading order and document structure for accurate chunking.
Multilingual OCR
Supports 100+ languages including CJK (Chinese, Japanese, Korean), Arabic, Cyrillic, and Indic scripts. Automatic language detection for mixed-language documents.