Frequently Asked Questions
Everything you need to know about deployment, pricing, connectors, security, and AI capabilities.
What your architects
need to know
CordonData is designed to fit into existing enterprise infrastructure — not replace it. Here's the technical picture without the marketing language.
Single-node or multi-replica. All services containerised. No external SaaS dependencies required.
Integrate with your existing IdP via SAML or OIDC. No separate user database required.
Combined retrieval over dense vectors and BM25 with permission-aware filtering. No cloud search dependency.
Connect any OpenAI-compatible endpoint. Run local models (Ollama) or route to cloud providers. Model is configurable per role.
Files stored encrypted in an on-premise S3-compatible store. No mandatory cloud bucket.
Document ingestion, OCR, and indexing decoupled from the API layer. Resilient to spikes in upload volume.
What's included
We share detailed architecture diagrams, data-flow documentation, and threat models with Design Partners under NDA.
Frequently Asked Questions
Everything you need to know about CordonData's enterprise AI platform.
What makes CordonData different from other enterprise AI search tools?
CordonData is the only platform that combines on-premise RAG, document-level permission enforcement, automated PII/NHI/secret redaction, and advanced OCR in a single self-hosted package. Unlike cloud-only solutions, your data never leaves your infrastructure. Unlike simple RAG wrappers, we provide native connectors to your existing DMS, full audit traceability, and zero-trust retrieval routing.
Can CordonData run completely air-gapped?
Yes. CordonData is designed for air-gapped, offline deployments. You can run the entire stack — OCR, embedding, vector search, LLM inference, and SSO — entirely within your secure network with no external API calls. We support local LLM inference via Ollama and other self-hosted model runtimes.
How does document-level security work?
When documents are indexed, their ACL metadata (owner, group, permissions) is stored alongside the vector embeddings. At query time, the user's identity — authenticated via Keycloak or Active Directory — is cross-referenced against this metadata. The vector search space is dynamically filtered so users only see results from documents they have permission to access. This happens at the index level, making it impossible to bypass via prompt injection.
What document formats and languages do you support?
We support PDF (scanned and native), DOCX, PPTX, XLSX, PNG, JPEG, TIFF, HTML, Markdown, plain text, and email formats (EML/MSG). Our OCR engine supports 100+ languages including CJK, Arabic, Cyrillic, and Indic scripts. We also handle RTL (right-to-left) languages with proper text layer alignment.
How does the PII and secret detection work?
Before any document content enters the vector index or LLM context window, it passes through our compliance scanning pipeline. We use a combination of regex patterns, ML-based named entity recognition, and entropy-based secret detection to identify PII (SSN, email, phone, passport, etc.), NHI (medical records, health IDs), and secrets (API keys, tokens, connection strings). Detected spans can be automatically redacted or flagged for manual review based on your policy configuration.
Can I use my own LLM or embedding model?
Absolutely. CordonData is model-agnostic. You can use OpenAI, Azure OpenAI, Anthropic, local models via Ollama, or any OpenAI-compatible API. The embedding model, reranker, and chat model are all configurable per knowledge base. You maintain full control over which models process your data.
How do I get started?
Join our waitlist or apply for the Design Partner Program. Design partners get white-glove onboarding, direct access to our engineering team, and lifetime pricing lock. We're looking for forward-thinking enterprises to help us stress-test the platform before the stable 1.0 release.
Build With Us: The Design Partner Program
We are soon launching a stable 1.0 release. We are looking for 3 forward-thinking enterprises to help us stress-test our advanced document extraction and hybrid search indexing pipelines.
What to Expect (v0.8)
- Early Access to Core Features: The foundational RAG engine is operational. You'll help us polish the UI and refine edge cases before the public launch.
- Collaborative Feedback: Your insights are invaluable. We'll work closely with your team to optimize connector reliability and the overall user experience.
- Safe Sandbox Deployment: To ensure zero risk to production data, we ask that you provide a dedicated test environment or mock dataset for our initial connection.
The Benefits
- White-Glove Onboarding: Direct installation and identity provider setup by our founding engineering team.
- Roadmap Influence: Your feature requests get bumped to the front of the dev queue.
- Lifetime Pricing Lock: Design partners secure an exclusive, heavily discounted licensing rate in perpetuity.