How does document-level security work in CordonData?

ACL metadata is stored alongside vector embeddings. At query time, user identity is cross-referenced against this metadata, dynamically filtering the vector search space so users only see documents they have permission to access.

What document formats and languages does CordonData support?

CordonData supports PDF, DOCX, PPTX, XLSX, PNG, JPEG, TIFF, HTML, Markdown, plain text, and email formats. OCR supports 100+ languages including CJK, Arabic, Cyrillic, and Indic scripts.

FAQ

Frequently Asked Questions

Q: What makes CordonData different from other enterprise AI search tools?

CordonData combines on-premise RAG, document-level permission enforcement, automated PII/NHI/secret redaction, and advanced OCR in a single self-hosted package. Your data never leaves your infrastructure.

Everything you need to know about deployment, pricing, connectors, security, and AI capabilities.

Back to Home

For Technical Teams

What your architects
need to know

CordonData is designed to fit into existing enterprise infrastructure — not replace it. Here's the technical picture without the marketing language.

Deployment Docker Compose / Kubernetes

Single-node or multi-replica. All services containerised. No external SaaS dependencies required.

Authentication Keycloak 26 (OIDC / SAML)

Integrate with your existing IdP via SAML or OIDC. No separate user database required.

Search Hybrid: semantic + keyword

Combined retrieval over dense vectors and BM25 with permission-aware filtering. No cloud search dependency.

LLM integration Bring your own model

Connect any OpenAI-compatible endpoint. Run local models (Ollama) or route to cloud providers. Model is configurable per role.

Storage S3-compatible object store

Files stored encrypted in an on-premise S3-compatible store. No mandatory cloud bucket.

Async pipeline Apache Kafka (KRaft)

Document ingestion, OCR, and indexing decoupled from the API layer. Resilient to spikes in upload volume.

What's included

Multi-tenant workspace management

Role-based capability system (not just RBAC)

Document version control with approval workflows

Records management (retention, legal hold, disposition)

Online editing (WOPI / Collabora) — no round-trip download

Public share links with password & expiry

AI comparison across document versions

Budget-aware RAG (context window auto-configured)

Visual agent/workflow builder (low-code)

Architecture documentation

We share detailed architecture diagrams, data-flow documentation, and threat models with Design Partners under NDA.

Frequently Asked Questions

Everything you need to know about CordonData's enterprise AI platform.

What makes CordonData different from other enterprise AI search tools?

CordonData is the only platform that combines on-premise RAG, document-level permission enforcement, automated PII/NHI/secret redaction, and advanced OCR in a single self-hosted package. Unlike cloud-only solutions, your data never leaves your infrastructure. Unlike simple RAG wrappers, we provide native connectors to your existing DMS, full audit traceability, and zero-trust retrieval routing.

Can CordonData run completely air-gapped?

Yes. CordonData is designed for air-gapped, offline deployments. You can run the entire stack — OCR, embedding, vector search, LLM inference, and SSO — entirely within your secure network with no external API calls. We support local LLM inference via Ollama and other self-hosted model runtimes.

How does document-level security work?

When documents are indexed, their ACL metadata (owner, group, permissions) is stored alongside the vector embeddings. At query time, the user's identity — authenticated via Keycloak or Active Directory — is cross-referenced against this metadata. The vector search space is dynamically filtered so users only see results from documents they have permission to access. This happens at the index level, making it impossible to bypass via prompt injection.

What document formats and languages do you support?

We support PDF (scanned and native), DOCX, PPTX, XLSX, PNG, JPEG, TIFF, HTML, Markdown, plain text, and email formats (EML/MSG). Our OCR engine supports 100+ languages including CJK, Arabic, Cyrillic, and Indic scripts. We also handle RTL (right-to-left) languages with proper text layer alignment.

How does the PII and secret detection work?

Before any document content enters the vector index or LLM context window, it passes through our compliance scanning pipeline. We use a combination of regex patterns, ML-based named entity recognition, and entropy-based secret detection to identify PII (SSN, email, phone, passport, etc.), NHI (medical records, health IDs), and secrets (API keys, tokens, connection strings). Detected spans can be automatically redacted or flagged for manual review based on your policy configuration.

Can I use my own LLM or embedding model?

Absolutely. CordonData is model-agnostic. You can use OpenAI, Azure OpenAI, Anthropic, local models via Ollama, or any OpenAI-compatible API. The embedding model, reranker, and chat model are all configurable per knowledge base. You maintain full control over which models process your data.

How do I get started?

Join our waitlist or apply for the Design Partner Program. Design partners get white-glove onboarding, direct access to our engineering team, and lifetime pricing lock. We're looking for forward-thinking enterprises to help us stress-test the platform before the stable 1.0 release.

Build With Us: The Design Partner Program

We are soon launching a stable 1.0 release. We are looking for 3 forward-thinking enterprises to help us stress-test our advanced document extraction and hybrid search indexing pipelines.

What to Expect (v0.8)

Early Access to Core Features: The foundational RAG engine is operational. You'll help us polish the UI and refine edge cases before the public launch.
Collaborative Feedback: Your insights are invaluable. We'll work closely with your team to optimize connector reliability and the overall user experience.
Safe Sandbox Deployment: To ensure zero risk to production data, we ask that you provide a dedicated test environment or mock dataset for our initial connection.

The Benefits

White-Glove Onboarding: Direct installation and identity provider setup by our founding engineering team.
Roadmap Influence: Your feature requests get bumped to the front of the dev queue.
Lifetime Pricing Lock: Design partners secure an exclusive, heavily discounted licensing rate in perpetuity.

Frequently Asked Questions

What your architectsneed to know

What's included

Frequently Asked Questions

What makes CordonData different from other enterprise AI search tools?

Can CordonData run completely air-gapped?

How does document-level security work?

What document formats and languages do you support?

How does the PII and secret detection work?

Can I use my own LLM or embedding model?

How do I get started?

Build With Us: The Design Partner Program

What to Expect (v0.8)

The Benefits

Ready to transform how your team works with documents?

Join the Waitlist

You're on the list!

What your architects
need to know