In AI-powered analytics, the prompt is the attack surface. Every time a user asks a question, they may inadvertently include sensitive data: a Social Security Number pasted from a spreadsheet, a credit card number in a transaction log, an API key in a config snippet. Lumina's Firewall Pipeline ensures that nothing sensitive ever reaches an AI model without explicit permission.
Most AI analytics tools have a blind spot. They are excellent at processing data, but they give zero thought to what is in the data flowing through them.
When an analyst pastes a customer list into a chatbot to ask "which customers are at risk of churning?", that list may contain names, email addresses, phone numbers, and account identifiers. The chatbot sends all of it to a cloud API. No inspection. No redaction. No audit trail.
For regulated industries, this is not a theoretical risk. It is a compliance violation.
The first wall is fast, deterministic, and runs entirely in the browser. Before any prompt leaves your device, Lumina's Data Loss Prevention scanner sweeps it for known sensitive data patterns.
Validated with the Luhn algorithm. Catches real card numbers, ignores random large numbers in your reports.
Detects SSN formats regardless of how they are formatted: dashes, spaces, or contiguous digits.
AWS access keys, OpenAI tokens, and other cloud credentials are detected and blocked before they can be transmitted.
Email addresses, phone numbers, and Protected Health Information patterns, all critical for HIPAA and GDPR compliance.
Each detection produces a finding with a confidence score and an exact location in the prompt. The system then applies one of three verdicts:
BLOCK: Stop entirely
REDACT: Mask the value
ALLOW: Proceed safely
Patterns are necessary but not sufficient. Your organization has business logic about what data can go where, and the Policy Engine enforces it.
This layer evaluates each prompt against a set of configurable rules:
Is this agent allowed to send data to the cloud? To a specific endpoint? Or must everything stay local?
Only send data to pre-approved endpoints. Block everything else by default.
Hash-only (for existence proof), summary (for review), or verbatim (for full transcript logging). Choose the level that matches your regulatory requirement.
The Policy Engine is deterministic. The same input always produces the same output. There are no probabilistic judgments here. It either passes or it does not.
The first two layers catch structured data. But what about unstructured sensitivity?
"The employee who filed the harassment complaint last Tuesday" contains no SSN, no credit card, no API key. But it is deeply sensitive information that should never be sent to a cloud AI model.
The AI Classifier understands meaning. It evaluates prompts for:
And here is the part that matters for enterprise: the AI Classifier can run in two modes.
Uses a cloud language model for fast, powerful classification. Best for public and internal data where cloud processing is acceptable.
Runs on your own infrastructure via Ollama or any OpenAI-compatible endpoint. Zero data egress. Ideal for healthcare, defense, and financial environments.
No single layer is perfect. Regex misses paraphrased data. AI classifiers can hallucinate. Business rules cannot understand context. But together, they form a defense-in-depth that catches what each individual layer would miss.
For organizations in healthcare, finance, defense, and other regulated sectors, this is not optional security theater. It is the minimum viable trust architecture for deploying AI analytics against sensitive data.
And because the entire pipeline is opt-in per agent, configured right inside Agent Studio, you get granular control without blanket overhead. A public marketing agent can skip the firewall entirely, while a clinical research agent gets the full pipeline.
Open Agent Studio, enable the three-layer pipeline, and see it block sensitive data in real time on your own dataset.