Back to Blog
Security

Three Walls Between Your Data and the Internet: Inside Lumina's Firewall Pipeline

Engineering Team Feb 02, 2026 9 min read
Three Walls Between Your Data and the Internet: Inside Lumina's Firewall Pipeline

THE SECURITY BRIEF

In AI-powered analytics, the prompt is the attack surface. Every time a user asks a question, they may inadvertently include sensitive data: a Social Security Number pasted from a spreadsheet, a credit card number in a transaction log, an API key in a config snippet. Lumina's Firewall Pipeline ensures that nothing sensitive ever reaches an AI model without explicit permission.

Most AI analytics tools have a blind spot. They are excellent at processing data, but they give zero thought to what is in the data flowing through them.

When an analyst pastes a customer list into a chatbot to ask "which customers are at risk of churning?", that list may contain names, email addresses, phone numbers, and account identifiers. The chatbot sends all of it to a cloud API. No inspection. No redaction. No audit trail.

For regulated industries, this is not a theoretical risk. It is a compliance violation.

Layer 1: Pattern Detection (DLP Scanner)

The first wall is fast, deterministic, and runs entirely in the browser. Before any prompt leaves your device, Lumina's Data Loss Prevention scanner sweeps it for known sensitive data patterns.

Credit Cards

Validated with the Luhn algorithm. Catches real card numbers, ignores random large numbers in your reports.

Social Security Numbers

Detects SSN formats regardless of how they are formatted: dashes, spaces, or contiguous digits.

API Keys & Secrets

AWS access keys, OpenAI tokens, and other cloud credentials are detected and blocked before they can be transmitted.

PII & PHI

Email addresses, phone numbers, and Protected Health Information patterns, all critical for HIPAA and GDPR compliance.

Each detection produces a finding with a confidence score and an exact location in the prompt. The system then applies one of three verdicts:

BLOCK: Stop entirely

REDACT: Mask the value

ALLOW: Proceed safely

Layer 2: Business Rules (Policy Engine)

Patterns are necessary but not sufficient. Your organization has business logic about what data can go where, and the Policy Engine enforces it.

This layer evaluates each prompt against a set of configurable rules:

Data Egress Policy

Is this agent allowed to send data to the cloud? To a specific endpoint? Or must everything stay local?

Trusted Destinations

Only send data to pre-approved endpoints. Block everything else by default.

Audit Capture Level

Hash-only (for existence proof), summary (for review), or verbatim (for full transcript logging). Choose the level that matches your regulatory requirement.

The Policy Engine is deterministic. The same input always produces the same output. There are no probabilistic judgments here. It either passes or it does not.

Layer 3: Semantic Understanding (AI Classifier)

The first two layers catch structured data. But what about unstructured sensitivity?

"The employee who filed the harassment complaint last Tuesday" contains no SSN, no credit card, no API key. But it is deeply sensitive information that should never be sent to a cloud AI model.

The AI Classifier understands meaning. It evaluates prompts for:

  • Strategic IP: unreleased features, M&A rumors, internal codenames
  • Credentials: passwords, tokens, and access keys described in natural language
  • Internal Operations: salary lists, server hostnames, organizational charts
  • Legal Risk: liability admissions, lawsuit details, settlement terms

And here is the part that matters for enterprise: the AI Classifier can run in two modes.

Cloud Mode

Uses a cloud language model for fast, powerful classification. Best for public and internal data where cloud processing is acceptable.

Local Mode

Runs on your own infrastructure via Ollama or any OpenAI-compatible endpoint. Zero data egress. Ideal for healthcare, defense, and financial environments.

Defense in Depth: Why Three Layers Matter

No single layer is perfect. Regex misses paraphrased data. AI classifiers can hallucinate. Business rules cannot understand context. But together, they form a defense-in-depth that catches what each individual layer would miss.

What Each Layer Catches

DLP Scanner"My SSN is 123-45-6789". Exact pattern match, blocked instantly
Policy Engine"Send this to external-api.com". Destination not in trusted list, blocked
AI Classifier"The employee from the incident last week". Contextual PII, no pattern to match

For organizations in healthcare, finance, defense, and other regulated sectors, this is not optional security theater. It is the minimum viable trust architecture for deploying AI analytics against sensitive data.

And because the entire pipeline is opt-in per agent, configured right inside Agent Studio, you get granular control without blanket overhead. A public marketing agent can skip the firewall entirely, while a clinical research agent gets the full pipeline.

Configure Your AI Firewall

Open Agent Studio, enable the three-layer pipeline, and see it block sensitive data in real time on your own dataset.