Security

Prompt Injection Protection for AI Email

Email is an open channel — anyone can send your agent a message. Molted Email scans every inbound email for injection patterns, quarantines high-risk content, and blocks data leakage with canary tokens.

How it works

1

Inbound email arrives

When a reply is received, the content is sanitized — invisible Unicode characters, data URIs, scripts, iframes, and hidden elements are stripped before storage.

2

Injection patterns detected

The content is scanned for prompt injection patterns across four categories: instruction overrides, role play attempts, system prompt mimicry, and delimiter abuse. Each match contributes to a risk score.

3

High-risk content quarantined

Messages with high injection risk are automatically quarantined — their body is replaced with a placeholder in the agent view. Operators can review quarantined content separately.

4

Thread anomalies flagged

The system detects forged thread injection (new sender not in thread history), intent flips from unknown senders, and rapid intent changes within short time windows.

5

Canary tokens block leakage

Every thread includes a canary token in the agent context. If an attacker tricks your agent into echoing the token in an outbound email, the send is blocked before delivery.

Configure protection

PUT /v1/me/safety-settings

// Configure safety settings
PUT /v1/me/safety-settings
{
  "quarantineHighInjection": true,
  "holdCriticalAnomalies": true,
  "blockCanaryViolations": true
}

Protection layers

Instruction Override Detection

Catches patterns like "ignore previous instructions", "disregard all previous", and "override your rules" embedded in email content.

Role Play Detection

Catches patterns like "you are now", "act as a", "pretend to be", and "new instructions:" that attempt to redefine agent behavior.

System Prompt Mimicry

Detects fake system prompts using markers like [INST], <<SYS>>, and other LLM control tokens embedded in email text.

Content Quarantine

High-risk messages are quarantined automatically. Agents see a placeholder; operators can review the full content with elevated permissions.

Canary Tokens

Deterministic integrity tokens per thread. If leaked in outbound content, the send is blocked — catching data exfiltration attempts.

Thread Anomaly Detection

Flags forged thread injections, intent flips from unknown senders, and rapid intent changes. Critical anomalies hold messages for review.

Risk scoring

Each inbound message gets a risk score based on matched injection categories. High-risk messages are quarantined automatically when protection is enabled. All settings are on by default — secure out of the box.

Risk Level    Score Range    Action
none          0              No action
low           > 0, < 0.3    Logged, content delivered
medium        >= 0.3, < 0.7 Flagged for review
high          >= 0.7        Quarantined automatically

Secure by default

All protection layers are enabled out of the box. Your agent is protected from injection attacks, data exfiltration, and thread manipulation from the first email.