Security
Prompt Injection Protection for AI Email
Email is an open channel — anyone can send your agent a message. Molted Email scans every inbound email for injection patterns, quarantines high-risk content, and blocks data leakage with canary tokens.
How it works
Inbound email arrives
When a reply is received, the content is sanitized — invisible Unicode characters, data URIs, scripts, iframes, and hidden elements are stripped before storage.
Injection patterns detected
The content is scanned for prompt injection patterns across four categories: instruction overrides, role play attempts, system prompt mimicry, and delimiter abuse. Each match contributes to a risk score.
High-risk content quarantined
Messages with high injection risk are automatically quarantined — their body is replaced with a placeholder in the agent view. Operators can review quarantined content separately.
Thread anomalies flagged
The system detects forged thread injection (new sender not in thread history), intent flips from unknown senders, and rapid intent changes within short time windows.
Canary tokens block leakage
Every thread includes a canary token in the agent context. If an attacker tricks your agent into echoing the token in an outbound email, the send is blocked before delivery.
Configure protection
PUT /v1/me/safety-settings
// Configure safety settings
PUT /v1/me/safety-settings
{
"quarantineHighInjection": true,
"holdCriticalAnomalies": true,
"blockCanaryViolations": true
}Protection layers
Instruction Override Detection
Catches patterns like "ignore previous instructions", "disregard all previous", and "override your rules" embedded in email content.
Role Play Detection
Catches patterns like "you are now", "act as a", "pretend to be", and "new instructions:" that attempt to redefine agent behavior.
System Prompt Mimicry
Detects fake system prompts using markers like [INST], <<SYS>>, and other LLM control tokens embedded in email text.
Content Quarantine
High-risk messages are quarantined automatically. Agents see a placeholder; operators can review the full content with elevated permissions.
Canary Tokens
Deterministic integrity tokens per thread. If leaked in outbound content, the send is blocked — catching data exfiltration attempts.
Thread Anomaly Detection
Flags forged thread injections, intent flips from unknown senders, and rapid intent changes. Critical anomalies hold messages for review.
Risk scoring
Each inbound message gets a risk score based on matched injection categories. High-risk messages are quarantined automatically when protection is enabled. All settings are on by default — secure out of the box.
Risk Level Score Range Action none 0 No action low > 0, < 0.3 Logged, content delivered medium >= 0.3, < 0.7 Flagged for review high >= 0.7 Quarantined automatically
Secure by default
All protection layers are enabled out of the box. Your agent is protected from injection attacks, data exfiltration, and thread manipulation from the first email.