How AI Agents Send Email Today (and Why It's Broken)
2026-03-25
The question "how does your AI agent send email?" has four common answers right now. I have seen all of them in production. They share a surface resemblance - they all deliver messages - but each has a different failure mode that becomes obvious only after something goes wrong.
Here is what each pattern looks like in code, and why each one is missing something critical.
Pattern 1: The raw SMTP tool
The simplest pattern. Define a tool that calls SMTP directly and hand it to the model.
import smtplib
from email.mime.text import MIMEText

def send_email(to: str, subject: str, body: str) -> str:
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "agent@yourcompany.com"
    msg["To"] = to
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(SMTP_USER, SMTP_PASSWORD)
        server.send_message(msg)
    return f"Email sent to {to}"
tools = [send_email]
agent = Agent(model="claude-3-5-sonnet", tools=tools)
This is what almost every tutorial shows. It works for demos. In production, it breaks in four ways:
No deduplication. If the agent runs twice - because of a bug, a retry, a duplicate event from your queue - both instances send. The recipient gets the same email twice. You have no way to detect or prevent this.
No suppression. That user who unsubscribed last month? The one who marked your email as spam? The one who hard-bounced? Your SMTP tool has no idea. It sends anyway. Each one is a compliance violation or a reputation hit.
No rate control. A loop that goes wrong sends as fast as your SMTP server accepts connections. On a fresh domain, that is enough to get you blocklisted before the end of the hour.
No record of what happened. When your CEO asks "what did the agent send to that enterprise prospect yesterday?" there is no answer. The message went out. That is all you know.
Pattern 2: The transport API wrapper
A step up: use a transactional email API instead of raw SMTP. Resend, Postmark, and similar services give you an HTTP endpoint, basic tracking, and somewhat better deliverability infrastructure.
import resend
def send_email(to: str, subject: str, body: str) -> dict:
    params = {
        "from": "Agent <agent@yourcompany.com>",
        "to": [to],
        "subject": subject,
        "html": body,
    }
    return resend.Emails.send(params)
This is better. You get delivery status, open tracking, and the deliverability infrastructure that comes with using a reputable sending service. But the same gaps remain, just papered over slightly:
Still no deduplication. Resend will happily send the same message twice if called twice. You have to build idempotency yourself. Most teams do not.
Still no suppression awareness. The transport API manages its own suppression list for hard bounces, but it does not know about your application-level opt-outs, disengaged contacts, or the prospect who told your sales rep last week to stop all outreach. Your agent does not know either.
Still no rate governance. Resend has rate limits on their end, but they are generous by design. A runaway agent will exhaust your sending reputation long before it hits their API limits.
Still no inbound. Your agent sends. The recipient replies. The reply goes... somewhere. Maybe a mailbox nobody monitors. Maybe a webhook you wrote three months ago that stopped working. The agent that sent the message has no structured way to receive, classify, and act on the response.
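To make "structured inbound" concrete, here is a minimal sketch of what a reply handler could look like. The payload shape, field names, and keyword classifier are all illustrative assumptions, not any provider's actual webhook format - a real system would classify with the model itself:

```python
# Hypothetical inbound-reply handler. The payload shape ("text", "from",
# "in_reply_to") and the routing targets are illustrative assumptions.

def classify_reply(text: str) -> str:
    """Crude keyword classifier standing in for a model-based one."""
    lowered = text.lower()
    if "unsubscribe" in lowered or "stop" in lowered:
        return "opt_out"
    if "not interested" in lowered:
        return "rejection"
    if "?" in text:
        return "question"
    return "neutral"

def handle_inbound(payload: dict) -> dict:
    """Turn a raw reply into a structured event the agent can act on."""
    category = classify_reply(payload["text"])
    # Opt-outs never reach the agent; they update suppression directly.
    if category == "opt_out":
        return {"action": "suppress", "contact": payload["from"]}
    # Everything else is routed back to the conversation that sent the email.
    return {
        "action": "route_to_agent",
        "conversation_id": payload["in_reply_to"],
        "category": category,
    }
```

The key property is that a reply is never just "mail in a mailbox": it either updates channel state (suppression) or arrives back at the sending agent as a typed event tied to the original conversation.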
Transport APIs are the right infrastructure layer for developers building transactional notification systems. They are not designed for agents that need to govern their own sending behavior, and they cannot be made to do that job without significant custom code around them.
Pattern 3: The home-built policy layer
Some teams recognize the gaps and build their own enforcement layer on top of a transport API. A Redis counter for rate limiting. A database table for suppression lists. A deduplication check before each send.
from datetime import datetime

async def governed_send(to: str, subject: str, body: str, dedupe_key: str) -> dict:
    # Check suppression
    if await db.is_suppressed(to):
        return {"status": "blocked", "reason": "suppressed"}
    # Deduplication
    if await redis.exists(f"sent:{dedupe_key}"):
        return {"status": "blocked", "reason": "duplicate"}
    # Rate limit check
    hour_key = datetime.utcnow().strftime("%Y-%m-%d-%H")
    hourly_count = int(await redis.get(f"rate:{hour_key}") or 0)
    if hourly_count >= HOURLY_LIMIT:
        return {"status": "blocked", "reason": "rate_limited"}
    await redis.incr(f"rate:{hour_key}")
    # Send
    result = await resend.send(to, subject, body)
    await redis.setex(f"sent:{dedupe_key}", 86400, "1")
    return result
This is the most well-intentioned pattern and the one with the most insidious failure modes.
The code above has at least three bugs:
- The rate limit check and the Redis increment are not atomic. Under concurrent load, two agents can both read a count of 99, both increment to 100, and both decide the limit has not been exceeded yet.
- The deduplication check and the send are not atomic. If the process crashes after sending but before writing the deduplication key, the next run sends again.
- There is no inbound handling. Replies still go nowhere structured.
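For comparison, the first two races can be closed without a distributed lock. Redis's INCR is atomic and returns the new value, so incrementing first and comparing the result removes the read-then-increment race; reserving the deduplication key with SET NX before the send makes the duplicate check atomic, trading a possible missed send after a crash for a guarantee of at-most-once. A sketch using an in-memory stand-in for the Redis client so the logic is visible without a running server (real code would also put a TTL on both keys, omitted here):

```python
class FakeRedis:
    """In-memory stand-in for the two atomic Redis operations used below."""
    def __init__(self):
        self.data = {}

    def incr(self, key: str) -> int:
        # INCR is atomic in real Redis: callers always see distinct values.
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def set_nx(self, key: str, value: str) -> bool:
        # SET key value NX: succeeds only if the key does not already exist.
        if key in self.data:
            return False
        self.data[key] = value
        return True

HOURLY_LIMIT = 100

def try_send(r, dedupe_key: str, hour_key: str) -> dict:
    # Increment first; the returned value already includes this request,
    # so two concurrent callers can never both observe 99.
    if r.incr(f"rate:{hour_key}") > HOURLY_LIMIT:
        return {"status": "blocked", "reason": "rate_limited"}
    # Reserve the dedupe key BEFORE sending: a crash after this point
    # loses at most one send; it never duplicates one.
    if not r.set_nx(f"sent:{dedupe_key}", "1"):
        return {"status": "blocked", "reason": "duplicate"}
    return {"status": "sent"}  # the transport call would go here
```

This closes the races in the snippet but none of the organizational problems below - it is still a function every developer has to know to call.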
And these are just the bugs in the code snippet. The harder problem is organizational: every new agent you deploy has to know about this function and be wired up to it. An agent someone spins up for a quick experiment, or a new hire who does not know about the shared rate-limiting layer, bypasses all of this completely. The policy is in application code, which means it is only as good as every developer's awareness of it.
There is also everything this layer does not cover: warmup limits, per-recipient-domain throttling, risk budget tracking, autonomy levels for human-in-the-loop approval, prompt injection scanning on inbound replies. Building all of that from scratch is months of work that has nothing to do with your product.
Pattern 4: The LLM-native tool call
The most recent pattern: use a framework that abstracts the send behind a semantic tool definition. The model reasons about email as a capability, and the framework handles the underlying mechanics.
from langchain.agents import tool
@tool
def send_followup_email(
    recipient: str,
    context: str,
) -> str:
    """Send a follow-up email to a prospect based on conversation context."""
    subject, body = generate_email_content(context)
    return send_via_transport_api(recipient, subject, body)
The framing is appealing. The model focuses on intent. The implementation details are abstracted. But this pattern inherits all the problems of whatever send_via_transport_api does - which brings you back to one of the previous three patterns.
The abstraction also introduces a new problem: the model decides who to email and when, without any mechanism for the infrastructure to disagree. If the model reasons that it should follow up with 50 prospects this hour, nothing in the tool definition stops it. The tool call succeeds or fails based on the underlying API, not based on whether sending 50 emails in an hour is appropriate for your domain warmup stage or your compliance posture.
What the infrastructure actually needs to handle
The common thread across all four patterns is that they treat email as a simple output action - something you do - rather than as a governed channel with its own state, rules, and feedback loops.
What email-capable agents actually need at the infrastructure layer:
Deterministic policy evaluation. Every send request evaluated against a consistent set of rules: deduplication, cooldown windows, suppression lists, rate limits across multiple time windows, risk budgets, autonomy level controls. This evaluation needs to happen at a layer the agent cannot bypass and cannot reason its way around.
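A sketch of the shape this takes, with illustrative rule names and an invented decision format (not any particular product's API): every rule runs in a fixed order against the request plus channel state, each verdict is recorded, and the first rejection wins.

```python
from typing import Callable, Optional

# A rule inspects the request and channel state and returns a rejection
# reason, or None to approve. Rule names and state fields are illustrative.
Rule = Callable[[dict, dict], Optional[str]]

def suppression_rule(request: dict, state: dict) -> Optional[str]:
    return "suppressed" if request["to"] in state["suppressed"] else None

def cooldown_rule(request: dict, state: dict) -> Optional[str]:
    last = state["last_sent"].get(request["to"], 0)
    return "cooldown" if request["now"] - last < state["cooldown_seconds"] else None

def rate_rule(request: dict, state: dict) -> Optional[str]:
    return "rate_limited" if state["sent_this_hour"] >= state["hourly_limit"] else None

RULES: list = [
    ("suppression", suppression_rule),
    ("cooldown", cooldown_rule),
    ("rate", rate_rule),
]

def evaluate(request: dict, state: dict) -> dict:
    """Run rules in a fixed order; record each verdict for the decision trace."""
    trace = []
    for name, rule in RULES:
        reason = rule(request, state)
        trace.append({"rule": name, "passed": reason is None})
        if reason is not None:
            return {"approved": False, "reason": reason, "trace": trace}
    return {"approved": True, "trace": trace}
```

The trace is what makes the evaluation deterministic and explainable: the same request against the same state always produces the same decision, and the decision carries its own justification.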
Structured inbound. Replies classified and routed back to the agent with a recommended next action. The outbound send and the inbound reply are the same conversation - they should be linked at the infrastructure layer, not stitched together in application code.
An immutable record. Every send decision recorded with which rules were evaluated and why each was approved or rejected. Not for debugging - for accountability. Email creates commitments under real identities, and you need to be able to explain every message after the fact.
Failure as a first-class response. A blocked send is not an error. It is a decision. The agent needs a structured response that tells it why the send was blocked and what it should do next - not a 500 or a silent drop.
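In practice that means the tool result is something the agent can branch on. A sketch of the agent-side handling, assuming a hypothetical decision format with a status, a reason, and an optional retry hint:

```python
# Hypothetical decision shape: {"status": "sent"} or
# {"status": "blocked", "reason": ..., "retry_after_seconds": ...}

def next_action(decision: dict) -> dict:
    """Map a structured send decision to what the agent should do next."""
    if decision["status"] == "sent":
        return {"do": "wait_for_reply"}
    reason = decision["reason"]
    if reason == "duplicate":
        # The message already went out; sending again would be the bug.
        return {"do": "nothing"}
    if reason == "suppressed":
        # Never retry a suppressed contact.
        return {"do": "drop_contact"}
    if reason == "rate_limited":
        return {"do": "schedule_retry",
                "delay_seconds": decision.get("retry_after_seconds", 3600)}
    # Unknown reasons escalate to a human rather than guessing.
    return {"do": "escalate"}
```

Compare this with a 500 or a silent drop: the agent either retries blindly or concludes the send worked. A structured decision lets it do the right thing per reason.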
Putting these requirements in a policy engine is the same separation of concerns you already apply to database connection pooling or authentication. It belongs at the infrastructure layer because that is the only layer the agent cannot bypass.
If your agents are currently on any of the four patterns above, Molted gives them a managed mailbox with all of this built in. The switch is straightforward - replace the send call, get policy enforcement, inbound handling, and a decision trace for every message.
Keep reading
- Why AI Agents Need Email Guardrails - the failure modes in depth
- The Policy Rules That Protect Your Sender Reputation - every rule evaluated on each send
- Building an Email Pipeline for AI Agents - the end-to-end architecture
- What Is Agent-Native Email? - the category this infrastructure belongs to