A marketing agency principal needed one business number to do two jobs at once: let him move his own calendar by text, and let a prospective client ask about the work and book a consultation. We built that agent twice. First on Telegram, then re-homed onto WhatsApp through a channel we wrote ourselves. The same markdown brain ported across almost unchanged. What changed was the transport and the identity key, not the reasoning. We built it over five days at the end of May 2026 and presented it on 4 June 2026, where it handled a lead end to end live in the room, booked into a real calendar behind a gate that code refuses to open without permission, and produced the post-booking brief on the spot. The client stays withheld; this is filed under the Sealed tier for that reason.
Two builds, one brain
The agent is a Claude Code session. The running session is the entire brain: there is no separate hosted model call behind it, no orchestration tier deciding what to say. The markdown the session loads is the program.
One conductor file, around 120 to 150 lines, routes every inbound message in two steps. The first step is identity. Before the agent reads a single word, the channel has already resolved who is speaking into a role: the principal, or a lead. The agent never sees a raw phone number and never reasons about one. It is told it is talking to its principal or to a prospective client, and that fact selects which half of its instructions are live. The second step is intent. Within a role, the conductor routes the message to one small, single-purpose flow file: book a consultation, answer a question about services, read the calendar, propose a reschedule. Roughly seven flows per interface, each narrow enough to read in one sitting. Around the flows sit two more kinds of file. Knowledge lives in about five context files, read only when a flow needs them, so the agent quotes what the agency actually offers rather than improvising. Voice lives in about five files that fix tone, so the warmth of a first hello and the precision of a calendar confirmation are written down, not left to chance.
inbound message
│
▼
[ channel ] resolve identity → role: principal or lead
│ the agent never sees a phone number
▼
[ conductor ] route by intent
│
├─ lead · question → answer from the context files
├─ lead · booking → propose a slot → read-back → confirm
├─ principal · read → show the calendar
└─ principal · write → read-back → confirm
│
▼
[ gated calendar write ]
hook blocks any event title without [DEMO]
│
▼
post-booking brief to the principal
· tappable calendar link
· welcome draft, held for review
The shape matters because it is auditable. Identity routes to a role, a role routes to an intent, an intent loads one flow plus the knowledge and voice it needs. Every path through the agent is a short, named file a person can read in full. There is no large opaque prompt to trust. A brief you can hold in your head is a brief you can hand a CEO’s calendar.
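The two-step routing can be pictured as a small lookup table. This is a minimal sketch, not the agent's actual mechanism: in the build described here the routing lives in markdown that the session reads, not in code. The roles and intents follow the diagram; the file paths, the `Route` shape, and the `route` function are hypothetical.

```typescript
// Hypothetical model of the conductor. Roles and intents come from the
// article; every file path and field name below is an illustrative assumption.
type Role = "principal" | "lead";
type Intent = "question" | "booking" | "read" | "write";

interface Route {
  flow: string;       // the single flow file to load
  context: string[];  // knowledge files, read only when this flow needs them
  readBack: boolean;  // whether the flow must read the change back first
}

const ROUTES: Record<Role, Partial<Record<Intent, Route>>> = {
  lead: {
    question: { flow: "flows/lead/answer-question.md", context: ["context/services.md"], readBack: false },
    booking: { flow: "flows/lead/book-consultation.md", context: ["context/availability.md"], readBack: true },
  },
  principal: {
    read: { flow: "flows/principal/read-calendar.md", context: [], readBack: false },
    write: { flow: "flows/principal/change-calendar.md", context: [], readBack: true },
  },
};

// Step 1 (role) has already happened in the channel; step 2 picks one flow.
// An intent that is not listed for a role has no route at all.
function route(role: Role, intent: Intent): Route | null {
  return ROUTES[role][intent] ?? null;
}
```

In this sketch a lead has no route to a calendar write, so `route("lead", "write")` returns `null`.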
We built the first version on Telegram, through the official Claude Code Telegram channel, keying identity on the Telegram chat id. Then we re-homed it onto WhatsApp’s official Business Cloud API, keying identity on the WhatsApp id instead. The migration is the interesting part: the conductor, the flows, the context, the voice, the read-back rules, all of it ported across almost untouched. The brain did not care what carried the message. We swapped the transport underneath it and re-pointed one routing key, and the reasoning came along whole. That portability is itself the evidence that the discipline lives in the agent’s design, not in any one messaging surface.
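Re-pointing one routing key might look like the sketch below. It assumes the channel keeps a per-transport set of principal ids and treats every other sender as a lead; that fallback, the id values, and all names here are hypothetical. The part taken from the article is that the message handed to the agent carries a role and no sender id.

```typescript
type Role = "principal" | "lead";
type Transport = "telegram" | "whatsapp";

interface Inbound {
  transport: Transport;
  senderId: string; // Telegram chat id or WhatsApp id, depending on transport
  text: string;
}

// What the agent receives: a role and the text, never the raw identifier.
interface RoutedMessage {
  role: Role;
  text: string;
}

// Placeholder ids for illustration only.
const PRINCIPAL_IDS: Record<Transport, ReadonlySet<string>> = {
  telegram: new Set(["tg-chat-0001"]),
  whatsapp: new Set(["wa-0001"]),
};

// Assumption: anyone who is not the principal is treated as a lead.
function resolve(msg: Inbound): RoutedMessage {
  const role: Role = PRINCIPAL_IDS[msg.transport].has(msg.senderId) ? "principal" : "lead";
  return { role, text: msg.text };
}
```

Moving from Telegram to WhatsApp changes only which entry of `PRINCIPAL_IDS` is consulted; everything downstream of `RoutedMessage` is unaffected.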
What carried it on WhatsApp is a small channel plugin we wrote ourselves, a Bun and TypeScript server in the same Channels craft we open-sourced as cc-dm. Inbound, a Meta webhook hits a public tunnel; the channel performs the verify handshake, checks the X-Hub-Signature-256 HMAC signature, passes an allowlist gate, and only then injects the message into the running session. Outbound, the agent calls a reply tool that POSTs to the Graph API. The security checks sit at the edge rather than in the model. Calendar and email reach the agent through user-scoped MCP connectors, Google Calendar and Gmail, the principal’s own, so the agent acts as the principal and never as a shared service account.
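The inbound checks can be sketched as three small functions. This is a minimal illustration using `node:crypto` (available in Bun), not the plugin's actual code. It relies on Meta's documented webhook behaviour: the verification request carries `hub.mode`, `hub.verify_token` and `hub.challenge`, and `X-Hub-Signature-256` is `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the app secret. The function names and the shape of the allowlist are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// GET handshake: echo the challenge only when the verify token matches.
function verifyHandshake(query: URLSearchParams, verifyToken: string): string | null {
  const ok =
    query.get("hub.mode") === "subscribe" &&
    query.get("hub.verify_token") === verifyToken;
  return ok ? query.get("hub.challenge") : null;
}

// POST check: recompute the HMAC over the raw body and compare in constant time.
function signatureIsValid(rawBody: string, header: string | null, appSecret: string): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = createHmac("sha256", appSecret).update(rawBody, "utf8").digest();
  const received = Buffer.from(header.slice("sha256=".length), "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Edge gate: a message is injected into the session only if the signature
// holds and the sender is on the allowlist (hypothetical set of sender ids).
function admit(
  rawBody: string,
  header: string | null,
  appSecret: string,
  senderId: string,
  allowlist: ReadonlySet<string>,
): boolean {
  return signatureIsValid(rawBody, header, appSecret) && allowlist.has(senderId);
}
```

The signature has to be computed over the exact bytes received, so a server would read the raw body before parsing it as JSON.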
The moment it stops answering and starts acting
A chatbot answers. An agent takes the next real action and then knows where to stop. The signature moment here is the post-booking brief. The instant a lead confirms a consultation, the agent does not simply say “booked.” It proactively pushes the principal a tappable link to the new calendar event and a Gmail welcome message, pre-composed for that lead. The welcome is a draft. It is never auto-sent. The agent writes it, files it, and stops at the human-review line. The founder reads, edits if needed, sends. The agent has done the work right up to the edge of a decision a human should own, and then held.
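One way to make "held for review" structural rather than behavioural is to give the welcome message a type with no sent state. The sketch below is illustrative only: every field name is hypothetical, and it says nothing about how the real agent talks to Gmail or Calendar.

```typescript
// Hypothetical shapes for the post-booking brief.
interface Booking {
  leadName: string;
  leadEmail: string;
  startsAt: string; // human-readable start time
  eventUrl: string; // link to the newly created calendar event
}

// The only status this type allows is "draft": sending is not representable.
interface WelcomeDraft {
  status: "draft";
  to: string;
  subject: string;
  body: string;
}

interface PostBookingBrief {
  eventUrl: string;
  welcome: WelcomeDraft;
}

// Builds what the principal receives: the link plus a draft that waits.
function buildBrief(b: Booking): PostBookingBrief {
  return {
    eventUrl: b.eventUrl,
    welcome: {
      status: "draft",
      to: b.leadEmail,
      subject: `Welcome, ${b.leadName}`,
      body: `Hi ${b.leadName}, thanks for booking a consultation on ${b.startsAt}.`,
    },
  };
}
```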
That restraint is the whole posture. An agent that books a meeting is useful. An agent that drafts the follow-up and then waits for a person is trustworthy.
WHO HANDLES EACH STEP OF A BOOKING
step                         before        after
answer the question          principal  →  agent
propose a slot               principal  →  agent
create the calendar event    principal  →  agent (gated)
draft the follow-up          principal  →  agent (held)
approve and send             principal  →  principal  ← stays human
PRINCIPAL TIME PER BOOKING (illustrative, not measured field data)
end to end, by hand ████████████ ~12 min
review only ██ ~2 min
The boring parts, enforced in code
The model can already hold a conversation and call a tool. None of that is what makes an agent safe to hand a CEO’s live calendar. The discipline around it is, and the discipline only counts when it is enforced in code rather than hoped for in a prompt.
Before the agent writes anything to the calendar, it reads the change back: it restates the date, the time, and who the slot is for, and waits for an explicit “yes.” Nothing mutates on an implied confirmation. Read-back-confirm is a single rule with an outsized effect, because the failure mode of a calendar agent is not refusing to book; it is booking the wrong thing silently.
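Read-back-confirm can be modelled as a two-state machine in which a write is released only from the awaiting state, and only on an explicit yes. This is a sketch under assumptions: the state names, the message wording, and the set of replies that count as explicit are all hypothetical.

```typescript
interface ProposedChange {
  date: string;
  time: string;
  attendee: string;
}

type WriteState =
  | { kind: "idle" }
  | { kind: "awaiting_confirmation"; change: ProposedChange };

// Restate the date, the time, and who the slot is for.
function readBack(change: ProposedChange): string {
  return `To confirm: ${change.date} at ${change.time} for ${change.attendee}. Reply "yes" to book.`;
}

// Assumption: only these exact replies count as an explicit confirmation.
const EXPLICIT_YES: ReadonlySet<string> = new Set(["yes", "yes.", "yes!"]);

// Returns the next state and the change to commit, if any.
// Anything short of an explicit yes commits nothing and leaves the change pending.
function onReply(
  state: WriteState,
  reply: string,
): { state: WriteState; commit: ProposedChange | null } {
  if (state.kind !== "awaiting_confirmation") return { state, commit: null };
  const explicit = EXPLICIT_YES.has(reply.trim().toLowerCase());
  return explicit
    ? { state: { kind: "idle" }, commit: state.change }
    : { state, commit: null };
}
```

A reply such as "sounds good" leaves the calendar untouched in this model; the agent would have to ask again rather than infer consent.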
Then there is the gate. During a demo, every event the agent creates must carry a [DEMO] prefix in its title, so a real calendar never fills with test bookings that look real. We did not trust the model to remember that. A PreToolUse hook inspects every calendar-create call and hard-blocks any event whose title lacks the prefix. If the model forgets, the tool call is refused at the boundary and it has to retry. That is the distinction the lab keeps returning to: a safety property the harness enforces, not an instruction the prompt hopes holds. Prompts drift across a long conversation; a hook does not.
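The decision at the heart of such a hook is small. The sketch below assumes the standard Claude Code hook contract, in which a PreToolUse hook receives JSON with `tool_name` and `tool_input` on stdin and an exit code of 2 blocks the call and returns stderr to the model. The tool-name pattern and the title field (`summary`, the name Google Calendar's API uses for an event title) are assumptions about the connector, not details from this build.

```typescript
// Subset of the hook input this check needs.
interface HookInput {
  tool_name: string;
  tool_input: Record<string, unknown>;
}

interface Decision {
  exitCode: 0 | 2; // 0 lets the call through, 2 blocks it
  stderr: string;  // explanation handed back to the model when blocked
}

// Hypothetical: matches MCP tool names that look like a calendar create call.
const CALENDAR_CREATE = /^mcp__.*calendar.*create/i;
// Assumption: the connector passes the event title in a field named "summary".
const TITLE_FIELD = "summary";
const PREFIX = "[DEMO]";

function decide(input: HookInput): Decision {
  // Anything that is not a calendar create is outside this gate.
  if (!CALENDAR_CREATE.test(input.tool_name)) return { exitCode: 0, stderr: "" };
  const title = input.tool_input[TITLE_FIELD];
  if (typeof title === "string" && title.startsWith(PREFIX)) {
    return { exitCode: 0, stderr: "" };
  }
  // A missing or non-string title is blocked as well.
  return { exitCode: 2, stderr: `Blocked: calendar event titles must start with "${PREFIX}".` };
}
```

A hook script would parse stdin, call `decide`, write `stderr`, and exit with `exitCode`. Keeping the decision in a pure function is what makes it straightforward to pin with tests.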
The third discipline is a privacy invariant. When a proposed slot collides with a real, non-demo event already on the principal’s calendar, the agent says there is a conflict at that time and proposes another. It never reads back the conflicting event’s title. A lead asking for a slot must not learn, from a leaked event name, who else the principal is meeting that day. The agent can see the calendar; it is constrained in what it is allowed to surface from it.
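The invariant is easiest to hold when the function that detects a conflict cannot return the conflicting title at all. A minimal sketch, with hypothetical types and wording, and with any overlap treated as a conflict:

```typescript
interface CalendarEvent {
  title: string;
  start: number; // epoch milliseconds
  end: number;
}

interface Slot {
  start: number;
  end: number;
}

// The result type has no field that could carry an event title.
type SlotCheck = { free: true } | { free: false; message: string };

// Half-open interval overlap: back-to-back slots do not collide.
function overlaps(a: Slot, b: Slot): boolean {
  return a.start < b.end && b.start < a.end;
}

function checkSlot(slot: Slot, events: CalendarEvent[]): SlotCheck {
  const busy = events.some((e) => overlaps(slot, e));
  return busy
    ? { free: false, message: "There is a conflict at that time. Would another time work?" }
    : { free: true };
}
```

The calendar data goes in, but only a yes or no and a fixed sentence come out, so nothing downstream can leak what it never receives.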
Two smaller rules round out the posture. A time-context hook injects the real current time into the session every turn, so “tomorrow at three” resolves against the actual clock and never against something the model inferred from training data. And the agent speaks openly as an AI. It never adopts a fake human name to pass as the principal or a receptionist. Around nine such hard rules govern the agent, with twenty-three tests pinning the gate and the time scripts. The number that matters is not the count of rules but where they live: in code the model cannot talk its way past. A rule you only assert is a wish; a rule with a failing test behind it is a contract.
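The time-context rule comes down to formatting the real clock into a line the session reads each turn. A sketch using the standard `Intl.DateTimeFormat` API follows; the function name, the output wording, and the choice to include both local time and UTC are assumptions, and how the line reaches the session is left out.

```typescript
// Formats the current moment for injection into the session.
// `now` is passed in rather than read inside, so tests can pin it.
function timeContext(now: Date, timeZone: string): string {
  const local = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    weekday: "long",
    year: "numeric",
    month: "long",
    day: "numeric",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 00 to 23, avoids a "24:00" rendering at midnight
  }).format(now);
  // The ISO timestamp is unambiguous; the local rendering carries the weekday.
  return `Current time: ${local} (${timeZone}); UTC ${now.toISOString()}`;
}
```

With the weekday and date stated outright, a phrase like "tomorrow at three" has a fixed anchor to resolve against.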
What it did in the room
We presented the WhatsApp version to the principal on 4 June 2026, where it was accepted. In the room it handled a lead end to end: answered questions about the work, proposed a slot, read the change back, booked into a real calendar behind the demo gate, and produced the post-booking brief live, the calendar link and the held welcome draft both landing for the principal as designed. We will not dress that up with a quote. It worked, in front of the people it was built for, on the first sitting.
What transfers
The frontier moved the hard part off the model. The model holds a conversation and calls tools out of the box, and that was never what we would put in front of an operator deciding whether to trust an agent. What makes an agent safe to hand a CEO’s calendar is the ring of disciplines around it: a gate enforced in code, a read-back before every write, a privacy invariant on conflicts, a real clock injected each turn, and a brief small enough to read end to end. The agent worked in the room because the boring parts were enforced, not asserted. That is evidence over declaration, applied to an agent rather than a website, and it is why we file the mechanism and withhold the client. We would rather show you a hook that refuses a bad write than tell you our prompt is careful.
If you have a workflow you would trust an agent with only if the guardrails were real, built small enough to read and gated hard enough to trust, that is the engagement we want. Start a conversation, and we will scope it.
References
- Meta. WhatsApp Business Cloud API. developers.facebook.com/docs/whatsapp/cloud-api
- Anthropic. Claude Code. claude.com/claude-code
- Anthropic. Model Context Protocol. modelcontextprotocol.io
- Orfloat. cc-dm: peer-to-peer messaging between Claude Code sessions. orfloat.com/research/cc-dm