Bringing Manus AI to iMessage

I kept noticing the same thing: people don't want another app. They want the tools they already have to be smarter. iMessage is where people already are — it's not something you install, it's just there. So the idea was straightforward: let people access Manus AI directly through iMessage. Text a request, get an answer back. Manus can browse the web, write code, create files, do research — and now all of that is one text away. No downloads, no new habits.

I shipped this one fast. The concept-to-working-demo part was quick. The making-it-actually-reliable part was not. This post covers the production problems I hit and how I fixed them.


The Database Problem

Two weeks into production, the Docker Compose environment started failing in cascading ways. The worker service would crash, restart, crash again. Postgres would disappear. Connections would reset. Every 10 seconds, another wave of failures. Users couldn't rely on us.

I spent three days in logs before I found it. Postgres and Redis had no restart policy. Backend and worker had restart: unless-stopped. So when Postgres received a shutdown signal — network hiccup, maintenance, anything — it stayed dead permanently. Backend and worker kept restarting, crashing immediately because the database was unreachable, creating an infinite loop.

The fix was one line per service:

postgres:
  restart: unless-stopped
 
redis:
  restart: unless-stopped

But this exposed a deeper issue: the worker was starting before migrations were guaranteed to be complete. It only checked if Postgres was "healthy" (running), not if the schema was ready. I made the worker depend on the backend being healthy (which runs migrations on startup), and changed the worker entrypoint to fail hard if migrations aren't ready instead of silently starting in a broken state:

if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
  echo "Database migrations not complete after $MAX_RETRIES attempts. Exiting."
  exit 1
fi

Docker restarts it cleanly. The system self-heals instead of running broken. Failing fast is better than running in a state you can't trust.
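Putting the pieces together, the relevant compose wiring looks roughly like this. This is a sketch, not the actual file: service names and healthcheck commands are illustrative, and Compose's long-form depends_on with condition: service_healthy is what enforces the ordering.

```yaml
postgres:
  restart: unless-stopped
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 5s
    retries: 5

backend:
  restart: unless-stopped
  depends_on:
    postgres:
      condition: service_healthy
  # runs migrations on startup, then reports healthy via its own healthcheck

worker:
  restart: unless-stopped
  depends_on:
    backend:
      condition: service_healthy  # schema is ready, not merely "Postgres is up"
```

The key distinction is the two meanings of "ready": Postgres being healthy only means it accepts connections, while the backend being healthy means migrations have run.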


Follow-Up Detection

A beta user reported that follow-up questions were being treated as new tasks. They'd ask something, get an answer, ask a follow-up — and Manus would start fresh, asking for clarification instead of continuing the conversation. Like talking to someone with amnesia.

This was a fundamental problem. The entire value of having Manus in iMessage is that you can have a natural back-and-forth. If it forgets what you just talked about, you're not having a conversation — you're having repetitive interactions.

The issue was in how I gathered context for the classifier. When a task completed and the response was sent, I cleared the currentTaskId. When the next message arrived, the system checked for active task context and found nothing:

if (!connection?.currentTaskId || !connection?.currentTaskStartedAt) {
  return [];  // Empty context = can't detect follow-ups
}

No context meant the classifier couldn't tell a follow-up from a new request. I introduced a 5-minute sliding window that preserves recent conversation history even after tasks complete:

if (connection?.currentTaskId && connection?.currentTaskStartedAt) {
  taskStartTime = connection.currentTaskStartedAt.getTime();
} else if (connection) {
  taskStartTime = Date.now() - (5 * 60 * 1000);
}

Five minutes, because the typical completion-to-follow-up gap is 1-10 seconds; the extra buffer absorbs delivery delays without letting stale messages pollute the context.

There was a second issue: system messages were contaminating context. Including both user messages and bot responses made the context noisy for the classifier. Filtering to only user messages (isFromMe=false) cleaned it up immediately.
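Combined, the sliding window and the user-only filter amount to something like the sketch below. The Message shape and field names here are illustrative, not the actual schema:

```typescript
interface Message {
  text: string;
  sentAt: number;     // epoch milliseconds
  isFromMe: boolean;  // true for bot responses
}

const WINDOW_MS = 5 * 60 * 1000; // 5-minute sliding window

// Return the messages that should feed the classifier: user messages only,
// no older than the active task's start (or the sliding window as a fallback).
function gatherContext(
  messages: Message[],
  currentTaskStartedAt?: number,
): Message[] {
  const cutoff = currentTaskStartedAt ?? Date.now() - WINDOW_MS;
  return messages.filter((m) => !m.isFromMe && m.sentAt >= cutoff);
}
```

With no active task, the fallback cutoff keeps the last five minutes of user messages alive, which is exactly what lets a follow-up arriving after task completion still see its context.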

Follow-up detection went from roughly 60% to 95%+.


The Classifier

The message classifier determines whether an incoming text is a new task, a follow-up, a general question, or a revoke command. I started with Google Gemini 2.0 Flash — fast and cheap but not great at nuanced classification.

The real problem wasn't the model though. It was the prompt. I'd written it example-heavy:

**FOLLOW_UP** examples:
- "Yes, do that"
- "No, try again"  
- "ok"
- "sure"

Examples don't scale. There are infinite possible messages. I rewrote the prompt to be principle-based:

**CORE PRINCIPLE:**
- NEW_TASK: User introduces a NEW problem/request not in context
- FOLLOW_UP: User continues discussing the SAME topic in context
- Key question: "Is the user talking about the same thing, or something different?"
 
**DECISION FRAMEWORK:**
Step 1: Is message exactly "revoke"? → REVOKE
Step 2: Is this about the service itself? → GENERAL_QUESTION
Step 3: Does context exist?
  - No → NEW_TASK
  - Yes → Same topic as context? → FOLLOW_UP or NEW_TASK

A logical decision tree instead of enumerated examples. Handles edge cases naturally because it's reasoning from first principles rather than pattern matching.
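The framework maps naturally onto a small deterministic skeleton, with the one genuinely fuzzy step (same topic or not?) delegated to the model. A sketch, with illustrative names; the sameTopic predicate stands in for the LLM call:

```typescript
type Label = "REVOKE" | "GENERAL_QUESTION" | "NEW_TASK" | "FOLLOW_UP";

// Hypothetical stub: in practice, detecting service questions is part of
// the prompt, not a regex.
function isAboutService(text: string): boolean {
  return /how does this (work|service)/.test(text);
}

function classify(
  message: string,
  hasContext: boolean,
  sameTopic: (msg: string) => boolean, // stand-in for the model call
): Label {
  const text = message.trim().toLowerCase();
  if (text === "revoke") return "REVOKE";               // Step 1
  if (isAboutService(text)) return "GENERAL_QUESTION";  // Step 2
  if (!hasContext) return "NEW_TASK";                   // Step 3, no context
  return sameTopic(message) ? "FOLLOW_UP" : "NEW_TASK"; // Step 3, compare
}
```

The cheap, deterministic checks run first; only the last branch needs a model, which keeps the expensive call off the hot path for commands like "revoke".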

I switched the model to Claude 3.5 Sonnet, then later to Claude 4.5 Sonnet as it became available. The stronger reasoning model plus the principle-based prompt pushed classification accuracy past 98%.


Admin Commands

Our CTO tried to use the reset command and got the free-tier API key prompt. Admin commands were being checked after the free tier limit, so they'd get blocked before reaching the admin handler.

The fix was reordering the checks — admin commands first, then tier limits:

// Admin commands first
if (messageText) {
  const adminHandled = await handleAdminCommand(phoneNumber, messageText);
  if (adminHandled) return;
}
 
// Then tier limits
if (freeTierExhausted) {
  sendMessage("Please add API key...");
  return;
}

When you have multiple conditional guards, order them from most privileged to least. Small bug, but it revealed a pattern I now use everywhere: check the most specific case before the generic one.


What Changed

The database fix turned a system that would cascade into failure on any Postgres hiccup into one that self-heals. The context window fix turned a chatbot that forgot conversations into one that handles natural back-and-forth. The classifier rewrite turned inconsistent message routing into something that gets it right nearly every time.

None of these were features. They were the difference between a demo and something people can actually rely on. When your infrastructure is unstable, that's not a technical problem — that's a user problem. When follow-ups don't work, that's not a classification problem — that's the core value proposition being broken. It's all the same thing: building something people can trust.

The potential here is what keeps me excited. Manus can do a lot — research, code, files, analysis — and all of that is now accessible from the app people check 150 times a day. No context switching, no new tools to learn. Just text and get things done.


Manus on iMessage is live at github.com/photon-hq/manus. BullMQ worker, Redis pub/sub, Prisma ORM, HTTP MCP endpoint.