Unicode Shield
A message that looks completely ordinary in the bubble can still carry hidden Unicode that a model will happily read as input. The human sees one sentence. The agent sees another. That gap is the whole bug.
The split
Most messaging stacks were built for humans. The working assumption is that the important part of a message is the part a human can see. That assumption breaks the moment an LLM becomes the real consumer of the string.
A human sees a normal sentence. A dashboard shows a normal sentence. A log line stores a normal sentence. But the model doesn't read rendered bubbles — it reads the raw string, invisible characters included.
| Perspective | What appears to be in the message | What actually happens |
|---|---|---|
| Human sender / operator | A completely normal message | Nothing looks suspicious |
| Product logs and admin tools | The same visible sentence | Hidden characters are usually missed |
| Agent runtime | Raw Unicode text | The hidden payload is still there |
| Model context | The full raw string | Invisible instructions are processed as input |
That's the entire attack in one table. If invisible characters reach the prompt, the failure has already happened.
A harmless demo
Here are two example messages. Both render as the same ordinary question, but each carries a different hidden payload.
Example A — zero-width characters embedded in a normal message

What's the weather today?

Example B — invisible tag characters appended to a normal message

What's the weather today?

Paste either one into a Unicode character viewer and you'll see the actual bytes. Example A carries U+200B, U+2060, and U+FEFF. Example B encodes the benign payload "demo" in Unicode tag characters.
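If you'd rather inspect in code than in a browser, a few lines of TypeScript surface the same thing. `dumpCodepoints` here is a throwaway helper, not part of any library:

```ts
// List every codepoint in a string so invisible characters become visible.
function dumpCodepoints(msg: string): string[] {
  return Array.from(msg).map((ch) => {
    const cp = ch.codePointAt(0)!;
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  });
}

// Example A, written with explicit escapes so the payload shows in source:
const exampleA = "What's\u200B the\u2060 weather\uFEFF today?";
console.log(dumpCodepoints(exampleA).join(" "));
// U+200B, U+2060, and U+FEFF appear between the ordinary letters.
```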
Neither one is malicious. They just prove the channel exists.
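Example B's trick is mechanical: every printable ASCII character has an invisible twin in the Unicode tag block (U+E0020 through U+E007E), offset from ASCII by 0xE0000. Here is a sketch of both directions using the same benign "demo" payload; the helper names are mine:

```ts
// Printable ASCII maps onto invisible Unicode tag characters at a fixed
// offset: 'd' (0x64) becomes U+E0064, which renders as nothing at all.
const TAG_OFFSET = 0xe0000;

function encodeTags(payload: string): string {
  return Array.from(payload)
    .map((ch) => String.fromCodePoint(TAG_OFFSET + ch.codePointAt(0)!))
    .join("");
}

function decodeTags(text: string): string {
  return Array.from(text)
    .map((ch) => ch.codePointAt(0)!)
    .filter((cp) => cp >= 0xe0020 && cp <= 0xe007e)
    .map((cp) => String.fromCodePoint(cp - TAG_OFFSET))
    .join("");
}

const carrier = "What's the weather today?" + encodeTags("demo");
console.log(decodeTags(carrier)); // "demo" -- recovered from invisible text
```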
Why it matters for agents
For human-to-human chat, invisible Unicode is a weird edge case. For an agent, it's an input-integrity problem.
LLMs don't distinguish visible text from invisible text the way humans do. They receive tokens from the raw string. If the hidden portion contains instructions, context poisoning, or side-channel data, the model treats it as legitimate user input.
| Stage | What the developer thinks is happening | What can actually be happening |
|---|---|---|
| Message received | User asked a normal question | Raw string includes hidden extra content |
| Message passed to the agent | Agent is processing visible user intent | Agent is also receiving invisible payloads |
| Message stored in memory | Conversation history is clean | Hidden payloads may persist across turns |
| Agent acts | Model is following the user's message | Model may be reacting to content nobody saw |
This isn't just single-turn prompt injection. If raw text is stored in memory and replayed, it becomes session poisoning.
Why most developers will miss it
The workflow that causes this is the obvious one: receive a message, read `message.text`, pass it to the model, move on. It feels reasonable because the message looks normal everywhere the developer checks.
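In code, the whole vulnerable pattern fits in a few lines. This is a runnable sketch; `model` and `memory` are stand-ins for whatever LLM client and history store you actually use:

```ts
type Message = { text: string };

const memory: string[] = [];
const model = {
  // Stand-in for a real LLM client that forwards `prompt` verbatim.
  complete: async (prompt: string): Promise<string> => "reply to: " + prompt,
};

async function handleMessage(message: Message): Promise<string> {
  const reply = await model.complete(message.text); // hidden payload reaches the model
  memory.push(message.text);                        // and now it persists across turns
  return reply;
}
```

Nothing in those lines would raise an eyebrow in code review.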
Rendering is not validation. And most people simply don't know this class of attack exists yet.
That's why telling every developer to "sanitize your inputs" doesn't work. Some won't know they need to. Others will sanitize only before inference but not before logging, memory, or downstream automation. The complexity has to be absorbed somewhere closer to the wire.
Bigger than prompt injection
Invisible Unicode in messaging isn't just a prompt-injection problem. It's a hidden-channel problem. Once a system lets invisible payloads pass through untouched, it opens the door to:
| Risk class | What it means in practice |
|---|---|
| Covert exfiltration | Stolen data embedded invisibly in innocent-looking outbound messages |
| Hidden metadata channels | Machine-readable side-channel data alongside human-readable chat |
| Watermarking / fingerprinting | Invisibly marking outbound messages for tracing and provenance |
| Session poisoning | Hidden payloads persisting in agent memory across turns |
Same root cause: raw Unicode moving through the stack without inspection.
What a secure pipeline looks like
Treat invisible Unicode like any other untrusted input: normalize it, inspect it, preserve evidence, and control exposure.
| Pipeline step | Secure behavior |
|---|---|
| Inbound message receipt | Capture the raw string exactly as received |
| Normalization | Remove or isolate invisible Unicode from model-bound text |
| Inspection | Detect suspicious patterns, length anomalies, encoded payloads |
| Policy enforcement | Block, warn, quarantine, or log based on score and settings |
| Downstream delivery | Pass only the safe text into the agent and model |
| Audit trail | Retain the raw original for forensic review |
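As a rough illustration, here is a minimal version of those steps in TypeScript. The `screen` name, the threshold, and especially the codepoint list are all illustrative; a production layer needs a far larger table plus bidi and homoglyph handling:

```ts
// Toy screening pass: strip a handful of invisible ranges, count hits,
// and pick an action. Real coverage needs hundreds of codepoints.
const INVISIBLES = /[\u200B-\u200F\u2060-\u2064\uFEFF]|[\u{E0000}-\u{E007F}]/gu;

interface ScreenResult {
  safe: string;                      // model-bound text, invisibles removed
  raw: string;                       // exact original, kept for audit
  hits: number;                      // invisible codepoints found
  action: "pass" | "warn" | "block";
}

function screen(raw: string, blockAt = 8): ScreenResult {
  const hits = (raw.match(INVISIBLES) ?? []).length;
  const safe = raw.replace(INVISIBLES, "");
  const action = hits === 0 ? "pass" : hits < blockAt ? "warn" : "block";
  return { safe, raw, hits, action };
}

// Only `safe` ever reaches the model; `raw` goes to the audit log.
const r = screen("What's\u200B the weather?\u{E0064}\u{E0065}");
console.log(r.action, r.hits); // "warn" 3
```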
This is what unicode-shield does: it's a zero-dependency TypeScript normalization layer that strips invisible characters, bidi attacks, Zalgo, homoglyphs, and 400+ dangerous codepoints before any text reaches a model. Sanitized safe text goes to the LLM, the raw original is preserved for audit, and suspicious patterns are scored so you can block, warn, or log.
Takeaways
Messaging is turning into a real interface for AI. That means the attack surfaces of messaging are becoming the attack surfaces of models.
The old assumption — "if a message looks safe to a human, it's safe enough" — doesn't survive contact with agents. A trusted interface, plus invisible payloads, plus raw model ingestion is exactly the kind of architectural gap that produces silent failures in production. Not flashy ones. The dangerous kind: the model behaves strangely, nobody can see why, the logs look normal.
If you're building agents on top of messaging, don't assume the visible text is the whole message. Treat the raw string as hostile input. Sanitize before inference. Inspect before memory. Preserve the original for audit. And if you can, solve it at the infrastructure layer instead of asking every developer to rediscover it one production incident at a time.