Unicode Shield

A message that looks completely ordinary in the bubble can still carry hidden Unicode that a model will happily read as input. The human sees one sentence. The agent sees another. That gap is the whole bug.


The split

Most messaging stacks were built for humans. The working assumption is that the important part of a message is the part a human can see. That assumption breaks the moment an LLM becomes the real consumer of the string.

A human sees a normal sentence. A dashboard shows a normal sentence. A log line stores a normal sentence. But the model doesn't read rendered bubbles — it reads the raw string, invisible characters included.

| Perspective | What appears to be in the message | What actually happens |
| --- | --- | --- |
| Human sender / operator | A completely normal message | Nothing looks suspicious |
| Product logs and admin tools | The same visible sentence | Hidden characters are usually missed |
| Agent runtime | Raw Unicode text | The hidden payload is still there |
| Model context | The full raw string | Invisible instructions are processed as input |

That's the entire attack in one table. If invisible characters reach the prompt, the failure has already happened.


A harmless demo

Here are two lines that both render as the plain question "What's the weather today?", and look identical to each other, yet carry different raw content.

Example A — zero-width characters embedded in a normal message

What's the weather today?​⁠

Example B — invisible tag characters appended to a normal message

What's the weather today?󠀤󠁤󠁥󠁭󠁯

Paste either one into a Unicode character inspector and you'll see the actual codepoints. Example A carries U+200B, U+2060, and U+FEFF. Example B encodes the benign payload demo in Unicode tag characters.

Neither one is malicious. They just prove the channel exists.
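The tag-character trick in Example B is mechanical in both directions: tag characters U+E0020 through U+E007E mirror printable ASCII at an offset of 0xE0000. A sketch of the round trip (helper names are mine, not part of any library):

```typescript
// Encode: shift each printable ASCII character up by 0xE0000 into the
// invisible tag-character block. Decode: keep only tag codepoints and
// shift them back down.
const encodeTags = (s: string) =>
  [...s].map((c) => String.fromCodePoint(c.codePointAt(0)! + 0xe0000)).join("");

const decodeTags = (s: string) =>
  [...s]
    .map((c) => c.codePointAt(0)!)
    .filter((cp) => cp >= 0xe0020 && cp <= 0xe007e)
    .map((cp) => String.fromCodePoint(cp - 0xe0000))
    .join("");

const message = "What's the weather today?" + encodeTags("demo");
console.log(decodeTags(message)); // "demo" — invisible in the bubble, trivial to recover
```

The visible question survives rendering; the appended payload survives everything else.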


Why it matters for agents

For human-to-human chat, invisible Unicode is a weird edge case. For an agent, it's an input-integrity problem.

LLMs don't distinguish visible text from invisible text the way humans do. They receive tokens from the raw string. If the hidden portion contains instructions, context poisoning, or side-channel data, the model treats it as legitimate user input.

| Stage | What the developer thinks is happening | What can actually be happening |
| --- | --- | --- |
| Message received | User asked a normal question | Raw string includes hidden extra content |
| Message passed to the agent | Agent is processing visible user intent | Agent is also receiving invisible payloads |
| Message stored in memory | Conversation history is clean | Hidden payloads may persist across turns |
| Agent acts | Model is following the user's message | Model may be reacting to content nobody saw |

This isn't just single-turn prompt injection. If raw text is stored in memory and replayed, it becomes session poisoning.
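The persistence part is easy to reproduce with a naive memory layer. A sketch, assuming the common pattern of storing raw strings and replaying them verbatim (all names here are illustrative):

```typescript
// A naive memory layer that stores raw strings. Hidden characters ride along
// on every replay, so a single poisoned message persists across turns.
type Turn = { role: "user" | "assistant"; content: string };
const history: Turn[] = [];

function remember(role: Turn["role"], content: string) {
  history.push({ role, content }); // raw string, invisible characters included
}

function buildPrompt(): string {
  // Every future prompt replays the full history verbatim.
  return history.map((t) => `${t.role}: ${t.content}`).join("\n");
}

remember("user", "What's the weather today?\u200B\u2060\uFEFF"); // turn 1, poisoned
remember("assistant", "Sunny, around 20°C.");
remember("user", "Thanks!"); // turn 3, clean

console.log(buildPrompt().includes("\u200B")); // true — the payload is back in context
```

One poisoned turn contaminates every prompt built from that history until the memory is cleaned.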


Why most developers will miss it

The workflow that causes this is the obvious one: receive a message, read message.text, pass it to the model, move on. It feels reasonable because the message looks normal everywhere the developer checks.

Rendering is not validation. And most people simply don't know this class of attack exists yet.

That's why telling every developer to "sanitize your inputs" doesn't work. Some won't know they need to. Others will sanitize only before inference but not before logging, memory, or downstream automation. The complexity has to be absorbed somewhere closer to the wire.
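"Closer to the wire" can be as small as one filter at the transport boundary. A minimal sketch; the character classes below are illustrative, not an exhaustive denylist:

```typescript
// An ingress filter applied once where messages enter the system, rather than
// by every downstream developer. Covers zero-width/format characters, bidi
// controls, BOM, soft hyphen, and the tag-character block — illustratively.
const INVISIBLES = /[\u200B-\u200F\u2060-\u2064\uFEFF\u00AD]|[\u{E0000}-\u{E007F}]/gu;

function sanitizeAtIngress(raw: string): { safe: string; stripped: number } {
  const safe = raw.replace(INVISIBLES, "");
  return { safe, stripped: [...raw].length - [...safe].length };
}

const { safe, stripped } = sanitizeAtIngress("What's the weather today?\u200B\uFEFF");
console.log(safe);     // "What's the weather today?"
console.log(stripped); // 2
```

A real deployment needs a maintained codepoint list, but the shape is the point: one choke point instead of per-developer vigilance.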


Bigger than prompt injection

Invisible Unicode in messaging isn't just a prompt-injection problem. It's a hidden-channel problem. Once a system lets invisible payloads pass through untouched, it opens the door to:

| Risk class | What it means in practice |
| --- | --- |
| Covert exfiltration | Stolen data embedded invisibly in innocent-looking outbound messages |
| Hidden metadata channels | Machine-readable side-channel data alongside human-readable chat |
| Watermarking / fingerprinting | Invisibly marking outbound messages for tracing and provenance |
| Session poisoning | Hidden payloads persisting in agent memory across turns |

Same root cause: raw Unicode moving through the stack without inspection.


What a secure pipeline looks like

Treat invisible Unicode like any other untrusted input: normalize it, inspect it, preserve evidence, and control exposure.

| Pipeline step | Secure behavior |
| --- | --- |
| Inbound message receipt | Capture the raw string exactly as received |
| Normalization | Remove or isolate invisible Unicode from model-bound text |
| Inspection | Detect suspicious patterns, length anomalies, encoded payloads |
| Policy enforcement | Block, warn, quarantine, or log based on score and settings |
| Downstream delivery | Pass only the safe text into the agent and model |
| Audit trail | Retain the raw original for forensic review |
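The table above fits into a single ingress function. A sketch under my own assumptions; names, thresholds, and the codepoint set are illustrative and this is not the unicode-shield API:

```typescript
// One pass: inspect, normalize, decide, and keep the raw original for audit.
const SUSPICIOUS = /[\u200B-\u200F\u2060-\u2064\uFEFF]|[\u{E0000}-\u{E007F}]/gu;

type Verdict = "allow" | "warn" | "block";

interface InspectedMessage {
  raw: string;     // preserved exactly as received, for forensic review
  safe: string;    // the only text the agent and model are allowed to see
  hits: number;    // how many suspicious codepoints were found
  verdict: Verdict;
}

function ingest(raw: string, warnAt = 1, blockAt = 8): InspectedMessage {
  const matches = raw.match(SUSPICIOUS) ?? [];   // inspection
  const safe = raw.replace(SUSPICIOUS, "");      // normalization
  const verdict: Verdict =                       // policy enforcement
    matches.length >= blockAt ? "block" : matches.length >= warnAt ? "warn" : "allow";
  return { raw, safe, hits: matches.length, verdict };
}

const msg = ingest("What's the weather today?\u200B\u2060\uFEFF");
console.log(msg.verdict); // "warn" — 3 hits, below the block threshold
console.log(msg.safe);    // clean text for the model; msg.raw stays in the audit trail
```

Downstream code only ever passes `msg.safe` to the model, while `msg.raw` goes to storage that nothing model-bound reads.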

This is what unicode-shield does — a zero-dependency TypeScript normalization layer that strips invisible characters, bidi attacks, Zalgo, homoglyphs, and 400+ dangerous codepoints before any text reaches a model. Sanitized safe text for the LLM, raw original preserved for audit, suspicious patterns scored so you can block, warn, or log.


Takeaways

Messaging is turning into a real interface for AI. That means the attack surfaces of messaging are becoming the attack surfaces of models.

The old assumption — "if a message looks safe to a human, it's safe enough" — doesn't survive contact with agents. A trusted interface, plus invisible payloads, plus raw model ingestion is exactly the kind of architectural gap that produces silent failures in production. Not flashy ones. The dangerous kind: the model behaves strangely, nobody can see why, the logs look normal.

If you're building agents on top of messaging, don't assume the visible text is the whole message. Treat the raw string as hostile input. Sanitize before inference. Inspect before memory. Preserve the original for audit. And if you can, solve it at the infrastructure layer instead of asking every developer to rediscover it one production incident at a time.