Unicode Shield
A message that looks completely ordinary in the bubble can still carry hidden Unicode that a model will happily read as input. The human sees one sentence. The agent sees another. That gap is the whole bug.
The split
Most messaging stacks were built for humans. The working assumption is that the important part of a message is the part a human can see. That assumption breaks the moment an LLM becomes the real consumer of the string.
A human sees a normal sentence. A dashboard shows a normal sentence. A log line stores a normal sentence. But the model doesn't read rendered bubbles — it reads the raw string, invisible characters included.
| Perspective | What appears to be in the message | What actually happens |
|---|---|---|
| Human sender / operator | A completely normal message | Nothing looks suspicious |
| Product logs and admin tools | The same visible sentence | Hidden characters are usually missed |
| Agent runtime | Raw Unicode text | The hidden payload is still there |
| Model context | The full raw string | Invisible instructions are processed as input |
That's the entire attack in one table. If invisible characters reach the prompt, the failure has already happened.
A harmless demo
Here are two example messages. Both render as the same ordinary question, but each carries a different hidden payload.
Example A — zero-width characters embedded in a normal message

What's the weather today?

Example B — invisible tag characters appended to a normal message

What's the weather today?

Paste either one into a Unicode character viewer and you'll see the actual bytes. Example A carries U+200B, U+2060, and U+FEFF. Example B encodes the benign payload "demo" in Unicode tag characters.
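If you'd rather inspect in code than in a browser, a few lines of TypeScript surface the same thing. `dumpCodepoints` here is a throwaway helper, not part of any library:

```ts
// List every codepoint in a string so invisible characters become visible.
function dumpCodepoints(msg: string): string[] {
  return Array.from(msg).map((ch) => {
    const cp = ch.codePointAt(0)!;
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  });
}

// Example A, written with explicit escapes so the payload shows in source:
const exampleA = "What's\u200B the\u2060 weather\uFEFF today?";
console.log(dumpCodepoints(exampleA).join(" "));
// U+200B, U+2060, and U+FEFF appear between the ordinary letters.
```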
Neither one is malicious. They just prove the channel exists.
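Example B's trick is mechanical: every printable ASCII character has an invisible twin in the Unicode tag block (U+E0020 through U+E007E), offset from ASCII by 0xE0000. Here is a sketch of both directions using the same benign "demo" payload; the helper names are mine:

```ts
// Printable ASCII maps onto invisible Unicode tag characters at a fixed
// offset: 'd' (0x64) becomes U+E0064, which renders as nothing at all.
const TAG_OFFSET = 0xe0000;

function encodeTags(payload: string): string {
  return Array.from(payload)
    .map((ch) => String.fromCodePoint(TAG_OFFSET + ch.codePointAt(0)!))
    .join("");
}

function decodeTags(text: string): string {
  return Array.from(text)
    .map((ch) => ch.codePointAt(0)!)
    .filter((cp) => cp >= 0xe0020 && cp <= 0xe007e)
    .map((cp) => String.fromCodePoint(cp - TAG_OFFSET))
    .join("");
}

const carrier = "What's the weather today?" + encodeTags("demo");
console.log(decodeTags(carrier)); // "demo" -- recovered from invisible text
```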
Why it matters for agents
For human-to-human chat, invisible Unicode is a weird edge case. For an agent, it's an input-integrity problem.
LLMs don't distinguish visible text from invisible text the way humans do. They receive tokens from the raw string. If the hidden portion contains instructions, context poisoning, or side-channel data, the model treats it as legitimate user input.
| Stage | What the developer thinks is happening | What can actually be happening |
|---|---|---|
| Message received | User asked a normal question | Raw string includes hidden extra content |
| Message passed to the agent | Agent is processing visible user intent | Agent is also receiving invisible payloads |
| Message stored in memory | Conversation history is clean | Hidden payloads may persist across turns |
| Agent acts | Model is following the user's message | Model may be reacting to content nobody saw |
This isn't just single-turn prompt injection. If raw text is stored in memory and replayed, it becomes session poisoning.
Why most developers will miss it
The workflow that causes this is the obvious one: receive a message, read `message.text`, pass it to the model, move on. It feels reasonable because the message looks normal everywhere the developer checks.
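In code, the whole vulnerable pattern fits in a few lines. This is a runnable sketch; `model` and `memory` are stand-ins for whatever LLM client and history store you actually use:

```ts
type Message = { text: string };

const memory: string[] = [];
const model = {
  // Stand-in for a real LLM client that forwards `prompt` verbatim.
  complete: async (prompt: string): Promise<string> => "reply to: " + prompt,
};

async function handleMessage(message: Message): Promise<string> {
  const reply = await model.complete(message.text); // hidden payload reaches the model
  memory.push(message.text);                        // and now it persists across turns
  return reply;
}
```

Nothing in those lines would raise an eyebrow in code review.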
Rendering is not validation. And most people simply don't know this class of attack exists yet.
That's why telling every developer to "sanitize your inputs" doesn't work. Some won't know they need to. Others will sanitize only before inference but not before logging, memory, or downstream automation. The complexity has to be absorbed somewhere closer to the wire.
Bigger than prompt injection
Invisible Unicode in messaging isn't just a prompt-injection problem. It's a hidden-channel problem. Once a system lets invisible payloads pass through untouched, it opens the door to:
| Risk class | What it means in practice |
|---|---|
| Covert exfiltration | Stolen data embedded invisibly in innocent-looking outbound messages |
| Hidden metadata channels | Machine-readable side-channel data alongside human-readable chat |
| Watermarking / fingerprinting | Invisibly marking outbound messages for tracing and provenance |
| Session poisoning | Hidden payloads persisting in agent memory across turns |
Same root cause: raw Unicode moving through the stack without inspection.
What a secure pipeline looks like
Treat invisible Unicode like any other untrusted input: normalize it, inspect it, preserve evidence, and control exposure.
| Pipeline step | Secure behavior |
|---|---|
| Inbound message receipt | Capture the raw string exactly as received |
| Normalization | Remove or isolate invisible Unicode from model-bound text |
| Inspection | Detect suspicious patterns, length anomalies, encoded payloads |
| Policy enforcement | Block, warn, quarantine, or log based on score and settings |
| Downstream delivery | Pass only the safe text into the agent and model |
| Audit trail | Retain the raw original for forensic review |
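As a rough illustration, here is a minimal version of those steps in TypeScript. The `screen` name, the threshold, and especially the codepoint list are all illustrative; a production layer needs a far larger table plus bidi and homoglyph handling:

```ts
// Toy screening pass: strip a handful of invisible ranges, count hits,
// and pick an action. Real coverage needs hundreds of codepoints.
const INVISIBLES = /[\u200B-\u200F\u2060-\u2064\uFEFF]|[\u{E0000}-\u{E007F}]/gu;

interface ScreenResult {
  safe: string;                      // model-bound text, invisibles removed
  raw: string;                       // exact original, kept for audit
  hits: number;                      // invisible codepoints found
  action: "pass" | "warn" | "block";
}

function screen(raw: string, blockAt = 8): ScreenResult {
  const hits = (raw.match(INVISIBLES) ?? []).length;
  const safe = raw.replace(INVISIBLES, "");
  const action = hits === 0 ? "pass" : hits < blockAt ? "warn" : "block";
  return { safe, raw, hits, action };
}

// Only `safe` ever reaches the model; `raw` goes to the audit log.
const r = screen("What's\u200B the weather?\u{E0064}\u{E0065}");
console.log(r.action, r.hits); // "warn" 3
```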
This is what unicode-shield does: it's a zero-dependency TypeScript normalization layer that strips invisible characters, bidi attacks, Zalgo, homoglyphs, and 400+ dangerous codepoints before any text reaches a model. Sanitized safe text goes to the LLM, the raw original is preserved for audit, and suspicious patterns are scored so you can block, warn, or log.
Takeaways
Messaging is turning into a real interface for AI. That means the attack surfaces of messaging are becoming the attack surfaces of models.
The old assumption — "if a message looks safe to a human, it's safe enough" — doesn't survive contact with agents. A trusted interface, plus invisible payloads, plus raw model ingestion is exactly the kind of architectural gap that produces silent failures in production. Not flashy ones. The dangerous kind: the model behaves strangely, nobody can see why, the logs look normal.
If you're building agents on top of messaging, don't assume the visible text is the whole message. Treat the raw string as hostile input. Sanitize before inference. Inspect before memory. Preserve the original for audit. And if you can, solve it at the infrastructure layer instead of asking every developer to rediscover it one production incident at a time.