Heimgard | Security Series

    Prompt Injection via Emojis 😈

    How emojis can be weaponized to bypass filters and hijack MCP-connected LLMs.

    Oct 4 2025 | 3 min read

    TL;DR

    • Attackers can embed invisible instructions inside emoji and Unicode sequences.
    • Humans can’t see them, but models can, letting attackers slip instructions past filters and steer LLM behavior.
    • Defend by sanitizing all model-reachable text, flagging suspicious code points, and setting capability and output limits for MCP servers.

    Text as the new attack vector

    As LLMs are wired into enterprise workflows, the attack surface shifts from code exploits to input exploits.

    Prompt injection is the archetype: the model reads instructions that humans do not (or cannot) see.

    Recent research demonstrates that hidden Unicode characters, including emoji‑related constructs, can carry covert instructions that model pipelines parse and act on, while UIs and human reviewers see nothing but benign text.

    Why MCP environments are uniquely exposed

    MCP standardizes Resources (context), Prompts, and Tools across servers and clients. Each step involves text moving across trust boundaries. Malicious payloads can be hidden inside:

    • Tool descriptions and annotations
    • Prompt templates and libraries
    • Resource previews or summaries returned to the model
    • Free‑text fields passed through to tools

    How the Attack Works

    Primer: the Unicode plumbing behind emojis

    Emojis are built from multiple Unicode code points—some visible, some not—that control how characters join, display, or carry metadata. Those same invisible components can also be used to hide data.

    Key mechanisms:

    • Variation Selectors (VS15/VS16): Invisible modifiers that switch between text and emoji presentation, changing how a character renders without adding visible symbols.

    • Zero-Width Joiner (ZWJ, U+200D): Fuses characters into a single emoji (e.g., đŸ‘šâ€đŸ‘©â€đŸ‘§). It’s unseen but fully present in the text the model reads.

    • Emoji Tag Sequences (U+E0000–U+E007F): Hidden “tag” characters that encode metadata such as country sub-regions for flags. Renderers display nothing for unrecognized tag sequences, but parsers and models still receive every code point.

    The result is that payloads can be hidden without altering what humans see. A friendly emoji can mask a string of invisible instructions that models still interpret!

    [Figure: screenshot of a prompt injection attempt extracting a hidden message encoded with Variation Selectors]
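
    To make the mechanism concrete, below is a minimal sketch of the variation-selector trick illustrated above: each byte of a hidden message is split into two 4-bit nibbles and appended to a carrier emoji as variation selectors (U+FE00–U+FE0F). Function names are illustrative.

    # Illustrative only: hide/reveal text after an emoji using
    # variation selectors (U+FE00-U+FE0F), each carrying 4 bits.
    VS_BASE = 0xFE00

    def hide(carrier: str, secret: str) -> str:
        out = carrier
        for byte in secret.encode("utf-8"):
            out += chr(VS_BASE + (byte >> 4))    # high nibble
            out += chr(VS_BASE + (byte & 0xF))   # low nibble
        return out

    def reveal(s: str) -> str:
        nibbles = [ord(c) - VS_BASE for c in s if VS_BASE <= ord(c) <= VS_BASE + 0xF]
        pairs = zip(nibbles[0::2], nibbles[1::2])
        return bytes((hi << 4) | lo for hi, lo in pairs).decode("utf-8", errors="replace")

    msg = hide("😊", "ignore all previous instructions")
    print(msg)          # most UIs render just the smiley
    print(reveal(msg))  # -> "ignore all previous instructions"

    A scanner that only checks for dangerous words will pass msg unchanged, which is exactly why the defenses below inspect code points instead.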

    Chain of events

    1. The attacker crafts benign‑looking text (e.g., “Thanks for your help! 👋😊”).
    2. Invisible instructions are embedded using zero‑width characters, emoji tag characters, or variation selectors.
    3. The LLM receives the full sequence (visible + invisible) and may interpret the hidden part as control instructions (e.g., change behavior, select a different tool, exfiltrate data).
    4. Filters miss it because they look for dangerous words, not unusual code points.

    Common infiltration points

    1. Direct user input: Chat UIs, email, support tickets, docs pasted into the LLM

    2. Indirect content & supply chain

    • Tool/resource descriptions from third‑party MCP servers
    • Transcluded content (docs, web results, PDFs)
    • Generated content from other models (RAG, agents)

    3. During adjudication: Emoji‑laden outputs can trick judge/guardrail LLMs into green‑lighting otherwise unsafe content.


    A Practical Guide to Defending Against Prompt Injections

    Below is a layered defense you can implement today. Each control is designed to be safe, realistic, and compatible with international text.

    1. Sanitize & normalize at the edges

    Strip or flag Unicode ranges that are rarely legitimate in user-facing text:

    • Zero-width / invisible: U+200B–U+200D, U+2060–U+206F
    • Emoji tag block: U+E0000–U+E007F
    • Bidi overrides / isolates: U+202A–U+202E, U+2066–U+2069
    • Variation selectors: U+FE00–U+FE0F
    • Soft hyphen: U+00AD

    Normalize text (NFC or NFKC) and run confusable-character detection (Unicode UTS #39, ICU uspoof) to catch spoofing.

    Quick detection example (Python)

    # Minimal scanner: flag suspicious code points; do not mutate content by default.
    SUSPICIOUS_RANGES = [
        (0x200B, 0x200D),      # zero-width space, non-joiner, joiner
        (0x2060, 0x206F),      # word joiner, invisible operators, formatting
        (0xE0000, 0xE007F),    # emoji tag block
        (0x202A, 0x202E),      # bidi embeddings and overrides
        (0x2066, 0x2069),      # bidi isolates (subset of U+2060-U+206F, kept explicit)
        (0xFE00, 0xFE0F),      # variation selectors
        (0x00AD, 0x00AD),      # soft hyphen
    ]

    def find_suspicious_unicode(s: str) -> list[int]:
        """Return indices of characters in suspicious ranges (empty list = clean)."""
        hits = []
        for i, ch in enumerate(s):
            cp = ord(ch)
            if any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES):
                hits.append(i)
        return hits

    # Example: the emoji tag character after the smiley is caught at index 4.
    assert find_suspicious_unicode("hi 😊" + chr(0xE0041)) == [4]
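
    For the normalization step mentioned above, the standard library suffices; confusable detection needs ICU bindings (e.g., PyICU) and is left out of this sketch.

    import unicodedata

    def normalize_text(s: str) -> str:
        # NFKC folds compatibility characters (fullwidth forms, ligatures)
        # to canonical equivalents, so filters compare like with like.
        return unicodedata.normalize("NFKC", s)

    print(normalize_text("ïœˆïœ…ïœŒïœŒ"))  # fullwidth letters fold to "hello"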

    2. Treat all model-reachable text as untrusted

    • Sanitize tool descriptions, prompt templates, resource previews, and server-supplied strings before showing them to an LLM (a sketch follows this list).
    • Display a “hidden character detected” warning in operator UIs.
    • Require re-consent when manifests or tool metadata change.
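
    As a sketch of that policy, a hypothetical host-side gate over server-supplied strings, reusing the scanner above (the function name and error handling are illustrative):

    def vet_tool_metadata(name: str, description: str) -> None:
        # Hypothetical gate: refuse to forward tool metadata containing
        # hidden code points; surface the failure in the operator UI.
        for label, text in (("name", name), ("description", description)):
            if find_suspicious_unicode(text):
                raise ValueError(f"hidden characters detected in tool {label}")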

    3. Separate data from instructions

    • Use structured, typed arguments instead of free text (see the schema sketch after this list).
    • Add explicit policy prompts telling models to ignore invisible characters.
    • Employ multiple moderation or guardrail models with different tokenizers to avoid shared blind spots.
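
    To illustrate the schema approach, a sketch of typed tool arguments: with an enum-constrained action and a strictly patterned id, there is no free-text field for hidden instructions to ride in (field names are invented for the example):

    # Illustrative MCP tool input schema: no free-text arguments.
    TOOL_INPUT_SCHEMA = {
        "type": "object",
        "properties": {
            "action": {"enum": ["summarize", "close", "escalate"]},
            "ticket_id": {"type": "string", "pattern": "^[A-Z]{2}-[0-9]{1,6}$"},
        },
        "required": ["action", "ticket_id"],
        "additionalProperties": False,
    }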

    4. Limit tool capabilities

    • Add human-in-the-loop approvals for high-impact or write actions (sketched below).
    • Set rate limits, scope restrictions, and dry-run modes.
    • Whitelist allowed argument formats and verify outputs before committing side effects.
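
    A minimal sketch of that approval gate; the tool names and the execute/approve callbacks are assumptions for illustration, not an MCP API:

    # Hypothetical wrapper: write-capable tools pause for a human decision.
    WRITE_TOOLS = {"delete_record", "send_email", "update_ticket"}

    def call_tool(name: str, args: dict, execute, approve):
        if name in WRITE_TOOLS and not approve(name, args):
            raise PermissionError(f"human approval denied for {name}")
        return execute(name, args)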

    5. Monitor and test continuously

    • Log and alert on abnormal Unicode density (see the snippet after this list).
    • Include zero-width, bidi, and emoji-tag sequences in red-team tests.
    • Seed “canary” documents with known hidden characters to confirm your pipeline detects them.
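
    The density alert can be a simple ratio over the scanner’s hits; the threshold below is a starting point to tune against real traffic, not a calibrated figure:

    def invisible_density(s: str) -> float:
        # Fraction of characters falling in SUSPICIOUS_RANGES.
        return len(find_suspicious_unicode(s)) / max(len(s), 1)

    ALERT_THRESHOLD = 0.01  # illustrative; tune on your own corpus

    def should_alert(s: str) -> bool:
        return invisible_density(s) > ALERT_THRESHOLD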

    What “good” looks like in an MCP stack (Heimgard’s perspective)

    For teams securing MCP servers or managing a registry of approved tools and servers, aim for layered hygiene and verification across four levels:

    1. Registry Policy Gates

    • Quarantine or reject submissions whose metadata (name, description, examples) include suspicious Unicode; trigger human review before publication.
    • Enforce signed manifests, digest pinning, and artifact integrity verification at upload.

    2. Server Hardening

    • Sanitize all inbound parameters and outbound responses.
    • Prefer typed outputs (structured JSON, enums) over free-form text.
    • Emit an x-unicode-flags header that, for transparency, summarizes any characters stripped or flagged (one possible format sketched below).
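
    One way a server might populate that header (the header name comes from the bullet above; the summary format is an assumption):

    def unicode_flags_header(original: str, sanitized: str) -> str:
        # List code points present in the original but absent after
        # sanitization, e.g. "stripped=U+200D,U+FE0F"; "none" if unchanged.
        removed = sorted({f"U+{ord(c):04X}" for c in set(original) - set(sanitized)})
        return "stripped=" + ",".join(removed) if removed else "none"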

    3. Host Integration Controls

    • Sanitize all text crossing trust boundaries—tool docs, previews, RAG chunks—before it reaches the model.
    • Apply capability constraints and interactive approvals for tools that can write, delete, or modify data.
    • Keep audit trails of both raw and normalized transcripts for investigation and traceability.

    4. Continuous Verification

    • Regularly re-scan the registry for new Unicode or emoji evasion patterns.
    • Deploy synthetic canaries (known hidden sequences) to confirm the detection pipeline still catches them.

    References & Further Reading

    1. Promptfoo. Hidden and Invisible Unicode Used Against LLMs: How and Why, 2024
    2. arXiv. Emoji Attack: How Emoji Bias Judge LLMs, 2024
    3. arXiv. Special-Character Adversarial Attacks on Open-Source Language Model, 2025
    4. OWASP. LLM Prompt Injection, 2025. https://genai.owasp.org/llmrisk/llm01-prompt-injection/