Your company is registering a Japanese domain name, 例え.jp, and the registrar shows xn--r8jz45g.jp in the confirmation. Are those the same domain? Or you’re investigating a phishing email where the link appears to go to apple.com but the URL bar shows xn--pple-43d.com. Either way, you need to decode and compare Punycode to understand which domain you’re actually visiting.

What Is Punycode?

Punycode is an encoding system defined in RFC 3492 that represents Unicode characters using the limited ASCII character set allowed in domain names. It is the foundation of Internationalized Domain Names (IDN) — the system that allows domain names to contain non-ASCII characters like Chinese, Arabic, or accented Latin characters.

The Domain Name System (DNS) only supports ASCII labels — letters a–z, digits 0–9, and hyphens. To support domain names in other scripts, the IDN standard (RFC 5891) uses Punycode to encode Unicode labels into ASCII-Compatible Encoding (ACE), prefixed with xn--:

Unicode:  münchen.de
Punycode: xn--mnchen-3ya.de

Unicode:  中文.com
Punycode: xn--fiq228c.com

Each label (the part between dots) is encoded independently. If a label is already pure ASCII, it remains unchanged.
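
The per-label behaviour can be sketched with Python's built-in punycode codec. The helper names to_ascii and to_unicode are made up for this illustration, and the sketch deliberately skips the normalization and validity checks that a full IDNA implementation performs before encoding.

```python
def to_ascii(domain: str) -> str:
    """Encode each label independently; pure-ASCII labels pass through."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # Raw Punycode plus the ACE prefix; no IDNA mapping or validation.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(domain: str) -> str:
    """Decode only the labels that carry the xn-- prefix."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ascii("münchen.de"))        # xn--mnchen-3ya.de
print(to_unicode("xn--fiq228c.com"))  # 中文.com
```

Only the non-ASCII label changes; the .de and .com labels are copied through untouched.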

How Punycode Encoding Works

Punycode uses a clever algorithm called Bootstring that achieves compact encoding by exploiting the pattern that most IDN labels contain a mix of ASCII and non-ASCII characters:

Extract and preserve ASCII characters: All ASCII characters in the label are copied to the output first, in their original order.
Encode non-ASCII positions: The positions and codepoints of non-ASCII characters are encoded as a sequence of integers using a variable-length encoding with an adaptive bias.
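Separate with a hyphen: If the label contains any ASCII characters, a hyphen - is placed between the copied ASCII portion and the encoded non-ASCII portion. The last hyphen in the output is the separator, so hyphens in the original label are not ambiguous.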

The algorithm is deterministic: the same Unicode input always produces the same Punycode output, and decoding always recovers the exact original Unicode string.
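
To make the structure of the output visible, here is a small sketch that splits an encoded label at its last hyphen. It relies on Python's built-in punycode codec; split_punycode is a hypothetical helper written for this example, not part of any library.

```python
def split_punycode(label: str) -> tuple[str, str]:
    """Return (copied ASCII part, encoded non-ASCII part) for one label."""
    encoded = label.encode("punycode").decode("ascii")
    # The last hyphen is the delimiter. If the label has no ASCII
    # characters, no delimiter is emitted and the ASCII part is empty.
    basic, _, deltas = encoded.rpartition("-")
    return basic, deltas


print(split_punycode("münchen"))  # ('mnchen', '3ya')
print(split_punycode("中文"))      # ('', 'fiq228c')

# Deterministic round trip: decoding recovers the original string.
once = "münchen".encode("punycode")
assert once == "münchen".encode("punycode")
assert once.decode("punycode") == "münchen"
```

In the first result, mnchen is the ASCII text copied in order, and 3ya is the compact encoding of where the ü goes and which code point it is.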

IDN and the Homograph Attack Problem

Punycode’s existence enables IDN homograph attacks — one of the most subtle phishing techniques. Many Unicode characters look identical or nearly identical to ASCII characters:

Unicode	Looks like	Codepoint
а (Cyrillic)	a (Latin)	U+0430
е (Cyrillic)	e (Latin)	U+0435
о (Cyrillic)	o (Latin)	U+043E
р (Cyrillic)	p (Latin)	U+0440

An attacker can register a domain using Cyrillic characters that renders identically to apple.com in most fonts but resolves to a completely different xn-- domain.

Modern browsers mitigate this by displaying the Punycode form (xn--...) in the address bar when a label mixes scripts or uses character combinations the browser considers suspicious. This tool helps you verify: paste a suspicious domain to see its true Punycode representation, or decode an xn-- domain to see which Unicode characters it actually contains.
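
The same check can be sketched in a few lines of Python. The helpers decode_label, inspect_label, and script_hints are hypothetical names for this example. The script hint is only a rough heuristic based on the first word of each character's Unicode name, since the standard library does not expose the Unicode Script property; real browsers apply more elaborate rules.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode an xn-- label to Unicode; return other labels unchanged."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def inspect_label(label: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for every character in the label."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in decode_label(label)
    ]


def script_hints(label: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in decode_label(label)
        if ch.isalpha()
    }


for code_point, name in inspect_label("xn--pple-43d"):
    print(code_point, name)

# More than one hint in a single label is a reason to look closer.
print(sorted(script_hints("xn--pple-43d")))
```

For xn--pple-43d, the first character is reported as a Cyrillic letter while the rest are Latin, which is exactly the mix a homograph attack depends on.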

Where Punycode Appears

Domain registration: When you register an IDN through a domain registrar, the system stores the Punycode form. WHOIS lookups, DNS records, SSL certificates, and HTTP headers all use the Punycode form internally.

SSL/TLS certificates: Certificate authorities issue certificates for the Punycode form of IDN domains. If your certificate is for the Punycode version of your domain, your server must be configured to serve that exact domain.

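Email addresses: The domain part of an internationalized email address can always be written in its Punycode (xn--) form, which is what SMTP servers without internationalization support require. Servers that implement the SMTPUTF8 extension (EAI, RFC 6531) can also accept the domain in UTF-8 directly.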

DNS configuration: When setting up DNS records (A, AAAA, CNAME, MX) for IDN domains, you must use the Punycode form in your zone file.

Web crawlers and SEO: Search engines index both forms, but the canonical URL in Google Search Console will show the Punycode form. Understanding the mapping is essential for international SEO.

IDNA 2003 vs IDNA 2008

Two versions of the IDN standard exist, and they differ on several characters:

IDNA 2003 (RFC 3490): Maps some characters before encoding — for example, the German sharp s is mapped to ss, and uppercase is mapped to lowercase.
IDNA 2008 (RFC 5891): Does not perform character mapping. The sharp s is encoded as-is, producing a different domain than the ss equivalent.

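Most modern browsers and libraries follow IDNA 2008 rules, but some older systems still use IDNA 2003. This can cause subtle incompatibilities: a domain registered under IDNA 2008 rules may not resolve correctly on IDNA 2003 clients. For example, an IDNA 2003 client maps straße.de to strasse.de, while an IDNA 2008 client looks up xn--strae-oqa.de, which can be a different registration.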
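
The difference is easy to reproduce in Python, whose built-in idna codec implements the IDNA 2003 rules (RFC 3490). The second result below only approximates IDNA 2008 by applying raw Punycode without any mapping; it is a sketch for comparison, not a conforming IDNA 2008 implementation, and a dedicated IDNA 2008 library would add validation that is skipped here.

```python
# IDNA 2003: Python's built-in "idna" codec maps characters before encoding.
idna2003 = "straße.de".encode("idna").decode("ascii")
print(idna2003)  # strasse.de (the sharp s was mapped to ss)

# IDNA 2003 also folds case for non-ASCII labels.
folded = "MÜNCHEN.de".encode("idna").decode("ascii")
print(folded)  # xn--mnchen-3ya.de

# IDNA 2008 style (approximation): encode the sharp s as-is, no mapping.
label, tld = "straße.de".split(".")
idna2008_style = "xn--" + label.encode("punycode").decode("ascii") + "." + tld
print(idna2008_style)  # xn--strae-oqa.de

# The two results name different domains.
print(idna2003 != idna2008_style)  # True
```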

Common IDN Examples

Unicode Domain	Punycode	Language
münchen.de	xn--mnchen-3ya.de	German
中文.com	xn--fiq228c.com	Chinese
россия.рф	xn--h1alffa9f.xn--p1ai	Russian
भारत.भारत	xn--h2brj9c.xn--h2brj9c	Hindi
日本語.jp	xn--wgv71a309e.jp	Japanese

Privacy

All conversion happens in your browser. No domain names or text are sent to any server.
