Emoji Unicode: A Complete Reference for Developers
March 9, 2026
Emojis are standardised Unicode characters. Understanding how they work at the Unicode level is essential for any developer building an app, API, or website that handles emoji. This guide covers code points, UTF-8 and UTF-16 encoding, emoji sequences, and how to handle emoji in JavaScript, URLs, and databases.
What is Unicode?
Unicode is the universal standard for text encoding; it assigns a unique number (called a code point) to every character in every writing system, including emojis. There are over 150,000 Unicode characters, including more than 3,600 emojis.
Unicode code points are written as U+XXXX where XXXX is a hexadecimal number.
For example:
- U+1F600: 😀 Grinning face
- U+2764: ❤️ Red heart
- U+1F525: 🔥 Fire
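As a minimal sketch of how the U+XXXX notation maps to real characters, the two helpers below convert in both directions. The names `fromUPlus` and `toUPlus` are hypothetical, chosen for this example only.

```javascript
// Convert "U+XXXX" notation to the character it names.
function fromUPlus(notation) {
  const hex = notation.replace(/^U\+/i, "");
  return String.fromCodePoint(parseInt(hex, 16));
}

// Convert the first code point of a string to "U+XXXX" notation.
function toUPlus(char) {
  // padStart keeps at least four hex digits, matching the usual convention
  const hex = char.codePointAt(0).toString(16).toUpperCase();
  return "U+" + hex.padStart(4, "0");
}

console.log(fromUPlus("U+1F525")); // 🔥
console.log(toUPlus("\u{1F525}")); // U+1F525
console.log(toUPlus("A"));         // U+0041
```

Note that `toUPlus` only looks at the first code point, so it is suited to single-code-point emoji rather than sequences.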
UTF-8 Encoding
Most emoji use code points above U+FFFF, that is, outside the Basic Multilingual Plane (BMP). UTF-8 represents these with 4 bytes; emoji inside the BMP, such as U+2764, take 3 bytes per code point:
| Emoji | Code point | UTF-8 bytes |
|---|---|---|
| 😀 | U+1F600 | F0 9F 98 80 |
| 🔥 | U+1F525 | F0 9F 94 A5 |
| ❤️ | U+2764 + U+FE0F | E2 9D A4 EF B8 8F |
This matters when:
- Storing emoji in databases (use utf8mb4 in MySQL, not utf8)
- Counting string length (a single emoji may be 2 or more UTF-16 code units in JavaScript)
- Building APIs that accept emoji in query parameters (URL-encode them)
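To make the difference between these units concrete, here is a small sketch that measures one string three ways and prints its UTF-8 bytes as hex. The helper name `measure` is an assumption for illustration; `TextEncoder` is the standard Web and Node.js API for UTF-8 encoding.

```javascript
// Report the size of a string in three different units.
function measure(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes
  return {
    codeUnits: str.length,       // UTF-16 code units (what .length counts)
    codePoints: [...str].length, // Unicode code points
    utf8Bytes: bytes.length,
    utf8Hex: Array.from(bytes, (b) =>
      b.toString(16).toUpperCase().padStart(2, "0")
    ).join(" "),
  };
}

const grin = measure("\u{1F600}"); // 😀
console.log(grin.codeUnits, grin.codePoints, grin.utf8Bytes); // 2 1 4
console.log(grin.utf8Hex); // F0 9F 98 80

const heart = measure("\u2764\uFE0F"); // ❤️ (heart + VS16)
console.log(heart.codeUnits, heart.codePoints, heart.utf8Bytes); // 2 2 6
console.log(heart.utf8Hex); // E2 9D A4 EF B8 8F
```

A database column limit expressed in bytes should be checked against `utf8Bytes`, while a limit expressed in characters is usually closer to `codePoints`.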
Emoji Sequences
Many modern emojis are not single code points; they're sequences of multiple code points combined.
Zero-Width Joiner (ZWJ) Sequences
The ZWJ character (U+200D) joins two or more emojis into a new combined emoji:
👨 + ZWJ + 🍳 = 👨‍🍳 (Man cook)
👩 + ZWJ + 💻 = 👩‍💻 (Woman technologist)
🏳️ + ZWJ + 🌈 = 🏳️‍🌈 (Rainbow flag)
If the ZWJ sequence isn't supported, the system falls back to showing the individual emojis side-by-side.
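The joining and the fallback behaviour can be sketched in a few lines. The helper names `joinWithZwj` and `splitZwj` are hypothetical; whether the joined result renders as a single glyph depends on the font and platform.

```javascript
const ZWJ = "\u200D";

// Join emoji with ZWJ to form a candidate sequence.
function joinWithZwj(...parts) {
  return parts.join(ZWJ);
}

// Recover the components a renderer would fall back to.
function splitZwj(sequence) {
  return sequence.split(ZWJ);
}

const man = "\u{1F468}";     // 👨
const cooking = "\u{1F373}"; // 🍳
const cook = joinWithZwj(man, cooking);

console.log(cook);                  // 👨‍🍳 where the sequence is supported
console.log([...cook].length);      // 3 code points: man, ZWJ, cooking
console.log(splitZwj(cook).length); // 2 components
```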
Skin Tone Modifiers
Five skin tone modifier code points (U+1F3FB through U+1F3FF) can follow supported emojis to change their skin tone:
👋 + U+1F3FB = 👋🏻 (Light skin tone)
👋 + U+1F3FC = 👋🏼 (Medium-light skin tone)
👋 + U+1F3FD = 👋🏽 (Medium skin tone)
👋 + U+1F3FE = 👋🏾 (Medium-dark skin tone)
👋 + U+1F3FF = 👋🏿 (Dark skin tone)
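A rough sketch of applying and removing a modifier in code follows. The tone keys and the function names `withSkinTone` and `stripSkinTone` are assumptions made for this example; the `Emoji_Modifier_Base` property escape is part of standard JavaScript regex support.

```javascript
// The five modifiers, U+1F3FB..U+1F3FF, keyed by hypothetical names.
const SKIN_TONES = {
  light: 0x1f3fb,
  mediumLight: 0x1f3fc,
  medium: 0x1f3fd,
  mediumDark: 0x1f3fe,
  dark: 0x1f3ff,
};

function withSkinTone(base, tone) {
  const modifier = SKIN_TONES[tone];
  // Only single emoji with the Emoji_Modifier_Base property accept a modifier
  if (modifier === undefined || !/^\p{Emoji_Modifier_Base}$/u.test(base)) {
    return base;
  }
  return base + String.fromCodePoint(modifier);
}

function stripSkinTone(str) {
  return str.replace(/[\u{1F3FB}-\u{1F3FF}]/gu, "");
}

const wave = "\u{1F44B}"; // 👋
console.log(withSkinTone(wave, "medium"));        // 👋🏽
console.log(withSkinTone("\u{1F525}", "medium")); // 🔥 (unchanged, not a modifier base)
console.log(stripSkinTone(withSkinTone(wave, "dark")) === wave); // true
```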
Variation Selectors
The variation selector U+FE0F (VS16) requests emoji presentation for characters that default to text presentation:
- ❤ (U+2764): may render as a text heart
- ❤️ (U+2764 + U+FE0F): renders as an emoji heart
Many emoji require this selector for consistent rendering.
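As an illustration, the sketch below appends a presentation selector to a single character. It also uses U+FE0E (VS15), the counterpart selector that requests text presentation. The helper name `withPresentation` is hypothetical, and the function is only meant for single characters, not full sequences.

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

// Remove any existing selector, then append the requested one.
function withPresentation(char, selector) {
  return char.replace(/[\uFE0E\uFE0F]/g, "") + selector;
}

const heart = "\u2764";
const emojiHeart = withPresentation(heart, VS16);

console.log(emojiHeart.length); // 2: U+2764 followed by U+FE0F
console.log(withPresentation(emojiHeart, VS15) === heart + VS15); // true
```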
Emoji in JavaScript
JavaScript strings use UTF-16 encoding. Emojis above U+FFFF are stored as surrogate pairs: two 16-bit code units.
String Length Issues
"😀".length // → 2 (not 1!)
[..."😀"].length // → 1 (using spread/iterator, which handles surrogate pairs)
Iterating Over Emojis
const text = "Hello 😀 World";
// Wrong: iterates over UTF-16 code units (breaks on emoji)
for (let i = 0; i < text.length; i++) { }
// Correct: iterates over Unicode code points
for (const char of text) {
  console.log(char);
}
// Or using spread
const chars = [...text]; // ["H", "e", "l", "l", "o", " ", "😀", " ", "W", "o", "r", "l", "d"]
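Code point iteration still splits multi-code-point emoji such as ZWJ sequences. One way to count what a user perceives as a single character is the built-in `Intl.Segmenter`, sketched below. The helper name `graphemes` is hypothetical, and the API needs a reasonably recent runtime (for example Node.js 16 or later).

```javascript
// Split a string into user-perceived characters (grapheme clusters).
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

const chef = "\u{1F468}\u200D\u{1F373}"; // 👨‍🍳 man cook (ZWJ sequence)

console.log(chef.length);            // 5 UTF-16 code units
console.log([...chef].length);       // 3 code points
console.log(graphemes(chef).length); // 1 grapheme cluster
```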
Getting the Code Point
"😀".codePointAt(0).toString(16) // → "1f600"
"😀".codePointAt(0) // → 128512
Checking for Emoji in a String
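Use Unicode property escapes in modern JavaScript regex. Prefer \p{Extended_Pictographic} over \p{Emoji}, because \p{Emoji} also matches the digits 0-9, # and *:
const emojiRegex = /\p{Extended_Pictographic}/u;
emojiRegex.test("Hello 😀") // → true
emojiRegex.test("Hello world") // → false
emojiRegex.test("Room 101") // → false (/\p{Emoji}/u would return true here)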
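To extract emoji rather than only test for them, a pattern needs to keep modifiers and ZWJ sequences together. The following is a rough sketch, not a complete matcher: the pattern and the name `extractEmoji` are assumptions for this example, and it deliberately ignores flag (regional indicator) and keycap sequences.

```javascript
// One pictographic character, optionally followed by VS16 or a skin tone
// modifier, then any number of ZWJ-joined continuations of the same shape.
const emojiSequence =
  /\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*/gu;

function extractEmoji(text) {
  return text.match(emojiSequence) ?? [];
}

const sample = "Chef \u{1F468}\u200D\u{1F373} waves \u{1F44B}\u{1F3FD} in room 101";
const found = extractEmoji(sample);

console.log(found.length); // 2
console.log(found);        // the man cook sequence and the waving hand with its modifier
```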
Emoji in URLs
Emoji characters must be percent-encoded in URLs:
😀 → %F0%9F%98%80
🔥 → %F0%9F%94%A5
In JavaScript:
encodeURIComponent("😀") // → "%F0%9F%98%80"
decodeURIComponent("%F0%9F%98%80") // → "😀"
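For query parameters, the standard `URL` and `URLSearchParams` APIs handle the percent-encoding for you. The sketch below uses a hypothetical endpoint, `https://example.com/search`, and a hypothetical parameter name `q`.

```javascript
const url = new URL("https://example.com/search"); // hypothetical endpoint
url.searchParams.set("q", "\u{1F600}");            // 😀

console.log(url.href);
// https://example.com/search?q=%F0%9F%98%80

// Reading the parameter back decodes it again
console.log(url.searchParams.get("q") === "\u{1F600}"); // true
```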
The Emoji Family API accepts emoji characters directly in the URL path, for example:
GET /api/emojis/😀
GET /api/emojis/1f600
Both the emoji character and its hexcode are accepted.
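A hexcode such as 1f600 can be derived from the character itself. The sketch below assumes the common convention of lowercase hex code points joined with hyphens for multi-code-point emoji; that convention and the names `toHexcode` and `fromHexcode` are assumptions here, not something the API description above confirms.

```javascript
// Lowercase hex code points, hyphen-joined for sequences (assumed convention).
function toHexcode(emoji) {
  return Array.from(emoji, (ch) => ch.codePointAt(0).toString(16)).join("-");
}

function fromHexcode(hexcode) {
  const codePoints = hexcode.split("-").map((h) => parseInt(h, 16));
  return String.fromCodePoint(...codePoints);
}

console.log(toHexcode("\u{1F600}"));                // 1f600
console.log(toHexcode("\u{1F468}\u200D\u{1F373}")); // 1f468-200d-1f373
console.log(fromHexcode("1f600") === "\u{1F600}");  // true
```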
Emoji in Databases
MySQL / MariaDB
The legacy utf8 charset in MySQL (an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold the 4-byte code points that most emoji use. Use utf8mb4:
CREATE TABLE messages (
  body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);
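Or set the database default to utf8mb4 (this applies to tables created afterwards; existing tables and columns must be converted separately):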
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
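On the application side, a quick check can flag input that would not fit in a 3-byte utf8 column before it reaches the database. This is a minimal sketch; the function name `needsUtf8mb4` is hypothetical.

```javascript
// True if the string contains any code point above U+FFFF, which takes
// 4 bytes in UTF-8 and therefore does not fit in MySQL's 3-byte utf8.
function needsUtf8mb4(str) {
  return /[\u{10000}-\u{10FFFF}]/u.test(str);
}

console.log(needsUtf8mb4("Hello \u{1F600}")); // true  (😀 is U+1F600)
console.log(needsUtf8mb4("\u2764\uFE0F"));    // false (❤️ stays within the BMP)
console.log(needsUtf8mb4("plain text"));      // false
```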
PostgreSQL
PostgreSQL handles emoji without any special configuration as long as the database encoding is UTF8, which is the default on most installations.
SQLite
SQLite stores text as UTF-8 natively, so emoji work out of the box.
Emoji Data Resources
- Unicode.org: The authoritative source for emoji data: unicode.org/emoji/charts/full-emoji-list.html
- Emoji Family API: Free API for emoji metadata, SVG and PNG images (see the developer docs)
- CLDR (Common Locale Data Repository): Provides emoji names and keywords in multiple languages
You can also look up any individual emoji's hexcode, Unicode version, group, subgroup, and tags on Emoji Family's emoji pages.