Emoji Family

Emoji Unicode: A Complete Reference for Developers

March 9, 2026

Emojis are standardised Unicode characters. Understanding how they work at the Unicode level is essential for any developer building an app, API, or website that handles emoji. This guide covers code points, UTF-8 and UTF-16 encoding, emoji sequences, and emoji handling in JavaScript, URLs, and databases.

What is Unicode?

Unicode is the universal standard for text encoding; it assigns a unique number (called a code point) to every character in every writing system, including emojis. There are over 150,000 Unicode characters, including more than 3,600 emojis.

Unicode code points are written as U+XXXX where XXXX is a hexadecimal number.

For example:

  • U+1F600: 😀 Grinning face
  • U+2764: ❤️ Red heart
  • U+1F525: 🔥 Fire
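
As a small illustration of the notation, the sketch below turns U+XXXX labels back into the characters they name. The helper name `fromUPlus` is made up for this example; `String.fromCodePoint` is standard JavaScript.

```javascript
// Convert one or more space-separated "U+XXXX" labels into a string.
// fromUPlus is a hypothetical helper, not part of any library.
function fromUPlus(labels) {
  const codePoints = labels.trim().split(/\s+/).map((label) => {
    const match = /^U\+([0-9A-F]{4,6})$/i.exec(label);
    if (!match) throw new SyntaxError(`Not a code point label: ${label}`);
    return parseInt(match[1], 16); // the digits after "U+" are hexadecimal
  });
  return String.fromCodePoint(...codePoints);
}

console.log(fromUPlus("U+1F525")); // 🔥
console.log(fromUPlus("U+2764 U+FE0F") === "\u2764\uFE0F"); // true
```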

UTF-8 Encoding

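Most emoji use code points above U+FFFF, which places them outside the Basic Multilingual Plane (BMP, U+0000 to U+FFFF). UTF-8 encodes each of these code points in 4 bytes. Emoji inside the BMP, such as U+2764, take 3 bytes, and a trailing variation selector (U+FE0F) adds another 3: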

| Emoji | Code point | UTF-8 bytes |
|-------|------------|-------------|
| 😀 | U+1F600 | F0 9F 98 80 |
| 🔥 | U+1F525 | F0 9F 94 A5 |
| ❤️ | U+2764 + U+FE0F | E2 9D A4 EF B8 8F |

This matters when:

  • Storing emoji in databases (use utf8mb4 in MySQL, not utf8)
  • Counting string length (a single emoji may span two or more UTF-16 code units in JavaScript)
  • Building APIs that accept emoji in query parameters (URL-encode them)
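
The byte sequences listed above can be reproduced with the standard `TextEncoder` API, which always encodes to UTF-8. This is a minimal sketch; `utf8Hex` and `utf8ByteLength` are illustrative names, not existing functions.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Return the UTF-8 bytes of a string as space-separated uppercase hex.
function utf8Hex(text) {
  return Array.from(encoder.encode(text), (byte) =>
    byte.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

// Number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

console.log(utf8Hex("\u{1F600}")); // F0 9F 98 80
console.log(utf8Hex("\u2764\uFE0F")); // E2 9D A4 EF B8 8F
console.log(utf8ByteLength("\u{1F525}"), "\u{1F525}".length); // 4 2
```

Checking the byte length rather than `.length` is useful when a storage limit is expressed in bytes.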

Emoji Sequences

Many modern emojis are not single code points; they're sequences of multiple code points combined.

Zero-Width Joiner (ZWJ) Sequences

The ZWJ character (U+200D) joins two or more emojis into a single combined emoji:

👨 + ZWJ + 🍳 = 👨‍🍳 (Man cook)
👩 + ZWJ + 💻 = 👩‍💻 (Woman technologist)
🏳️ + ZWJ + 🌈 = 🏳️‍🌈 (Rainbow flag)

If the ZWJ sequence isn't supported, the system falls back to showing the individual emojis side-by-side.
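
A rough sketch of how such a sequence is assembled and taken apart in code. The helpers `joinWithZwj` and `splitZwj` are hypothetical, and joining arbitrary emoji does not guarantee that any platform has a glyph for the result.

```javascript
const ZWJ = "\u200D"; // zero-width joiner

// Join emoji with zero-width joiners to form a ZWJ sequence.
function joinWithZwj(...parts) {
  return parts.join(ZWJ);
}

// Split a ZWJ sequence back into its component emoji.
function splitZwj(sequence) {
  return sequence.split(ZWJ);
}

const manCook = joinWithZwj("\u{1F468}", "\u{1F373}"); // man + cooking
console.log(manCook); // one glyph where the sequence is supported
console.log([...manCook].length); // 3 (man, ZWJ, cooking)
console.log(splitZwj(manCook).length); // 2
```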

Skin Tone Modifiers

Five skin tone modifier code points (U+1F3FB through U+1F3FF) can follow supported emojis to change their skin tone:

👋 + U+1F3FB = 👋🏻 (Light skin tone)
👋 + U+1F3FC = 👋🏼 (Medium-light skin tone)
👋 + U+1F3FD = 👋🏽 (Medium skin tone)
👋 + U+1F3FE = 👋🏾 (Medium-dark skin tone)
👋 + U+1F3FF = 👋🏿 (Dark skin tone)
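
The mapping above could be applied programmatically along these lines. This is a sketch under simplifying assumptions: the names `SKIN_TONES`, `stripSkinTone`, and `withSkinTone` are invented here, and only single-code-point base emoji are handled (multi-person and ZWJ sequences are left unchanged).

```javascript
// The five modifier code points, keyed by the tone names listed above.
const SKIN_TONES = new Map([
  ["light", 0x1f3fb],
  ["medium-light", 0x1f3fc],
  ["medium", 0x1f3fd],
  ["medium-dark", 0x1f3fe],
  ["dark", 0x1f3ff],
]);

// Remove any skin tone modifiers, leaving the default emoji.
function stripSkinTone(emoji) {
  return emoji.replace(/[\u{1F3FB}-\u{1F3FF}]/gu, "");
}

// Apply a tone to a single-code-point emoji that supports modifiers.
function withSkinTone(emoji, tone) {
  const modifier = SKIN_TONES.get(tone);
  if (modifier === undefined) throw new RangeError(`Unknown tone: ${tone}`);
  const base = stripSkinTone(emoji);
  // Only characters with the Emoji_Modifier_Base property accept a modifier.
  if (!/^\p{Emoji_Modifier_Base}$/u.test(base)) return emoji;
  return base + String.fromCodePoint(modifier);
}

console.log(withSkinTone("\u{1F44B}", "medium") === "\u{1F44B}\u{1F3FD}"); // true
console.log(withSkinTone("\u{1F525}", "dark") === "\u{1F525}"); // true: fire takes no modifier
```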

Variation Selectors

The variation selector U+FE0F (VS16) requests emoji presentation for characters that would otherwise default to text presentation:

  • ❤ (U+2764): may render as a text heart
  • ❤️ (U+2764 + U+FE0F): renders as an emoji heart

Many emoji require this selector for consistent rendering.
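
Because the selector is a real code point, the two hearts above are different strings. The following sketch shows one way to add the selector and to compare emoji regardless of it; the function names are assumptions made for this example, and U+FE0E (the text-presentation counterpart, VS15) is stripped as well.

```javascript
const VS16 = "\uFE0F"; // emoji presentation selector

// Remove both variation selectors: U+FE0E (text) and U+FE0F (emoji).
function stripVariationSelectors(text) {
  return text.replace(/[\uFE0E\uFE0F]/g, "");
}

// Request emoji presentation for a single character.
function withEmojiPresentation(char) {
  return stripVariationSelectors(char) + VS16;
}

// Compare two emoji while ignoring presentation selectors.
function sameEmoji(a, b) {
  return stripVariationSelectors(a) === stripVariationSelectors(b);
}

const heart = "\u2764";
console.log(withEmojiPresentation(heart) === "\u2764\uFE0F"); // true
console.log(heart === "\u2764\uFE0F"); // false: the strings differ by one code point
console.log(sameEmoji(heart, "\u2764\uFE0F")); // true
```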

Emoji in JavaScript

JavaScript strings use UTF-16 encoding. Emojis above U+FFFF are stored as surrogate pairs: two 16-bit code units.
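
As a worked example of the surrogate-pair arithmetic, the sketch below splits a code point above U+FFFF into its two code units. The function name `toSurrogatePair` is illustrative; the arithmetic is the standard UTF-16 encoding rule.

```javascript
// Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only U+10000 to U+10FFFF use surrogate pairs");
  }
  const offset = codePoint - 0x10000; // a 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(String.fromCharCode(high, low) === "\u{1F600}"); // true
```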

String Length Issues

"😀".length // → 2 (not 1!)
[..."😀"].length // → 1 (using spread/iterator, which handles surrogate pairs)
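
Spreading counts code points, which still over-counts emoji built from several code points (ZWJ sequences, skin tones). A minimal sketch of counting user-perceived characters instead, assuming a runtime that provides `Intl.Segmenter` (current Node.js and browsers do); `graphemeCount` is a name chosen for this example.

```javascript
// Segment text into grapheme clusters (user-perceived characters).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

function graphemeCount(text) {
  return [...segmenter.segment(text)].length;
}

const cook = "\u{1F468}\u200D\u{1F373}"; // man cook ZWJ sequence
console.log(cook.length); // 5 UTF-16 code units
console.log([...cook].length); // 3 code points
console.log(graphemeCount(cook)); // 1 grapheme cluster
```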

Iterating Over Emojis

const text = "Hello 😀 World";

// Wrong: iterates over UTF-16 code units, so the emoji is split into two lone surrogates
for (let i = 0; i < text.length; i++) {
  console.log(text[i]);
}

// Correct: iterates over Unicode code points
for (const char of text) {
  console.log(char);
}

// Or using spread
const chars = [...text]; // ["H", "e", "l", "l", "o", " ", "😀", " ", "W", "o", "r", "l", "d"]
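
The same concern applies when shortening text: cutting by index can leave half an emoji behind. The sketch below truncates by grapheme cluster instead, again assuming `Intl.Segmenter` is available; `truncateGraphemes` is a hypothetical helper.

```javascript
const graphemes = new Intl.Segmenter("en", { granularity: "grapheme" });

// Keep at most maxGraphemes user-perceived characters without
// cutting a surrogate pair or an emoji sequence in half.
function truncateGraphemes(text, maxGraphemes) {
  let result = "";
  let count = 0;
  for (const { segment } of graphemes.segment(text)) {
    if (count === maxGraphemes) break;
    result += segment;
    count += 1;
  }
  return result;
}

const greeting = "Hi \u{1F468}\u200D\u{1F373}!";
console.log(greeting.slice(0, 4) === "Hi \uD83D"); // true: slice leaves a lone surrogate
console.log(truncateGraphemes(greeting, 4) === "Hi \u{1F468}\u200D\u{1F373}"); // true
```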

Getting the Code Point

"😀".codePointAt(0).toString(16) // → "1f600"
"😀".codePointAt(0) // → 128512

Checking for Emoji in a String

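Use Unicode property escapes (which require the u flag) in a regular expression. Prefer \p{Extended_Pictographic} over \p{Emoji}: the Emoji property also covers the digits 0-9, # and *, so \p{Emoji} matches plain text such as "Room 101".

const emojiRegex = /\p{Extended_Pictographic}/u;
emojiRegex.test("Hello 😀") // → true
emojiRegex.test("Hello world") // → false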
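
To pull whole emoji out of a string rather than only detect them, a pattern has to account for the sequences described earlier. The following is a deliberately simplified sketch, not a complete emoji matcher: the pattern and the `extractEmoji` name are assumptions for illustration, and flags built from regional indicators and keycap sequences are not covered.

```javascript
// A pictographic character, an optional VS16 or skin tone modifier,
// then any number of ZWJ-joined continuations of the same shape.
const EMOJI_SEQUENCE =
  /\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*/gu;

// Return every emoji (or emoji sequence) found in the text.
function extractEmoji(text) {
  return text.match(EMOJI_SEQUENCE) ?? [];
}

// Digits carry the Emoji property, so \p{Emoji} alone gives false positives.
console.log(/\p{Emoji}/u.test("Room 101")); // true
console.log(extractEmoji("Room 101").length); // 0
console.log(extractEmoji("Chef \u{1F468}\u200D\u{1F373} made 2 \u{1F355}").length); // 2
```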

Emoji in URLs

Emoji characters must be percent-encoded in URLs, with one %XX escape per UTF-8 byte:

😀 → %F0%9F%98%80
🔥 → %F0%9F%94%A5

In JavaScript:

encodeURIComponent("😀") // → "%F0%9F%98%80"
decodeURIComponent("%F0%9F%98%80") // → "😀"
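
When building complete URLs, the standard `URL` and `URLSearchParams` classes perform this encoding automatically. In the sketch below, `https://example.com` and its paths are placeholders chosen for illustration only.

```javascript
// example.com is a placeholder host, not a real emoji service.
const url = new URL("https://example.com/search");
url.searchParams.set("q", "\u{1F525} fire"); // query values are encoded for you
console.log(url.search); // ?q=%F0%9F%94%A5+fire

// Encode a path segment by hand before concatenating it.
const path = "/emoji/" + encodeURIComponent("\u{1F600}");
console.log(path); // /emoji/%F0%9F%98%80

// The URL parser also percent-encodes a raw emoji in the path.
const parsed = new URL("https://example.com/emoji/\u{1F600}");
console.log(parsed.pathname); // /emoji/%F0%9F%98%80
```

Note that query strings built this way encode a space as +, while encodeURIComponent produces %20.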

The Emoji Family API accepts emoji characters directly in the URL path, for example:

GET /api/emojis/😀
GET /api/emojis/1f600

Both the emoji character and its hexcode are accepted.
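
A hexcode such as 1f600 can be derived from the character itself. The sketch below uses a hypothetical helper, `toHexcode`; joining multi-code-point sequences with "-" is an assumed convention here, so check the API documentation for the exact format it expects.

```javascript
// Convert an emoji (possibly a multi-code-point sequence) to a lowercase hexcode.
// Joining code points with "-" is an assumption, not taken from the API docs.
function toHexcode(emoji) {
  return [...emoji].map((ch) => ch.codePointAt(0).toString(16)).join("-");
}

console.log(toHexcode("\u{1F600}")); // 1f600
console.log(toHexcode("\u{1F468}\u200D\u{1F373}")); // 1f468-200d-1f373
```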

Emoji in Databases

MySQL / MariaDB

MySQL's utf8 charset (historically an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold emoji above U+FFFF, which need 4 bytes. Use utf8mb4:

CREATE TABLE messages (
  body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);

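Or change the database default to utf8mb4. This applies to tables created afterwards; existing tables keep their current charset until they are converted: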

ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
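
Where a legacy 3-byte utf8 column cannot be migrated yet, the application can at least detect input that the column cannot store. A minimal sketch, with `needsUtf8mb4` as a made-up helper name:

```javascript
// True if the text contains any code point above U+FFFF. Such code points
// take 4 bytes in UTF-8 and do not fit in a 3-byte utf8 (utf8mb3) column.
function needsUtf8mb4(text) {
  return /[\u{10000}-\u{10FFFF}]/u.test(text);
}

console.log(needsUtf8mb4("\u{1F600}")); // true
console.log(needsUtf8mb4("\u2764\uFE0F")); // false: both code points are in the BMP
```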

PostgreSQL

PostgreSQL handles emoji without any special configuration as long as the database encoding is UTF8, which is the default on most installations.

SQLite

SQLite stores text as UTF-8 natively, so emoji work out of the box.

Emoji Data Resources

  • Unicode.org: The authoritative source for emoji data (unicode.org/emoji/charts/full-emoji-list.html)
  • Emoji Family API: Free API for emoji metadata, SVG and PNG images (see the developer docs)
  • CLDR (Common Locale Data Repository): Provides emoji names and keywords in multiple languages

You can also look up any individual emoji's hexcode, Unicode version, group, subgroup, and tags on Emoji Family's emoji pages.