Encoding strings
JavaScript strings are encoded in a format known as UTF-16. This format is great for string manipulation, but not ideal when interfacing with binary protocols, file systems, or Web APIs that expect UTF-8. The built-in TextEncoder and TextDecoder provide a standard way to convert between JavaScript strings and raw binary data in standardized encodings like UTF-8.
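For example, a quick round trip between a string and its UTF-8 bytes might look like this (a minimal sketch using the default UTF-8 encoding):

```js
// Encode a string to UTF-8 bytes, then decode the bytes back into a string.
const bytes = new TextEncoder().encode("héllo"); // Uint8Array of UTF-8 bytes
const text = new TextDecoder().decode(bytes);    // back to a JavaScript string

console.log(bytes.length, text); // 6 "héllo" ("é" takes 2 bytes in UTF-8)
```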
TextEncoder
TextEncoder takes a JavaScript string and encodes it into a Uint8Array of UTF-8 bytes. Each item in the array is a byte (0-255) representing part of the UTF-8 encoding of the string. This is especially useful when working with emojis and other special characters that may require more than 2 bytes to represent.
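A minimal sketch of encoding a string that contains an emoji:

```js
const encoder = new TextEncoder();
const bytes = encoder.encode("hi 👋");

console.log(bytes);
// Uint8Array(7) [104, 105, 32, 240, 159, 145, 139]
// "h", "i", and the space are 1 byte each; the emoji takes 4 bytes in UTF-8.
```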
TextDecoder
TextDecoder reverses the process of TextEncoder, taking some binary data like a Uint8Array (or a Buffer in Node.js) and decoding it into a UTF-16 JavaScript string. It assumes UTF-8 by default, but supports other encodings as well.
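A sketch of decoding, first with the default UTF-8 decoder and then with another encoding selected by its label (the windows-1251 bytes below are illustrative):

```js
const decoder = new TextDecoder(); // defaults to "utf-8"
const bytes = new Uint8Array([104, 105, 32, 240, 159, 145, 139]);

console.log(decoder.decode(bytes)); // "hi 👋"

// Pass an encoding label to decode non-UTF-8 data.
const win1251 = new TextDecoder("windows-1251");
const cyrillic = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

console.log(win1251.decode(cyrillic)); // "Привет"
```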
Error handling
You can configure TextDecoder to throw an error when an invalid byte sequence is encountered. This can be important for secure applications that accept arbitrary strings from user input.
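A sketch using the fatal option, which makes invalid input throw instead of being silently replaced with the U+FFFD replacement character:

```js
const strict = new TextDecoder("utf-8", { fatal: true });

try {
  strict.decode(new Uint8Array([0xff])); // 0xFF never appears in valid UTF-8
} catch (err) {
  console.error(err); // TypeError describing the invalid data
}

// The default (non-fatal) decoder substitutes "�" instead of throwing.
console.log(new TextDecoder().decode(new Uint8Array([0xff]))); // "�"
```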
TextEncoder only supports UTF-8 by design: modern web standards are UTF-8 first.