Strings
A string is a sequence of characters.
This lesson expands on the basics of strings and incorporates what you learned about arrays and functions.
Consider the simple program below, which outputs the length of a few strings. Can you predict what the output will be?
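A minimal sketch of such a program might look like the following. The variable names are illustrative, and the two emoji are assumed to be U+1F4A9 and a man-boy family sequence, written as Unicode escapes so the example does not depend on the file's encoding.

```typescript
// Three strings that each look like a single character on screen.
const letter = "A";
const pileOfPoo = "\u{1F4A9}"; // one code point outside the Basic Multilingual Plane
const family = "\u{1F468}\u200D\u{1F466}"; // man + zero-width joiner + boy

// length counts UTF-16 code units, not visible characters.
console.log(letter.length);
console.log(pileOfPoo.length);
console.log(family.length);
```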
The letter A
is considered 1 character, but the emoji 💩
is 2 characters, and the emoji 👨‍👦
is actually much longer! If you are surprised by this, you may be asking the following questions:
Retrieving characters
You can retrieve individual characters from a string by index (character position) in the same way you would retrieve elements from an array.
When you retrieve the character at index 0, you are actually retrieving a single UTF-16 code unit at the specified offset.
Accessing an out-of-bounds index like emoji[2]
or even emoji[-1]
produces undefined
, and does not throw an error.
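The behavior described above can be sketched as follows, assuming emoji holds a single emoji that is stored as a surrogate pair (U+1F4A9 is used here as an example).

```typescript
const emoji = "\u{1F4A9}"; // stored as two UTF-16 code units

const first = emoji[0]; // high surrogate, not a complete character
const second = emoji[1]; // low surrogate
const outOfBounds = emoji[2]; // undefined at run time, no error thrown
const negative = emoji[-1]; // also undefined

console.log(first === "\uD83D", second === "\uDCA9"); // true true
console.log(outOfBounds, negative); // undefined undefined
```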
The at
function returns a new string containing the UTF-16 code unit at the given index.
The argument can be a positive or negative integer - if negative, the index counts backward from the last character.
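A short sketch of positive and negative indexing with at, using an arbitrary example string:

```typescript
const word = "TypeScript";

const firstChar = word.at(0); // "T"
const lastChar = word.at(-1); // "t", counting backward from the end
const beyond = word.at(100); // undefined, the index is out of range

console.log(firstChar, lastChar, beyond);
```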
The at
function requires your TypeScript compiler target to be set to ES2022
or higher. If you are targeting older JavaScript versions or simply don't need negative indexing, the charAt
function is an equivalent option for retrieving individual characters.
The at
and charAt
functions always return a new string with the single UTF-16 code unit at the given index, so they may return lone surrogates.
To get the full Unicode code point at a given index, use codePointAt
and fromCodePoint
.
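The difference between a code unit and a full code point can be sketched like this, again using U+1F4A9 as an example emoji:

```typescript
const poo = "\u{1F4A9}";

const unit = poo.charAt(0); // lone high surrogate
const codePoint = poo.codePointAt(0); // the full code point as a number
// codePointAt returns undefined for an out-of-range index, so guard before converting back.
const rebuilt = codePoint === undefined ? "" : String.fromCodePoint(codePoint);

console.log(unit === "\uD83D"); // true
console.log(codePoint); // 128169 (0x1F4A9)
console.log(rebuilt === poo); // true
```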
Retrieving character subsequences
You can retrieve a subsequence of the string's characters with slice
, which does support negative indexes.
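For instance, with an arbitrary example string:

```typescript
const phrase = "Hello, world";

const head = phrase.slice(0, 5); // "Hello" (the end index is exclusive)
const tail = phrase.slice(-5); // "world" (the last five code units)
const middle = phrase.slice(7, -1); // "worl" (from index 7 up to, but not including, the last)

console.log(head, tail, middle);
```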
While at
is the most concise way to retrieve a single character of a string, the same string can also be retrieved with slice
and charAt
.
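As a small illustration, the three calls below all retrieve the last character of an example string:

```typescript
const sample = "string";

const viaAt = sample.at(-1);
const viaSlice = sample.slice(-1);
const viaCharAt = sample.charAt(sample.length - 1);

console.log(viaAt, viaSlice, viaCharAt); // g g g
```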
Padding a string
The padStart
and padEnd
functions repeatedly add padding characters to the start or end of a string, respectively, until it reaches a certain length. These functions are especially useful when displaying information in tables or aligning values in console output.
If the character to append is not specified, it is assumed to be a space character. The resulting string will be the exact specified length, which means a padding string that does not evenly fit the remaining space will be truncated.
Notice that if the original string is already longer than the target length, it is returned unchanged rather than truncated.
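The padding rules above can be sketched with a few example values:

```typescript
const padded = "5".padStart(3, "0"); // "005"
const label = "id".padEnd(5); // "id   " (pads with spaces by default)
const truncatedPad = "abc".padStart(8, "xyz"); // "xyzxyabc" (the padding string is cut to fit)
const untouched = "overflow".padStart(3, "0"); // "overflow" (already longer than 3)

console.log([padded, label, truncatedPad, untouched]);
```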
Padding is often useful when converting numbers to a fixed width. Consider the function below that converts an RGB value to a hex code, which
would otherwise fail to output a correct hex code if any of red
, green
, or blue
is less than 16.
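One possible version of such a function is sketched here. The name rgbToHex and the helper toHex are hypothetical, and the sketch assumes each channel is an integer from 0 to 255.

```typescript
// Converts three channel values to a "#rrggbb" hex code.
function rgbToHex(red: number, green: number, blue: number): string {
  // Without padStart, a value such as 8 would become "8" instead of "08".
  const toHex = (channel: number): string => channel.toString(16).padStart(2, "0");
  return `#${toHex(red)}${toHex(green)}${toHex(blue)}`;
}

console.log(rgbToHex(255, 8, 0)); // "#ff0800"
```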
Repeating a string
A string's repeat
function produces a string which contains the specified number of copies of the original string.
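For example:

```typescript
const divider = "-=".repeat(3); // "-=-=-="
const nothing = "abc".repeat(0); // "" (zero copies gives an empty string)

console.log(divider, nothing.length);
```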
Splitting a string
You can split a string into an array with the split
function.
The first argument to split
is the separator to split by, which can be a string or a regular expression.
The second argument to split
allows you to limit the number of split elements that are returned. This is an easy way to
efficiently return the first few words or lines from a long string.
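A brief sketch of both arguments, using an example sentence:

```typescript
const sentence = "the quick brown fox jumps";

const words = sentence.split(" "); // ["the", "quick", "brown", "fox", "jumps"]
const firstTwo = sentence.split(" ", 2); // ["the", "quick"]
const letters = "abc".split(""); // ["a", "b", "c"] (one entry per UTF-16 code unit)

console.log(words.length, firstTwo, letters);
```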
Regular expressions
A regular expression is a pattern for matching a string.
Regular expressions are an extremely useful mechanism for extracting information from strings and manipulating their contents. There is an entire lesson on regular expressions - this is just a condensed summary for the purpose of understanding strings.
Creating a regular expression
You can specify a regular expression by writing a pattern between two slash (/
) characters, or by creating a RegExp
object.
These lessons tend to use the /
notation to define regular expressions, but it is worth noticing that all regular expressions are simply RegExp
objects regardless of how they are created. You can use the test
function of a RegExp
object to check if a string matches the pattern.
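Both notations can be sketched side by side. The pattern cat is only an example.

```typescript
const literalPattern = /cat/;
const objectPattern = new RegExp("cat");

const literalIsRegExp = literalPattern instanceof RegExp; // true
const foundInside = literalPattern.test("concatenate"); // true, "cat" appears inside the word
const foundInDog = objectPattern.test("dog"); // false

console.log(literalIsRegExp, foundInside, foundInDog);
```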
Matching a string
The string match
function searches a string with a regular expression, and returns an array of matches. If there were no
matches, it returns null
.
After the ending slash character (/
) of a regular expression, one or more flags can be added to modify the pattern's behavior. The g
flag
returns all matches in the string, not just the first one.
The i
flag performs a case-insensitive match, and can be combined with the g
flag.
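The effect of the flags might be illustrated like this, using an example sentence:

```typescript
const text = "The rain in Spain stays mainly in the plain";

const firstMatch = text.match(/ain/); // only the first match, with extra details such as index
const allMatches = text.match(/ain/g); // ["ain", "ain", "ain", "ain"]
const anyCase = text.match(/the/gi); // ["The", "the"]
const none = text.match(/xyz/); // null

console.log(firstMatch?.index, allMatches, anyCase, none);
```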
Special regex characters
There are special regex characters like \s
, \d
, and \w
which represent character groups.
- \w matches any word character (letters a-z and A-Z, digits 0-9, and the underscore)
- \d matches any digit (0-9)
- \s matches any whitespace character (such as a space, \n, or \t)
The \b
character represents a word boundary.
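A small sketch of these special characters in use, with made-up sample strings:

```typescript
const sampleText = "a1 b2\tc3";

const digitsFound = sampleText.match(/\d/g); // ["1", "2", "3"]
const wordCharsFound = sampleText.match(/\w/g); // ["a", "1", "b", "2", "c", "3"]
const whitespaceFound = sampleText.match(/\s/g); // [" ", "\t"]

// \b only matches where a word starts or ends.
const standaloneCat = /\bcat\b/.test("a cat sat"); // true
const embeddedCat = /\bcat\b/.test("concatenate"); // false

console.log(digitsFound, wordCharsFound, whitespaceFound, standaloneCat, embeddedCat);
```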
Searching a string
The search()
method searches a string for a match to a regular expression.
It returns the index of the first match, or -1 if no match is found.
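For example:

```typescript
const message = "Room 101 is on floor 3";

const firstDigitIndex = message.search(/\d/); // 5
const missing = message.search(/z/); // -1

console.log(firstDigitIndex, missing);
```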
Replacing in a string
The string replace
function can replace parts of a string that match a regular expression with a new string.
The special regex character +
matches the preceding pattern one or more times.
The replacement can also be a function that receives the content of each match as an argument and returns the replacement text.
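The following sketch shows a string replacement, a callback replacement, and what happens without the g flag. The inputs are arbitrary examples.

```typescript
const messy = "too    many   spaces";

// \s+ matches each run of whitespace, and the g flag replaces every run.
const tidy = messy.replace(/\s+/g, " "); // "too many spaces"

// The callback receives each match and returns its replacement.
const shout = "hello world".replace(/\w+/g, (found) => found.toUpperCase()); // "HELLO WORLD"

// With a plain string pattern, only the first occurrence is replaced.
const onlyFirst = "a-b-c".replace("-", "+"); // "a+b-c"

console.log(tidy, shout, onlyFirst);
```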
Comparing strings
The localeCompare
function returns a number that indicates how a string should be ordered in relation to another string.
This is commonly used in an array sort
function to arrange a list of strings into alphabetical order.
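A minimal sketch of sorting with localeCompare. The "en" locale argument is an assumption made here so the result does not depend on the machine's default locale.

```typescript
const fruits = ["cherry", "apple", "Banana"];

// Copy the array first, because sort reorders the array in place.
const sorted = [...fruits].sort((a, b) => a.localeCompare(b, "en"));

// A negative result means the first string sorts before the second.
const order = "a".localeCompare("b", "en");

console.log(sorted); // ["apple", "Banana", "cherry"]
console.log(order < 0); // true
```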
Searching a string
The includes
function performs a case-sensitive search to determine whether a certain string appears within the string's character sequence.
If you need to perform a case-insensitive search, convert the sentence and search query to the same case.
You can also provide an index to start searching from.
The indexOf
function searches a string and returns the index of the first occurrence of the given string.
The lastIndexOf
function returns the index of the last occurrence of the given string.
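These search functions can be compared on one example sentence:

```typescript
const quote = "To be, or not to be";
const query = "NOT";

const hasNot = quote.includes("not"); // true
const hasUpperNot = quote.includes(query); // false, the search is case-sensitive
const hasNotIgnoringCase = quote.toLowerCase().includes(query.toLowerCase()); // true
const hasToAfterStart = quote.includes("To", 1); // false, the search starts at index 1

const firstBe = quote.indexOf("be"); // 3
const lastBe = quote.lastIndexOf("be"); // 17
const absent = quote.indexOf("xyz"); // -1 when there is no match

console.log(hasNot, hasUpperNot, hasNotIgnoringCase, hasToAfterStart, firstBe, lastBe, absent);
```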
Concatenating strings
The concat
method creates a new string by concatenating the given strings to the target string. This is equivalent to using the +
operator to join strings in order.
Notice that the original string is not modified. In JavaScript, if the arguments to concat
are not strings, they are converted to strings before joining. TypeScript enforces a type of string
on the arguments to concat
.
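For example:

```typescript
const greeting = "Hello";

const combined = greeting.concat(", ", "world", "!");
const viaPlus = greeting + ", " + "world" + "!";

console.log(combined); // "Hello, world!"
console.log(combined === viaPlus); // true
console.log(greeting); // "Hello" (the original string is unchanged)
```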
Literal types
The generic string
type refers to any sequence of characters. TypeScript also allows us to define types that refer to specific strings.
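A brief sketch of a string literal type. The Direction type and the opposite function are hypothetical names chosen for illustration.

```typescript
// Only these four exact strings are valid values of Direction.
type Direction = "north" | "south" | "east" | "west";

function opposite(direction: Direction): Direction {
  switch (direction) {
    case "north":
      return "south";
    case "south":
      return "north";
    case "east":
      return "west";
    case "west":
      return "east";
  }
}

const facing: Direction = "north";
// const invalid: Direction = "up"; // compile error: "up" is not assignable to Direction

console.log(opposite(facing)); // "south"
```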
Encoding strings
JavaScript strings are encoded into a format known as UTF-16.
This format is great for string manipulation, but not ideal when interfacing with binary protocols, file systems, or Web APIs expecting the UTF-8 format. The built-in TextEncoder
and TextDecoder
provide a standard way to convert between JavaScript strings and raw binary data that uses standardized encodings like UTF-8.
TextEncoder
TextEncoder
takes a JavaScript string and encodes it into a Uint8Array of UTF-8 bytes.
Each item in the array is a byte (0 - 255) representing part of the UTF-8 encoding of the string. This is especially useful when working with emojis and other special characters that may require more than 2 bytes to represent.
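A minimal sketch of encoding, assuming a runtime where TextEncoder is available globally (modern browsers and Node.js):

```typescript
const encoder = new TextEncoder();

const asciiBytes = encoder.encode("A"); // one byte: 65
const emojiBytes = encoder.encode("\u{1F4A9}"); // four bytes: 240, 159, 146, 169

console.log(asciiBytes.length, Array.from(asciiBytes));
console.log(emojiBytes.length, Array.from(emojiBytes));
```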
TextDecoder
TextDecoder
reverses the process of TextEncoder
, taking some binary data like a Uint8Array
(or Buffer
in Node.js) and decoding it into a UTF-16 JavaScript string.
It assumes UTF-8 by default, but supports other encodings as well.
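Decoding can be sketched in the same way. The byte values are examples, and the utf-16le label is one of the other encodings that TextDecoder accepts.

```typescript
const helloBytes = new Uint8Array([72, 105, 33]); // UTF-8 bytes for "Hi!"
const decodedText = new TextDecoder().decode(helloBytes);

// Four UTF-8 bytes decode to a single emoji (two UTF-16 code units).
const emojiText = new TextDecoder("utf-8").decode(new Uint8Array([0xf0, 0x9f, 0x92, 0xa9]));

// The same API can decode little-endian UTF-16 input.
const utf16Text = new TextDecoder("utf-16le").decode(new Uint8Array([0x41, 0x00]));

console.log(decodedText, emojiText.length, utf16Text);
```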
Error handling
You can configure TextDecoder
to throw an error when an invalid byte sequence is encountered. This can be important for secure applications where arbitrary strings are being accepted from user input.
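The strict mode is enabled with the fatal option. The sketch below compares it with the default lenient behavior, using bytes that are not valid UTF-8.

```typescript
const invalidBytes = new Uint8Array([0xff, 0xfe, 0xfd]); // not a valid UTF-8 sequence

// By default, invalid bytes are replaced with U+FFFD replacement characters.
const lenient = new TextDecoder().decode(invalidBytes);

// With fatal: true, decoding invalid input throws a TypeError instead.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });
let rejected = false;
try {
  strictDecoder.decode(invalidBytes);
} catch (error) {
  rejected = error instanceof TypeError;
}

console.log(lenient.includes("\uFFFD"), rejected);
```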
TextEncoder
only supports UTF-8 by design, because modern web standards are UTF-8 first.
Working with ArrayBuffer
String templates
String template literals make it easy to create strings that include expressions or variables.
A template literal is enclosed in backticks (`
) instead of single or double quotes. This allows you to embed expressions inside the string using ${...}
.
You can embed arbitrary expressions in a template literal.
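For example, with illustrative variable names:

```typescript
const buyer = "Ada";
const quantity = 3;
const unitPrice = 9.5;

// Any expression can appear inside ${...}, including arithmetic and method calls.
const summary = `${buyer} bought ${quantity} items for $${(quantity * unitPrice).toFixed(2)}`;

console.log(summary); // "Ada bought 3 items for $28.50"
```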
Tagged template literals
A tagged template allows you to process a template literal with a function.
The function receives the template's literal text fragments, along with the values of any embedded expressions, as arguments.
The first argument is an array of strings with each fragment of literal text from the template. The type
of this argument is TemplateStringsArray
, and not string[]
.
The subsequent arguments are the values to insert between the literal text fragments.
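A minimal sketch of a tag function. The name highlight and its bracket formatting are hypothetical choices for illustration.

```typescript
// Wraps every interpolated value in square brackets.
function highlight(fragments: TemplateStringsArray, ...values: unknown[]): string {
  return fragments.reduce((result, fragment, i) => {
    // There is always one more text fragment than there are values.
    const value = i < values.length ? `[${String(values[i])}]` : "";
    return result + fragment + value;
  }, "");
}

const who = "Ada";
const unread = 3;

// fragments is ["", " has ", " messages"] and values is ["Ada", 3].
const tagged = highlight`${who} has ${unread} messages`;

console.log(tagged); // "[Ada] has [3] messages"
```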
You can also write a function that returns a tag function, which lets you configure the tag's behavior.
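One way to sketch this is a function that returns a tag function. The name wrapWith and its parameters are hypothetical.

```typescript
// Returns a tag function that wraps each interpolated value in the given markers.
function wrapWith(open: string, close: string) {
  return (fragments: TemplateStringsArray, ...values: unknown[]): string =>
    fragments.reduce(
      (result, fragment, i) =>
        result + fragment + (i < values.length ? open + String(values[i]) + close : ""),
      "",
    );
}

const bold = wrapWith("<b>", "</b>");
const markup = bold`Hello ${"Ada"}, you have ${2} alerts`;

console.log(markup); // "Hello <b>Ada</b>, you have <b>2</b> alerts"
```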