tl;dr: TextDecoder is really fast for bigger strings. Very small strings (roughly under 18 characters) can actually benefit from a simple custom decoder built on an array.
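As a rough sketch of what "a simple custom decoder using an array" could look like: for short strings whose bytes are all in the ASCII range, you can map each byte straight to a character with String.fromCharCode and skip TextDecoder's per-call overhead. The function name here is my own; anything with bytes above 127 would need a real UTF-8 decoder.

```javascript
// Hypothetical minimal decoder for short, ASCII-only byte arrays.
// Not a general UTF-8 decoder: it only handles bytes 0-127 correctly.
function decodeAsciiSmall(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    // Each byte maps directly to one character in the ASCII range.
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

console.log(decodeAsciiSmall(new Uint8Array([104, 105]))); // "hi"
```

For strings this short, the loop plus string concatenation tends to beat the fixed cost of a TextDecoder.decode() call, which is the effect the benchmarks below explore.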
Nevertheless, there are some lower-level APIs that are exposed. We have TextEncoder and TextDecoder, as well as access to Uint8Arrays. These are essentially views over linear memory that you can index much like ordinary arrays.
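A quick sketch of how these APIs fit together: TextEncoder writes a string into a Uint8Array of UTF-8 bytes, and TextDecoder turns such a byte array back into a string.

```javascript
// Encode a string to UTF-8 bytes, then decode it back.
const encoder = new TextEncoder();          // always UTF-8
const decoder = new TextDecoder("utf-8");

const bytes = encoder.encode("hello");
console.log(bytes instanceof Uint8Array);   // true
console.log(Array.from(bytes));             // [104, 101, 108, 108, 111]
console.log(decoder.decode(bytes));         // "hello"
```

Both APIs are available in modern browsers and in Node without any imports.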
I’ve been thinking a lot recently about the most efficient way to store a book. In the end I decided to store the book in binary rather than as a JSON string. So this article is going to be about some performance benchmarks comparing different ways to retrieve strings from binary data.
A Micro-Course on UTF-8
UTF-8 is represented as a byte array (as all strings are under the hood, regardless of encoding). Every byte stores a number from 0 to 255. Obviously there are more than 256 characters across all the world’s languages. UTF-8 supports all 1,112,064 valid code points of Unicode.
How is this possible? UTF-8 is variable-width: it stretches to take up extra bytes for less common characters. The most common English characters live at 0–127, the ASCII range. These include alphanumeric characters, some symbols, and some control characters — like line breaks, for example. When the first byte is 128 or above, the decoder knows it isn’t looking at a plain one-byte character: the leading byte’s bit pattern tells it how many of the following bytes belong to the same character, making it a two-, three-, or four-byte character. Each extra byte contributes more bits, so the number of representable characters grows rapidly. Continuing with this scheme, UTF-8 uses up to 4 bytes per character.
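You can see the variable width directly by encoding a few characters and counting the bytes each one takes:

```javascript
// The same character count, very different byte counts.
const enc = new TextEncoder();

console.log(enc.encode("A").length);  // 1 byte  (U+0041, ASCII range)
console.log(enc.encode("é").length);  // 2 bytes (U+00E9)
console.log(enc.encode("€").length);  // 3 bytes (U+20AC)
console.log(enc.encode("😀").length); // 4 bytes (U+1F600)
```

This is also why you can’t index a UTF-8 byte array by character position: a byte offset only lines up with a character boundary for pure ASCII data.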