The letter A is encoded using one 16-bit code unit, while an emoji from a supplementary plane requires two 16-bit code units. Shocked? This is easier to understand if we look at how ES6 defines the String type:

> The String type is the set of all ordered sequences of zero or more 16-bit unsigned integer values ("elements") up to a maximum length of 2^53 - 1 elements. The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a UTF-16 code unit value. The length of a String is the number of elements within it. Where ECMAScript operations interpret String values, each element is interpreted as a single UTF-16 code unit.

So, what is a code unit? A code unit is a bit sequence used to encode each character within a given encoding form, so a Unicode character can be represented in JavaScript using 1 or 2 code units. When you need 2 code units to represent a code point, they are called a surrogate pair, where the first value of the pair is a high-surrogate code unit and the second value is a low-surrogate code unit.

The important thing here is to know whether the methods we are using work with code points or with code units.

Of course, the best way to write characters is to type them directly with the keyboard, but some of them are difficult to type (like emojis or math symbols). Fortunately, JavaScript has a special escape syntax to represent characters using either their code point value (\u{...}) or their code unit values (\uXXXX). Note that when you write a character reference in HTML you typically use decimal notation (hexadecimal is also accepted), while in JavaScript you use hexadecimal.

Note also that code points in the Basic Multilingual Plane (BMP) all have 4 hexadecimal digits, while code points in the supplementary planes can have 5 or 6 digits. The 16 planes beyond the BMP (from plane 1 to plane 16) are named supplementary or astral planes.
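The difference between code units and code points can be checked directly in a JavaScript console. The following is a minimal sketch, not the article's original example: the emoji U+1F600 is an assumption picked for illustration, and `describe` is a hypothetical helper name.

```javascript
// Sample characters. U+1F600 is an assumption chosen for this sketch.
const letter = "A";
const emoji = "\u{1F600}"; // code point escape

// length counts UTF-16 code units, not characters.
console.log(letter.length); // 1
console.log(emoji.length);  // 2

// The same emoji written as a surrogate pair of code unit escapes.
const viaSurrogates = "\uD83D\uDE00";
console.log(emoji === viaSurrogates); // true

// charCodeAt reads code units; codePointAt reads code points.
const high = emoji.charCodeAt(0);       // 0xD83D, high surrogate
const low = emoji.charCodeAt(1);        // 0xDE00, low surrogate
const codePoint = emoji.codePointAt(0); // 0x1F600

// HTML character references commonly use the decimal value.
const htmlReference = "&#" + codePoint + ";"; // "&#128512;"

// Hypothetical helper: lists each code point of a string together with
// the number of code units it occupies. Spreading a string iterates by
// code point, so a surrogate pair stays together.
function describe(str) {
  return [...str].map((ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    codeUnits: ch.length,
  }));
}

console.log(describe(letter + emoji));
```

Here `length` and `charCodeAt` operate on code units, while `codePointAt` and string iteration operate on code points, which is exactly the distinction to keep in mind when choosing a method.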
The thing you need to remember is that a code point is a number assigned to a single character. Unicode can represent 1,114,112 code points, ranging from U+0000 to U+10FFFF, and only 144,697 of them have an associated character (as of Unicode 14.0).

In addition, the Unicode space is divided into 17 planes:

- Plane 0, Basic Multilingual Plane (BMP), contains code points from U+0000 to U+FFFF. It contains characters from most modern languages (Basic Latin, Cyrillic, Greek, etc.) and a large number of symbols.
- Plane 1, Supplementary Multilingual Plane (SMP), contains code points from U+10000 to U+1FFFF.
- Plane 2, Supplementary Ideographic Plane (SIP), contains code points from U+20000 to U+2FFFF.
- Plane 16 contains code points from U+100000 to U+10FFFF.

The first plane goes from U+0000 to U+FFFF, that is 16^4 (or 2^16 if you think in binary), which gives 65,536 code points. Now multiply 65,536 by the 17 planes and you get 1,114,112.
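The plane arithmetic above can be reproduced in a few lines. This is a small sketch under the assumption that every plane holds exactly 16^4 code points; `planeOf` and `planeRange` are hypothetical helper names introduced here for illustration.

```javascript
const PLANE_SIZE = 16 ** 4;                           // 65,536 code points per plane
const PLANE_COUNT = 17;                               // planes 0 to 16
const TOTAL_CODE_POINTS = PLANE_SIZE * PLANE_COUNT;   // 1,114,112
const MAX_CODE_POINT = TOTAL_CODE_POINTS - 1;         // 0x10FFFF

// Formats a number as U+XXXX, using at least 4 hexadecimal digits.
function formatCodePoint(value) {
  return "U+" + value.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: returns the plane number (0 to 16) of a code point.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > MAX_CODE_POINT) {
    throw new RangeError("Not a Unicode code point: " + codePoint);
  }
  return Math.floor(codePoint / PLANE_SIZE);
}

// Hypothetical helper: returns the first and last code point of a plane.
function planeRange(plane) {
  if (!Number.isInteger(plane) || plane < 0 || plane >= PLANE_COUNT) {
    throw new RangeError("Not a Unicode plane: " + plane);
  }
  const first = plane * PLANE_SIZE;
  return {
    first: formatCodePoint(first),
    last: formatCodePoint(first + PLANE_SIZE - 1),
  };
}

console.log(TOTAL_CODE_POINTS);   // 1114112
console.log(planeOf(0x41));       // 0 (BMP)
console.log(planeOf(0x1F600));    // 1 (SMP)
console.log(planeRange(16));      // { first: 'U+100000', last: 'U+10FFFF' }
```

Dividing a code point by 65,536 and rounding down gives its plane, which also explains why BMP code points fit in 4 hexadecimal digits while supplementary ones need 5 or 6.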