At Small Town Heroes I’m currently working on a newsreader app built with React Native. On Android (even 7.1.1) we noticed a weird issue where some emojis rendered incorrectly when we applied styling to them using index-based ranges: the range seemed to be off by one, splitting the emoji into its two separate UTF-16 code units. What made this issue even weirder is that the behaviour stopped as soon as we connected the app to a debugging session.
Figure: Result of applying a specific style to this 41-symbol sentence.
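The root cause is how JavaScript measures strings: `String#length` and the index-based string methods count UTF-16 code units, not symbols. An emoji outside the Basic Multilingual Plane, such as 🤖 (U+1F916), is stored as two code units (a surrogate pair), so 'Emoji 🤖' is 8 units long even though it shows 7 symbols. A minimal sketch to make that visible; `describeUnits` is a hypothetical helper for illustration only:

```javascript
// Hypothetical helper: lists every UTF-16 code unit of a string as a 4-digit hex value.
function describeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns a single UTF-16 code unit (0x0000 to 0xFFFF)
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0'));
  }
  return units;
}

const sentence = 'Emoji \u{1F916}'; // same as 'Emoji 🤖'

console.log(sentence.length);         // 8: length counts code units
console.log(describeUnits(sentence)); // the robot shows up as 'D83E', 'DD16'
console.log(sentence.codePointAt(6).toString(16)); // '1f916': the full code point
```

Any range that treats those two surrogate halves as separate positions will cut the emoji in half, which is exactly the broken rendering from the figure.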
To get the correct symbol count, you can use Array.from() or the spread operator (*):
```
>> Array.from('Emoji 🤖');
["E", "m", "o", "j", "i", " ", "🤖"]
>> Array.from('Emoji 🤖').length
7
>> [...'Emoji 🤖'].length
7
```
(*) Do note that this technique is not 100% bulletproof: it has “problems” with skin tone modifiers and other emoji combinations (which yields some fun results in itself), but let’s ignore that for now.
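To see what those problems look like, take a thumbs-up with a skin tone modifier. It is displayed as one symbol, but it consists of two code points (U+1F44D THUMBS UP SIGN followed by U+1F3FD, the type-4 skin tone modifier), so the spread operator still splits it in two. A small illustration:

```javascript
// 👍🏽 = THUMBS UP SIGN (U+1F44D) + EMOJI MODIFIER FITZPATRICK TYPE-4 (U+1F3FD)
const thumbsUp = '\u{1F44D}\u{1F3FD}';

console.log(thumbsUp.length);      // 4: two surrogate pairs, so four UTF-16 code units
console.log([...thumbsUp].length); // 2: two code points, yet the reader sees one symbol
console.log([...thumbsUp].map(c => c.codePointAt(0).toString(16))); // [ '1f44d', '1f3fd' ]
```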
Knowing how to get the correct count, you can extract proper substrings from that sentence to apply your styling to (*):
```
// Wrong way to do it (not multibyte-aware)
>> 'Emoji 🤖'.substr(0,7);
"Emoji �"

// Correct way to do it (multibyte-aware)
>> [...'Emoji 🤖'].slice(0,7).join('');
"Emoji 🤖"
```
(*) Why not just use String#split throughout our code (thus bypassing the whole thing), you might wonder? Well, the editor used to input the articles *is* multibyte-aware, so it returns 7 as the length of that sentence 😉
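Because the editor’s ranges count code points while JavaScript’s own string methods count UTF-16 code units, another way to reconcile the two is to translate a range once, up front, into code-unit offsets. A minimal sketch, assuming the editor hands over a `[start, end)` range measured in code points and that the runtime’s string iterator is spec-compliant; `toUtf16Range` is a hypothetical helper, not part of any library:

```javascript
// Hypothetical helper: converts a [start, end) range measured in code points
// into the equivalent [start, end) range measured in UTF-16 code units.
// Assumes a spec-compliant string iterator (for...of yields whole code points).
function toUtf16Range(str, start, end) {
  let codePointIndex = 0;
  let unitIndex = 0;
  let unitStart = str.length; // default when start lies at or past the end
  let unitEnd = str.length;   // default when end lies at or past the end
  for (const ch of str) {
    if (codePointIndex === start) unitStart = unitIndex;
    if (codePointIndex === end) unitEnd = unitIndex;
    unitIndex += ch.length; // 1 for BMP characters, 2 for surrogate pairs
    codePointIndex++;
  }
  return [unitStart, unitEnd];
}

// Usage: the editor says "style the first 7 symbols"
const text = 'Emoji \u{1F916} rocks';
const [from, to] = toUtf16Range(text, 0, 7);
console.log(from, to);                 // 0 8
console.log(text.substring(from, to)); // 'Emoji 🤖'
```

Converting the range instead of re-splitting the string keeps the offsets usable with anything that expects ordinary JavaScript string indices.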
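Now, even though we were using Array.from() to get the correct substrings, we ran into issues on Android: it would always yield "Emoji �", no matter which technique we used. Long story short: we found out that the JavaScript runtime on the Android phone was (somehow) using a non-multibyte-aware Array.from(), which explains the wrong result. It most likely also explains why the problem vanished while debugging: with remote debugging enabled, React Native executes the JavaScript in Chrome on the development machine instead of in the runtime on the phone.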
```
// Android 7.1.1
>> Array.from('Emoji 🤖');
["E", "m", "o", "j", "i", " ", "�", "�"] // <-- Wait, wut?
```
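If you only need the basic case (surrogate pairs), one defensive option is to split on code points by hand with `charCodeAt()`, so the result no longer depends on how the runtime implements Array.from() or the string iterator. A sketch under that assumption; `hasCodePointAwareArrayFrom` and `splitCodePoints` are hypothetical helpers, and, like the spread operator, they still break apart skin tone modifiers and other emoji combinations:

```javascript
// Hypothetical check: does this runtime's Array.from() iterate strings by code point?
function hasCodePointAwareArrayFrom() {
  return Array.from('\u{1F916}').length === 1;
}

// Hypothetical helper: splits a string into code points using only charCodeAt(),
// so it does not rely on Array.from() or the string iterator.
function splitCodePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const next = str.charCodeAt(i + 1); // NaN past the end of the string
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHigh && next >= 0xDC00 && next <= 0xDFFF) {
      // High + low surrogate: keep both code units together as one code point
      result.push(str[i] + str[i + 1]);
      i++; // skip the low surrogate we just consumed
    } else {
      // BMP character or lone surrogate: keep it as-is
      result.push(str[i]);
    }
  }
  return result;
}

const parts = hasCodePointAwareArrayFrom()
  ? Array.from('Emoji \u{1F916}')
  : splitCodePoints('Emoji \u{1F916}');
console.log(parts.length); // 7 either way
```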
Our solution for bypassing this mysterious problem was to use runes, a Unicode-aware library. On top of that, it also plays nicely with skin tone modifiers and other emoji combinations, making it superior to the Array.from() and spread operator techniques above:
```
const runes = require('runes');

// '\uFE0F' is the invisible variation selector, '\u200D' the zero-width joiner,
// and '�' marks a lone surrogate half.

// Standard String#split
'♥️'.split('') => ['♥', '\uFE0F']
'Emoji 🤖'.split('') => ['E', 'm', 'o', 'j', 'i', ' ', '�', '�']
'👩‍👩‍👧‍👦'.split('') => ['�', '�', '\u200D', '�', '�', '\u200D', '�', '�', '\u200D', '�', '�']

// ES6 string iterator
[...'♥️'] => ['♥', '\uFE0F']
[...'Emoji 🤖'] => ['E', 'm', 'o', 'j', 'i', ' ', '🤖']
[...'👩‍👩‍👧‍👦'] => ['👩', '\u200D', '👩', '\u200D', '👧', '\u200D', '👦']

// Runes
runes('♥️') => ['♥️']
runes('Emoji 🤖') => ['E', 'm', 'o', 'j', 'i', ' ', '🤖']
runes('👩‍👩‍👧‍👦') => ['👩‍👩‍👧‍👦']
```
```
const runes = require('runes');

// String#substring
'👨‍👨‍👧‍👧a'.substring(1) => '�‍👨‍👧‍👧a'

// Runes
runes.substring('👨‍👨‍👧‍👧a', 1) => 'a'
```
runes: Unicode-aware JS string splitting with full Emoji support