Richard has done a nice writeup on hyphenation in CSS. Turns out we have some nice controls to tweak how hyphenation works on your site:
There is more to setting hyphenation than just turning on the hyphens. The CSS Text Module Level 4 has introduced the same kind of hyphenation controls provided in layout software (eg. InDesign) and some word processors (including Word). These controls provide different ways to define how much hyphenation occurs through your text.
In the image above, for example, you can see hyphenate-limit-lines at work (on the right), which limits the number of consecutive hyphenated lines.
1. Why do some emojis get text-wrapped, and some not?
This depends on the “Line Breaking Property” that is assigned to a character.
The Line Breaking Properties Specification defines a set of possible classes:
(A): Allows a break opportunity after in specified contexts
(XA): Prevents a break opportunity after in specified contexts
(B): Allows a break opportunity before in specified contexts
(XB): Prevents a break opportunity before in specified contexts
(P): Allows a break opportunity for a pair of same characters
(XP): Prevents a break opportunity for a pair of same characters
These classes are then combined into properties. The properties relevant to us (extracted from the spec) are:
ID: Ideographic (B/A)
Characters with this property do not require other characters to provide break opportunities; lines can ordinarily break before and after and between pairs of ideographic characters.
AL: Ordinary Alphabetic and Symbol Characters (XP)
Characters with this property require other characters to provide break opportunities; otherwise, no line breaks are allowed between pairs of them.
It’s these properties that can then be assigned to a specific character.
If a character has the AL Line Breaking Property, for example, no line breaks are allowed between pairs of such characters.
Characters (and thus emojis) have a “Line Breaking Property”, which applies one or more breaking classes to the character. Two of the possible values for this property are:
ID = (B) and (A) classes = allow breaks before and after the character
AL = (XP) class = don’t allow breaks in between pairs.
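To make the pair rule concrete, here is a minimal Python sketch that models only the two properties summarized above. The function name `break_allowed_between` is made up for illustration, and the real line breaking algorithm covers many more properties and rules than this.

```python
# Simplified model of the two Line Breaking Properties discussed above.
# Assumption: only ID and AL are modelled; the real algorithm has many more.
KNOWN_PROPERTIES = {"ID", "AL"}


def break_allowed_between(left: str, right: str) -> bool:
    """Return True if a line may break between two adjacent characters."""
    for prop in (left, right):
        if prop not in KNOWN_PROPERTIES:
            raise ValueError(f"property not modelled in this sketch: {prop}")
    # AL is (XP): no break between a pair of AL characters.
    # ID is (B/A): a break is allowed before and after it, so any pair
    # that involves an ID character can break.
    return not (left == "AL" and right == "AL")


if __name__ == "__main__":
    print(break_allowed_between("ID", "ID"))
    print(break_allowed_between("AL", "AL"))
```

With this model, two ID characters in a row can wrap, while two AL characters in a row cannot.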
2. If the above is a feature: is there a list of emojis – or can we detect which emojis – that get text-wrapped (and those that do not) available somewhere?
The answer to question 1 clearly indicates that it’s an actual feature. Khaled also sent me a link to a plain text file mentioning the Line Breaking Properties per character. Linking back to the tests I ran, we can extract these values for 💩 and ⚠️:
💩 = 1F4A5..1F4A9;ID
⚠️ = 26A0..26BC;AL
The plot thickens, right?
💩 and ⚠️ have a different Line Breaking Property set:
💩 = ID = breaks allowed before and after the character
⚠️ = AL = no breaks allowed in between pairs.
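If you want to check other emojis yourself, the lookup can be sketched in a few lines of Python. This is a rough sketch that assumes entries shaped like the two quoted above (`first..last;PROPERTY`, optionally followed by a `#` comment). The names `parse_entries` and `line_break_property` are made up for illustration, and the sample data holds only those two entries, which may differ in other versions of the file.

```python
# The two entries quoted above, in the "first..last;PROPERTY" shape.
# Assumption: the full file uses this same shape, plus "#" comments.
SAMPLE_ENTRIES = """\
1F4A5..1F4A9;ID
26A0..26BC;AL
"""


def parse_entries(text):
    """Turn entry lines into a list of (start, end, property) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_points, prop = (part.strip() for part in line.split(";", 1))
        first, _, last = code_points.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point entries
        ranges.append((start, end, prop))
    return ranges


def line_break_property(char, ranges):
    """Return the property for a single code point, or None if not listed."""
    code_point = ord(char)
    for start, end, prop in ranges:
        if start <= code_point <= end:
            return prop
    return None


if __name__ == "__main__":
    ranges = parse_entries(SAMPLE_ENTRIES)
    print(line_break_property("\U0001F4A9", ranges))  # U+1F4A9, pile of poo
    print(line_break_property("\u26a0", ranges))      # U+26A0, warning sign
```

Note that the lookup works on a single code point, so for ⚠️ you pass the base character U+26A0 without the emoji variation selector that follows it.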
2b. (Bonus question) Can I somehow force emojis with the AL-property to split in between pairs?
Yes you can! From the spec:
Use ZWSP as a manual override to provide break opportunities around alphabetic or symbol characters.
When looking that one up in the list, we can see that it has ZW set as the value for its Line Breaking Property. The spec mentions that ZW applies the (A) class, thus allowing breaks after it 😊
Manually sprinkle a ZWSP character in between pairs of AL-emojis to provide break opportunities.
Using CSS you can use word-break: break-all; to allow breaks to be inserted between any two characters.
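The manual ZWSP approach can be automated. Below is a minimal Python sketch, assuming the AL-emojis you care about fall in the 26A0..26BC range quoted earlier. The helper names are made up for illustration. One detail to watch: ⚠️ is the base character U+26A0 followed by the emoji variation selector U+FE0F, so the ZWSP has to go after the selector and not in between the two.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, Line Breaking Property ZW
VS16 = "\ufe0f"  # emoji variation selector that may follow the base character


def is_al_symbol(char):
    """Assumption: only the 26A0..26BC range quoted earlier is treated as AL."""
    return 0x26A0 <= ord(char) <= 0x26BC


def add_break_opportunities(text):
    """Insert a ZWSP between adjacent AL symbols so the line can wrap there."""
    out = []
    previous_was_symbol = False
    for char in text:
        if char == VS16 and previous_was_symbol:
            out.append(char)  # keep the selector attached to its base
            continue
        if is_al_symbol(char) and previous_was_symbol:
            out.append(ZWSP)
        out.append(char)
        previous_was_symbol = is_al_symbol(char)
    return "".join(out)


if __name__ == "__main__":
    warnings = "\u26a0\ufe0f" * 3
    # Show the code points so the invisible ZWSP is visible in the output.
    print([hex(ord(c)) for c in add_break_opportunities(warnings)])
```

Regular text is left alone, so you can run this over a whole string without scattering ZWSP characters between ordinary letters.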
3. Why do emojis get text-wrapped, and not hyphenated?
Browsers use a language-based hyphenation dictionary to apply hyphenation. The dictionary that is used is determined by the language set on the document or element. From the CSS Text Module Level 3 Spec:
Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The UA is therefore only required to automatically hyphenate text for which the author has declared a language (e.g. via HTML lang) and for which it has an appropriate hyphenation resource.
(*) As the tests show, Firefox follows this guideline quite strictly: hyphenation only works when the lang attribute is explicitly set to one of its supported languages (see tests). Chrome and Safari are apparently less strict and fall back to a default language. There’s a bug filed for Chrome on this.
There is no hyphenation dictionary for emojis.
Emojis don’t hyphenate because there is no hyphenation dictionary for them. For regular text, be sure to set your lang attribute if you want hyphenation to work properly. Also: IE/Edge don’t like my tests, apparently.
Now, that was an interesting journey I must say. I now understand why certain things (should) happen. 🙂
What remains, as per usual, are a few browser-specific quirks still to be answered …
Why, in Chrome, is hyphenation not applied on the last string? → It’s a bug, and it has been fixed in Chrome 56 and up.
Why, in Firefox, does the container stretch out on the ⚠️-test, whilst other browsers overflow? → …
Why, in IE11/Edge, does the 💩-test not wrap, whilst other browsers do? → …
Why, in IE11/Edge, does hyphenation not work even though it should? Setting the lang to en-US, or setting the lang on the body or html, yields the same result. → …
Why, in Chrome and Safari, is hyphenation applied even when the required lang is not set? → Bugs for Chrome and Safari have been filed.