In “Let’s talk about usernames”, James Bennett, the author of django-registration, digs deeper into something that seems simple at first: usernames, and how to keep them safe and unique.
And no, a simple string comparison won’t cut it. You’ll have to consider more than that if you want to do it well:
- Casing: John_Doe vs. JOHN_DOE
- Homographs: а (U+0430 CYRILLIC SMALL LETTER A) vs. a (U+0061 LATIN SMALL LETTER A)
- Reserved words (especially when used in e-mail addresses and (sub)domains): Think of admin and hostmaster
- Reserved words (especially when used in URLs): Think of login, register, and even keybase.txt
- Confusables: paypal vs. paypa1
- Misspellings
- Non-ASCII Characters: ç vs. c
- …
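As a rough illustration of the casing and reserved-words items above, here is a minimal sketch. The names `RESERVED`, `canonicalUsername` and `isAvailable` are hypothetical, the reserved list only contains the examples mentioned in the list, and `toLowerCase()` is used as an approximation of proper Unicode case folding, which JavaScript does not offer as a built-in. It does nothing about homographs, confusables, or misspellings.

```javascript
// Hypothetical reserved list: only the examples from the list above.
// A real list would be considerably longer.
const RESERVED = new Set(['admin', 'hostmaster', 'login', 'register', 'keybase.txt']);

// Reduce a username to one canonical form to compare and store.
function canonicalUsername(name) {
  // NFKC folds compatibility variants (e.g. fullwidth letters) together;
  // toLowerCase() only approximates real case folding (assumption).
  return name.normalize('NFKC').toLowerCase();
}

// takenCanonicalNames: a Set of canonical forms that are already registered.
function isAvailable(name, takenCanonicalNames) {
  const canonical = canonicalUsername(name);
  return !RESERVED.has(canonical) && !takenCanonicalNames.has(canonical);
}

const taken = new Set([canonicalUsername('John_Doe')]);
console.log(isAvailable('JOHN_DOE', taken)); // false: same canonical form
console.log(isAvailable('Admin', taken));    // false: reserved word
console.log(isAvailable('jane', taken));     // true
```

The idea is to compare and enforce uniqueness on the canonical form rather than on the name as typed.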
To tackle the non-ASCII characters in JavaScript, Lea Verou recently tweeted this nice trick, using String.prototype.normalize:
TIL you can convert letters to their non-accented version by just doing:
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
Whoa. No more 1KB lookup tables! Mind = blown! And it’s supported everywhere! https://t.co/fukRLsu3VV
— Lea Verou (@LeaVerou) November 26, 2017
'céçile'.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
// ~> cecile
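Wrapped in a small helper, the trick could look like the sketch below. The name `stripDiacritics` is made up for illustration. It also shows a limitation worth keeping in mind: the trick only removes combining marks, so it does not help against the homographs mentioned earlier.

```javascript
// stripDiacritics is a hypothetical helper name, not part of any library.
function stripDiacritics(str) {
  // NFD splits each accented letter into a base letter plus combining marks;
  // the regex then drops marks in the Combining Diacritical Marks block
  // (U+0300 to U+036F).
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripDiacritics('céçile') === 'cecile'); // true

// A Cyrillic "а" (U+0430) has no decomposition, so it is left untouched
// and still differs from the Latin "a" (U+0061).
console.log(stripDiacritics('\u0430') === 'a'); // false
```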