UTF-8 confusables are confusing (circular replacement)

dbj
Registered User
Posts: 11
Joined: Mon Oct 09, 2017 10:08 am

Post by dbj » Mon Oct 23, 2017 11:23 pm

While working with phpBB I came across the function utf8_clean_string. Its purpose is to prevent usernames that look similar or even identical but are actually different because they use look-alike characters.

https://github.com/phpbb/phpbb/blob/mas ... sables.php

First, there is this replacement:

Code:

'ǃ'=>'!'

Later, there is this replacement:

Code:

'!'=>'ǃ'

One of them is an exclamation mark; the other is U+01C3 LATIN LETTER RETROFLEX CLICK. Depending on the font, the two may or may not look different.

Other characters don't have this problem. For example, all kinds of question marks are replaced by the character ʔ (LATIN LETTER GLOTTAL STOP).

But what's the point of the replacement rules for the exclamation mark?
With these rules, phpBB treats "testǃ" and "test!" as different usernames.
The usernames "test?" and "testʔ", on the other hand, are treated as the same username, so the second one is rejected.
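
The effect described above can be reproduced in isolation. This is a minimal sketch, assuming the confusables table is applied in a single strtr() pass (an assumption, not checked against the phpBB source here). The two maps are hypothetical excerpts that mirror the rules quoted in this post, not the real table. With an array argument, strtr() never rescans text it has already replaced, so a pair of rules pointing at each other swaps the two characters instead of merging them.

```php
<?php
// Hypothetical excerpt mirroring the two exclamation-mark rules quoted above.
// "\xC7\x83" is the UTF-8 encoding of U+01C3 LATIN LETTER RETROFLEX CLICK.
$circular = array(
    "\xC7\x83" => '!',        // click letter -> exclamation mark
    '!'        => "\xC7\x83", // exclamation mark -> click letter
);

// Hypothetical one-directional rule, like the question-mark case.
// "\xCA\x94" is the UTF-8 encoding of U+0294 LATIN LETTER GLOTTAL STOP.
$oneWay = array(
    '?' => "\xCA\x94",
);

// Assumption: cleaning boils down to one strtr() call with the table.
function clean_with(array $map, $name)
{
    return strtr($name, $map);
}

$a = clean_with($circular, "test\xC7\x83");
$b = clean_with($circular, 'test!');
var_dump($a === $b); // bool(false): the two names were swapped, not unified

$c = clean_with($oneWay, 'test?');
$d = clean_with($oneWay, "test\xCA\x94");
var_dump($c === $d); // bool(true): both names end in the glottal stop
```

Dropping either one of the two circular rules would make both names clean to the same string.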
Last edited by Mick on Tue Oct 24, 2017 10:06 am, edited 1 time in total.
Reason: Moved from discussion.

A_Jelly_Doughnut
Former Team Member
Posts: 34448
Joined: Sat Jan 18, 2003 1:26 am
Location: Where the Rivers Run

Re: UTF-8 confusables are confusing (circular replacement)

Post by A_Jelly_Doughnut » Mon Oct 30, 2017 2:27 am

If your description is accurate (I did not investigate), I would say this is a bug that should be filed to the tracker: https://tracker.phpbb.com

JoshyPHP
Code Contributor
Posts: 735
Joined: Mon Jul 11, 2011 12:28 am

Re: UTF-8 confusables are confusing (circular replacement)

Post by JoshyPHP » Mon Oct 30, 2017 4:15 am

According to the git log, that file is 10+ years old. Now that PHP 5.4 is the minimum required version it may be possible to leave the confusables detection to ext/intl's Spoofchecker. It doesn't work quite the same way though.

Code:

var_dump((new Spoofchecker)->areConfusable('ǃ', '!'));
// bool(true)
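
One way in which it works differently: areConfusable() compares two strings with each other rather than returning a cleaned form that could be stored in a column. A rough sketch of what a registration check might then look like follows. The helper name find_confusable_usernames() is made up for illustration and is not part of phpBB. It needs ext/intl and returns null when the class is not available.

```php
<?php
// Sketch only: find_confusable_usernames() is a hypothetical helper, not phpBB code.
function find_confusable_usernames($candidate, array $existing)
{
    // Spoofchecker is provided by ext/intl; bail out if it is not loaded.
    if (!class_exists('Spoofchecker')) {
        return null;
    }

    $checker = new Spoofchecker();
    $hits = array();

    // Pairwise comparison: every existing name is checked against the candidate.
    foreach ($existing as $name) {
        if ($checker->areConfusable($candidate, $name)) {
            $hits[] = $name;
        }
    }

    return $hits;
}

// "\xC7\x83" is the UTF-8 encoding of U+01C3 LATIN LETTER RETROFLEX CLICK.
$hits = find_confusable_usernames("test\xC7\x83", array('test!', 'alice'));
var_dump($hits);
```

Note that this loops over every existing username for each check, whereas a stored cleaned name allows a single indexed lookup.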

dbj
Registered User
Posts: 11
Joined: Mon Oct 09, 2017 10:08 am

Re: UTF-8 confusables are confusing (circular replacement)

Post by dbj » Mon Oct 30, 2017 12:25 pm

Changing the utf8_clean_string function would be a breaking change, of course, but I think it could be handled by an update script that reads each username, passes it through the new cleaning function, and then updates the username_clean column in the database.
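
A rough sketch of such an update pass is below, working on in-memory rows instead of the real database layer. Everything in it is an assumption for illustration: new_clean_string() is a stand-in with a single one-directional rule, recompute_clean_names() is a made-up helper, and the row shape (user_id, username, username_clean) is assumed. It also shows one thing the script would have to deal with: names that were distinct under the old rules can collide under the new ones.

```php
<?php
// Hypothetical stand-in for a fixed cleaning function: one-directional rule only.
// The real function does more (case folding, whitespace handling, full table).
function new_clean_string($name)
{
    $map = array(
        "\xC7\x83" => '!', // U+01C3 -> exclamation mark, no reverse rule
    );
    return strtr($name, $map);
}

// Recompute clean names and report which rows change and which collide.
function recompute_clean_names(array $users, $cleaner)
{
    $updates = array();    // user_id => new clean name
    $collisions = array(); // array(first user_id, second user_id, clean name)
    $seen = array();       // clean name => user_id that owns it

    foreach ($users as $user) {
        $clean = call_user_func($cleaner, $user['username']);

        if (isset($seen[$clean])) {
            // Two accounts now clean to the same name; leave for manual review.
            $collisions[] = array($seen[$clean], $user['user_id'], $clean);
            continue;
        }
        $seen[$clean] = $user['user_id'];

        if ($clean !== $user['username_clean']) {
            $updates[$user['user_id']] = $clean;
        }
    }

    return array('updates' => $updates, 'collisions' => $collisions);
}

// Example rows as they might look after cleaning with the circular rules.
$users = array(
    array('user_id' => 2, 'username' => 'test!', 'username_clean' => "test\xC7\x83"),
    array('user_id' => 3, 'username' => "test\xC7\x83", 'username_clean' => 'test!'),
    array('user_id' => 4, 'username' => 'alice', 'username_clean' => 'alice'),
);

$result = recompute_clean_names($users, 'new_clean_string');
var_dump($result);
```

In this example user 2 gets a new clean name, user 3 is reported as colliding with user 2, and user 4 is left alone. What to do with colliding accounts is a policy question the script cannot answer by itself.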

Edit: opened an issue in the bugtracker
https://tracker.phpbb.com/browse/PHPBB3-15427
