[3.1] convert 4-bytes Unicodes to relatives 3-bytes

3Di
Former Team Member
Posts: 13921
Joined: Mon Apr 04, 2005 11:09 pm
Location: Milan (IT) Frankfurt (DE)
Name: Marco

Post by 3Di » Wed May 10, 2017 12:51 am

Hi all,
while writing an extension for 3.1.x I ran into an obstacle. To make a long story short,
I have an array:

Code: Select all

$data = array(
	'personaname'	=>	(string) $group_member['personaname'],
	);
and I am updating rows in the table inside a foreach loop.

One of those funny users has some emoji in their username, for example:
JonDoe💨💨
('\xF0\x9F\x92\xA8\xF0\x9F...' ).

I am aware of the MySQL issues (the utf8 charset stores at most 3 bytes per character, and these emoji take 4), but I just want to strip those emoji or convert them into some 3-byte UTF-8 representation or a replacement character, which seems to be the better solution to avoid Unicode XSS issues.

phpBB 3.1 doesn't support emoji, so there is no need to go overboard.

The error thrown is as follows:

Code: Select all

// Incorrect string value: '\xF0\x9F\x92\xA8\xF0\x9F...' for column 'personaname' at row 211 [1366]
That column is a varchar(255) but, as I said, I am not interested in the SQL side; I would like to deal with this in PHP. A regex of some sort would do.

TIA :)
Please PM me only to request paid works. Thx.
Want to compensate me for my interest? Donate
My development's activity º PhpStorm's proud user
Extensions, Scripts, MOD porting, Update/Upgrades
👨‍🏫 | Take a tour to | The Studio | 👨‍🏫

Re: [3.1] convert 4-bytes Unicodes to relatives 3-bytes

Post by 3Di » Wed May 10, 2017 2:43 am

It seems I found my way. Browsing Stack Overflow I found something for C, and after some experiments I ended up with this:

Code: Select all

'personaname'		=> ( (string) preg_replace('/[\x{10000}-\x{10FFFF}]/u', '�', $group_member['personaname']) ),
It works, although the replacement character is hard-coded. Time to sleep, though.

Edit: here's the right replacement.

Code: Select all

'personaname'	=> (string) ( preg_replace('/[\x{10000}-\x{10FFFF}]/u', '\xef\xbf\xbd', $group_member['personaname']) ),
Thoughts?

Re: [3.1] convert 4-bytes Unicodes to relatives 3-bytes

Post by 3Di » Sat May 27, 2017 9:24 pm

3Di wrote:
Wed May 10, 2017 2:43 am
Edit: here's the right replacement.

Code: Select all

'personaname'	=> (string) ( preg_replace('/[\x{10000}-\x{10FFFF}]/u', '\xef\xbf\xbd', $group_member['personaname']) ),
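Erm... for posterity: notice the double quotes. In a single-quoted PHP string the \x escapes are not interpreted, so the version above inserts the literal text \xef\xbf\xbd instead of the replacement character.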

Code: Select all

'personaname'	=> (string) ( preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xef\xbf\xbd", $group_member['personaname']) )
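For anyone landing here later, a minimal sketch of the difference between the two quoting styles. The helper name replace_4byte_chars() is made up for this example and is not part of phpBB; the pattern and the replacement bytes are the ones from the line above.

```php
<?php
/**
 * Hypothetical helper (not a phpBB function): swaps every code point
 * outside the Basic Multilingual Plane for U+FFFD.
 */
function replace_4byte_chars($text)
{
	// Double quotes: PHP turns "\xef\xbf\xbd" into the 3 bytes of U+FFFD.
	return (string) preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xef\xbf\xbd", $text);
}

// Single quotes keep the backslashes, so this counts literal characters.
$single = strlen('\xef\xbf\xbd');
// Double quotes yield the encoded character, so this counts its bytes.
$double = strlen("\xef\xbf\xbd");

// "JonDoe" followed by two 4-byte emoji (U+1F4A8).
$name = "JonDoe\xF0\x9F\x92\xA8\xF0\x9F\x92\xA8";
$clean = replace_4byte_chars($name);

echo $single, ' ', $double, ' ', strlen($clean), "\n";
```

One caveat: with the /u modifier preg_replace() returns NULL when the subject is not valid UTF-8, so the (string) cast turns a malformed name into an empty string.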
Also, never trust XML input from web APIs ;) Always run it through

Code: Select all

/**
 * We cannot be sure that the XML is valid UTF-8
 * and free of bad characters. Lesson learned. ;)
 */
if ( function_exists('iconv') )
{
	$xml_input = @iconv('UTF-8', 'UTF-8//IGNORE', $xml_input);
}
else
{
	//your magic here
}
before doing the replacement above.
Or, if iconv is not available, simply refuse the extension's installation in ext.php, using the appropriate code.

Tested. :)
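As for the "your magic here" branch, one possible fallback is sketched below. It is only an illustration under my own assumptions: the function name force_valid_utf8() and the order of the fallbacks are invented here, not taken from phpBB or from the code I actually shipped.

```php
<?php
/**
 * Hypothetical fallback for hosts without iconv: returns a string
 * that is guaranteed to be valid UTF-8.
 */
function force_valid_utf8($input)
{
	// An empty pattern with /u only matches when the subject is valid UTF-8.
	if (preg_match('//u', $input) === 1)
	{
		return $input;
	}

	if (function_exists('mb_convert_encoding'))
	{
		// Re-encoding UTF-8 as UTF-8 replaces invalid byte sequences
		// with the mbstring substitute character ('?' by default).
		return mb_convert_encoding($input, 'UTF-8', 'UTF-8');
	}

	// Last resort: drop every non-ASCII byte.
	return (string) preg_replace('/[^\x00-\x7F]/', '', $input);
}

// A stray 0xFF byte is never valid in UTF-8.
$fixed = force_valid_utf8("abc\xFFdef");
```

Note that this only repairs malformed input. Valid 4-byte characters such as emoji pass through untouched, so the preg_replace() shown earlier is still needed afterwards.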