Cannot insert 4-byte-UTF Chars into the database

Need some custom code changes to the phpBB core simple enough that you feel doesn't require an extension? Then post your request here so that community members can provide some assistance.

NOTE: NO OFFICIAL SUPPORT IS PROVIDED IN THIS SUB-FORUM
Forum rules
READ: phpBB.com Board-Wide Rules and Regulations

NOTE: NO OFFICIAL SUPPORT IS PROVIDED IN THIS SUB-FORUM
Post Reply
MitjaStachowiak
Registered User
Posts: 21
Joined: Mon Dec 30, 2019 7:29 pm
Location: Reinheim, Germany
Contact:

Cannot insert 4-byte-UTF Chars into the database

Post by MitjaStachowiak »

Hello,
in my extensions I'm using a code like

Code: Select all

// $this->db is \phpbb\db\driver\driver_interface
$this->db->sql_query("UPDATE tablename SET column = '".$this->db->sql_escape($this->request->variable('utfText', '', true))."' WHERE ...");
When I try to store a char > 0xFFFF such like πŸ™‚, I get a database error Incorrect string value: '\xF0\x9F\x99\x82' for column.

In any text input of phpBB I can use this emoji, so my database is principally able to store it.

What am I doing wrong?
User avatar
AbaddonOrmuz
Recognised Extension Developer
Posts: 1015
Joined: Wed Dec 25, 2013 9:06 pm
Location: /dev/null
Name: Alfredo
Contact:

Re: Cannot insert 4-byte-UTF Chars into the database

Post by AbaddonOrmuz »

You will need to change the table encoding to utf8mb4 to store 4 bytes per code point, or use the phpBB function utf8_encode_ucr() to convert them to hexadecimal notation.

Something like:

Code: Select all

$data = [
	'column' => utf8_encode_ucr($this->request->variable('utfText', '', true));
];

$sql = 'UPDATE tablename
	SET ' . $this->db->sql_build_array('UPDATE', $data) . '
	WHERE ...';

$this->db->sql_query($sql);
I haven't used utf8_encode_ucr() myself, since I'm using PostgreSQL, where UTF-8 really uses 4 bytes.
Some of my phpBB extensions:
Image Imgur | :chart_with_upwards_trend: SEO Metadata | Image Markdown | :lock: Auto-lock Topics
:trophy: Check out all my validated extensions :trophy:

:penguin: Arch Linux user :penguin:
MitjaStachowiak
Registered User
Posts: 21
Joined: Mon Dec 30, 2019 7:29 pm
Location: Reinheim, Germany
Contact:

Re: Cannot insert 4-byte-UTF Chars into the database

Post by MitjaStachowiak »

utf8_encode_ucr works, thanks.

But what would be the correct workflow for extensions? When I create the tables by migrations, they have utf8_bin encoding and I import ICS-data in a cron-task. If the ICS contains 4-byte-chars, an error can occur, even for users, who didn't paste emojis them self. So this import has to work out-of-the-box, without the requirement for users to modify the database...
User avatar
JoshyPHP
Code Contributor
Posts: 1177
Joined: Mon Jul 11, 2011 12:28 am

Re: Cannot insert 4-byte-UTF Chars into the database

Post by JoshyPHP »

utf8_encode_ucr is fine to be used on HTML but I wouldn't use it on any other kind of content.

IMO the only correct way is to have the database use a proper encoding.
I wrote the thing that does BBCodes in 3.2+.
MitjaStachowiak
Registered User
Posts: 21
Joined: Mon Dec 30, 2019 7:29 pm
Location: Reinheim, Germany
Contact:

Re: Cannot insert 4-byte-UTF Chars into the database

Post by MitjaStachowiak »

But is there a way to set the correct utf8mb4 encoding via migrations?

For my ICS-Calendar, HTML-escaped chars are not correct. Is there a reverse-Function to utf8_encode_ucr, that only converts the > 0xFFFF-Chars back to UTF8?
User avatar
JoshyPHP
Code Contributor
Posts: 1177
Joined: Mon Jul 11, 2011 12:28 am

Re: Cannot insert 4-byte-UTF Chars into the database

Post by JoshyPHP »

The short answer is "I don't think so" and no. Convert your database manually and keep text as text.
I wrote the thing that does BBCodes in 3.2+.
User avatar
AmigoJack
Registered User
Posts: 5795
Joined: Tue Jun 15, 2010 11:33 am
Location: γ‚°γƒͺーン ヒル ゾーン
Contact:

Re: Cannot insert 4-byte-UTF Chars into the database

Post by AmigoJack »

JoshyPHP wrote: ↑
Sat Aug 29, 2020 1:07 pm
keep text as text
The advice is good, but phpBB doesn't follow that itself: so far emojis are stored as HTML entities, and not as the characters they are. One downside of this is the search won't find my post with πŸ•πŸ’£πŸ’©πŸ¦„ in it.
  • The worst thing about censorship is β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
  • "The problem is probably not my English but you do not want to understand correctly. ... We will not come anybody anyway, nevertheless, it's best to shit this." Affin, 2018-11-20 ↑
  • "But this shit is not here for you. You can follow with your. Maybe the question, instead, was for you, who know, so you shoved us how you are." axe70, 2020-10-10 ↑
Post Reply

Return to β€œphpBB Custom Coding”