Unicode "unknown character" squares randomly appear

Get help with installation and running phpBB 3.0.x here. Please do not post bug reports, feature requests, or MOD-related questions here.
END OF SUPPORT: 1 January 2017 (announcement)
Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Mon Jun 30, 2008 2:00 pm

Hi,
I'm having some trouble with my phpBB3 board.

Sometimes, when posting messages with special characters (like accents), the mysterious "squares with question mark for unknown Unicode characters" appear.
The funny thing is that there seems to be no common pattern in such events: dates, times, users, tags used, message contents, server load, moon phases, nothing that I could think of.
Sometimes it happens to me, sometimes to users; it is generally sufficient to edit the message, correct it and resubmit it (identical to the original) to fix it; rarely you need to edit it twice.

The only common pattern is the position of the squares, when the problem happens:
- 1st accented letter: no issue;
- 2nd accented letter: square immediately after it;
- 3rd accented letter: square 2 characters after it;
- ...
- Nth accented letter: square N-1 characters after it.
Each square replaces two (sometimes three? not sure) other characters. The special characters themselves show correctly.
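
One possible reading of this pattern, offered purely as an assumption and not something established in this thread: in UTF-8 the gap between a character's index and its byte index grows by one for every two-byte accented letter that precedes it, which is the same N-1 drift described above. A minimal Python sketch (the function name `offset_drift` and the sample text are hypothetical) that prints this drift:

```python
def offset_drift(text):
    """Return (char, char_index, byte_index, drift) for each non-ASCII character.

    drift = byte_index - char_index, i.e. how far a byte-based position
    has run ahead of a character-based one at that point.
    """
    rows = []
    byte_index = 0
    for char_index, ch in enumerate(text):
        if ord(ch) > 0x7F:
            rows.append((ch, char_index, byte_index, byte_index - char_index))
        # Advance by the number of bytes this character takes in UTF-8.
        byte_index += len(ch.encode("utf-8"))
    return rows


# Hypothetical sample with four accented letters ("così è più là").
sample = "cos\u00ec \u00e8 pi\u00f9 l\u00e0"
for row in offset_drift(sample):
    print(row)
```

For this sample the drift is 0 at the first accented letter, 1 at the second, 2 at the third and 3 at the fourth. This would only be consistent with some step mixing byte offsets and character offsets; it does not identify which step, if any, is responsible.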

I found a couple of probably related topics: http://www.phpbb.com/community/viewtopi ... 6&t=579825 and http://www.phpbb.com/community/viewtopi ... 6&t=559587, but neither has a conclusive answer.

Additional info:
- PHP 5.2.5;
- MySQL 5.0.51a (no MySQLi extension);
- tables are utf8_bin, MyISAM engine;
- posts table is about 25MB (around 10K posts);
- board was upgraded from phpBB2;
- issue has been present since I started using phpBB3 (first 3.0.RC1, now 3.0.0); it has never ceased and never got worse;
- if you need other info, just ask.

Thanks for your attention.
Whatever I say, it's not my fault.

Dundurs
Registered User
Posts: 14
Joined: Wed Jan 05, 2005 9:40 am

Re: Unicode "unknown character" squares randomly appear

Post by Dundurs » Sun Jul 06, 2008 10:09 pm

I also have this annoying problem. Sometimes posting from IE7 gives no errors, while FF gives this error many times a day, though not always. After a month things changed: IE7 gives errors and FF does not. The problem is old; nothing has changed since the first RC. I have heard of the same problem for French, German and now Italian. For me it's Latvian. The only solution was to change PHP to a version prior to 5.2, but I have no control over that. The host tried it for me for one month, but then changed back to 5.2.5 and my problems came back. There also hasn't been any staff answer on this.

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Mon Jul 07, 2008 12:48 pm

Thanks for bringing the issue back up.
Dundurs wrote:The only solution was to change PHP to a version prior to 5.2, but I have no control over that. The host tried it for me for one month, but then changed back to 5.2.5 and my problems came back.
Humm... how sure are we that PHP 5.2 is the issue?
Whatever I say, it's not my fault.

Eelke
QA Team
Posts: 2903
Joined: Thu Dec 20, 2001 8:00 am
Location: NL, Bussum
Name: Eelke Blok

Re: Unicode "unknown character" squares randomly appear

Post by Eelke » Mon Jul 07, 2008 1:13 pm

We need to clear up where this problem originates. You state that it only happens when posting with a certain browser and that you can correct the problem by posting again. This would suggest that the problem originates in the way the browser sends the data. How do the problematic posts look from another browser? What is the encoding set to in the browser when this problem occurs (e.g. "View > Character encoding")?

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Mon Jul 07, 2008 6:00 pm

Eelke wrote:You state that it only happens when posting with a certain browser and that you can correct the problem by posting again. This would suggest that the problem originates in the way the browser sends the data. How do the problematic posts look from another browser? What is the encoding set to in the browser when this problem occurs (e.g. "View > Character encoding")?
I didn't state that. I need to ask my users who experienced the problem, but I'm quite sure some of them use IE (while I don't).
I didn't check the encoding; I would have to check every time (and the problem happens rarely), but what would trigger a character encoding change?
Honestly, it looks more like a server-side issue, though I might be wrong of course.
I'm gonna ask for browser and O/S too, stay tuned.
Whatever I say, it's not my fault.

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Wed Jul 09, 2008 10:10 pm

Ok, one user for example uses IE7 on Vista. I use Firefox on XP. It definitely looks like a server-side issue, either PHP or MySQL. I don't think phpBB is to blame, but the issue should be investigated.
What other info would be useful?
Whatever I say, it's not my fault.

SamG
Former Team Member
Posts: 3221
Joined: Fri Aug 31, 2001 6:35 pm
Location: Beautiful Northwest Lower Michigan
Name: Sam Graf

Re: Unicode "unknown character" squares randomly appear

Post by SamG » Thu Jul 10, 2008 12:20 am

I apologize if you already covered this and I missed it or I'm just misunderstanding: The post is initially entered and stored this way? And then edited to correct the problem? If so, this begins client side, doesn't it? Within the context of the form?

If so, then the question would seem to be, what can randomly influence the page encoding (which is part of what I think Eelke is driving at)? But to know if the encoding as delivered is being altered, you'd want to see what the browser reports the page encoding to be when the problem occurs, I think.
We should talk less, and say more.

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Thu Jul 10, 2008 12:37 am

SamG wrote:The post is initially entered and stored this way?
It is entered correctly and stored in the db the wrong way.
SamG wrote:If so, this begins client side, doesn't it?
Well, not necessarily, I think.
SamG wrote:But to know if the encoding as delivered is being altered, you'd want to see what the browser reports the page encoding to be when the problem occurs, I think.
Hu-hum. So, if I post something, and the problem occurs, I should go back and look into Page info > Encoding? I can do that.
Whatever I say, it's not my fault.

SamG
Former Team Member
Posts: 3221
Joined: Fri Aug 31, 2001 6:35 pm
Location: Beautiful Northwest Lower Michigan
Name: Sam Graf

Re: Unicode "unknown character" squares randomly appear

Post by SamG » Thu Jul 10, 2008 12:47 am

OK, I get it now. Sort of. So it's entered correctly, and then these unknown characters are inserted into the post after submission. In the database the post is wrong, but it's entered correctly. I guess it might still be a page encoding issue...

But supposing it's not, we'd like to know what a server could be doing occasionally that would alter the post -- basically inserting characters, though. That's the part that's most confusing to me, regardless of whether it's server or client side. I wonder if it's always the same character that gets inserted...
We should talk less, and say more.

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Thu Jul 10, 2008 12:31 pm

I'm confused too. :)

It is not always the same character -- or rather, any "special" character (a multibyte character in UTF-8) can trigger the issue. The point is that it doesn't seem to be strictly content-related, since editing the messed-up message back to the one originally entered generally solves the issue: the same identical content is sent by the browser (apart from possible encoding changes, which themselves would have to be triggered by something else -- remember, this behavior is browser-independent), yet the results in the db are different.
It looks like something time-related: at some moments it just doesn't work, a little later it does.

I can report a few examples of messages which showed this problem, if this can help, but I can't find any common pattern, except the one I mentioned in post #1.
The fact that special characters trigger the issue but are not affected (i.e. they show correctly) is what surprises me the most. It looks like there's some byte scrambling or mixing happening somewhere.

So, hum, what PHP functions does phpBB use to prepare posts for insertion? Are they multibyte-safe?
(Still, how would this not be content-dependent?)
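
To illustrate what "multibyte-safe" means in this question, here is a minimal Python sketch. It is not phpBB's actual code, and the helper name `cut_bytes` is hypothetical. It shows that cutting a UTF-8 string at a byte offset can split a two-byte letter, and that a lenient decoder then displays the replacement character U+FFFD (the "square"):

```python
def cut_bytes(text, n_bytes):
    """Truncate text to n_bytes of UTF-8, the way a byte-oriented function would."""
    raw = text.encode("utf-8")[:n_bytes]
    # errors="replace" turns an incomplete sequence into U+FFFD.
    return raw.decode("utf-8", errors="replace")


word = "perch\u00e9"        # "perché": 6 characters, 7 bytes in UTF-8
print(cut_bytes(word, 7))   # whole string, nothing is split
print(cut_bytes(word, 6))   # the cut lands inside the two-byte "é"
print(word[:6])             # character-based slicing keeps "é" intact
```

A byte-oriented operation like this is deterministic for a given input, so on its own it would not explain why resubmitting identical content sometimes works.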

I'm looking into the PHP bug tracker to see if I can spot anything.
Whatever I say, it's not my fault.

Nicholas the Italian
Registered User
Posts: 170
Joined: Tue Nov 21, 2006 5:18 pm

Re: Unicode "unknown character" squares randomly appear

Post by Nicholas the Italian » Thu Jul 10, 2008 1:11 pm

Not sure whether any of these may be related:
http://bugs.php.net/bug.php?id=45311
http://bugs.php.net/bug.php?id=44868
http://bugs.php.net/bug.php?id=37661
http://bugs.php.net/bug.php?id=38926 (that's the only one that looks like it might be time-dependent)
http://bugs.php.net/bug.php?id=39279
http://bugs.php.net/bug.php?id=43840
http://bugs.php.net/bug.php?id=43841
http://bugs.php.net/bug.php?id=44617

I use PHP 5.2.5.
The forum search engine is fulltext MySQL.

From phpinfo():
- Multibyte string engine: libmbfl
- Multibyte regex (oniguruma) version: 4.4.4
- mbstring.detect_order: no value
- mbstring.encoding_translation: Off
- mbstring.func_overload: 0
- mbstring.http_input: pass
- mbstring.http_output: pass
- mbstring.internal_encoding: no value
- mbstring.language: neutral
- mbstring.script_encoding: no value
- mbstring.strict_detection: Off
- mbstring.substitute_character: no value
- among configure commands there are '--enable-mbstring' and '--enable-zend-multibyte'
Whatever I say, it's not my fault.

Dundurs
Registered User
Posts: 14
Joined: Wed Jan 05, 2005 9:40 am

Re: Unicode "unknown character" squares randomly appear

Post by Dundurs » Mon Jul 14, 2008 11:25 pm

I correct messages all day long. For some, if I can understand the meaning of the remaining text, it's enough to click "edit", correct the text and send it again. For others I have to do this more than 5 times before the text is finally stored correctly. But here I can't do anything, because the meaning is completely lost: http://www.uscars.lv/nt_Diskusijas/view ... =13&t=1836 (Latvian)


In the next message the meaning can still be read, and it says: When will this annoying shit come to an end?!


I feel so tired... :(

Ask me for any information to help staff solve this as fast as possible...

Dundurs
Registered User
Posts: 14
Joined: Wed Jan 05, 2005 9:40 am

Re: Unicode "unknown character" squares randomly appear

Post by Dundurs » Mon Jul 14, 2008 11:32 pm

Just a test here:

īdzīgi glāžšķūņu rūķīši

and here it works ...

Eelke
QA Team
Posts: 2903
Joined: Thu Dec 20, 2001 8:00 am
Location: NL, Bussum
Name: Eelke Blok

Re: Unicode "unknown character" squares randomly appear

Post by Eelke » Tue Jul 15, 2008 6:12 am

Someone should check the contents of their database. Preferably, also the HTTP traffic with an HTTP tracer such as YATT. It needs to be determined where the error is being introduced.
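
One hedged way to narrow this down, assuming the raw bytes of a post can be exported from the database or captured from the HTTP request (how to obtain them is outside this sketch): check whether the bytes are valid UTF-8 and, if not, where decoding fails. The function name `find_invalid_utf8` and the sample bytes are hypothetical.

```python
def find_invalid_utf8(raw):
    """Return a list of (byte_offset, offending_bytes) for invalid UTF-8 in raw."""
    problems = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice that was decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, raw[start:end]))
            pos = end  # resume after the bad sequence
    return problems


good = "citt\u00e0".encode("utf-8")   # "città", valid UTF-8
bad = b"citt\xc3 e"                   # lead byte 0xC3 without its continuation byte
print(find_invalid_utf8(good))
print(find_invalid_utf8(bad))
```

If the stored bytes come back clean but the page still shows squares, the database may contain genuine U+FFFD characters (bytes `EF BF BD`), which would mean the substitution happened before the data was stored.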

mkruer
Registered User
Posts: 74
Joined: Mon Apr 28, 2003 7:49 pm

Re: Unicode "unknown character" squares randomly appear

Post by mkruer » Tue Jul 15, 2008 11:51 pm

Ever since I updated the site from 3.0.1 to 3.0.2 I have been getting the exact same issue. It was working fine before. I even went through the hassle of replacing all the files from the full branch, and still no luck getting it to work.

“ = “
““ = ““�
““� = ““���
““��� = ““������
etc...

After the second ““, as soon as I hit preview, this process starts. Something is being added. Looking at the update, I am wondering if something in the confusable is wrong?

BTW, I tried it in FF3 and IE7: same thing.
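
To see exactly what is being added in a case like this, the text copied from the preview can be listed character by character. A minimal Python sketch (the helper name `describe` is hypothetical) that prints each code point with its UTF-8 bytes and Unicode name:

```python
import unicodedata


def describe(text):
    """Return one line per character: code point, UTF-8 bytes in hex, Unicode name."""
    lines = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {ch.encode('utf-8').hex()} {name}")
    return lines


# Two curly quotes followed by one replacement character, as in the second example above.
for line in describe("\u201c\u201c\ufffd"):
    print(line)
```

If the squares show up as U+FFFD REPLACEMENT CHARACTER, they are real characters in the text at that point and not just a display problem in the browser.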
