Why does PHPBB3 make use of “FAKE” utf-8??

rekabis
Registered User
Posts: 8
Joined: Sat Nov 10, 2007 7:50 am
Location: Kelowna, British Columbia, Canada

Why does PHPBB3 make use of “FAKE” utf-8??

Post by rekabis » Thu May 08, 2008 4:46 am

I just wanted to know: Why does PHPBB 3.0 make use of “fake” utf-8?

Sure, the fully assembled and parsed web page gets delivered as utf-8, but the original PHP files are still in ASCII format, which forces the developers to make use of character codes to bring in special characters, such as bullets: • and copyright symbols: ©.

I have taken the time to convert my entire copy of PHPBB 3.0 to “pure” utf-8 (without BOM), where all of the individual PHP files are also utf-8 instead of ASCII. This gave me three very big bonuses:
  • I am able to make use of any character in the utf-8 character set, whether or not it actually has a corresponding character code.
  • I am able to serve it up as XHTML 1.1. That version of XHTML can only accept 4 character codes -- the non-breaking space (&nbsp;), the two angle brackets (&gt; &lt;) and the & character (&amp;) -- so by being able to use special characters directly (as opposed to using character codes), I am able to make use of all special characters while still conforming to the XHTML 1.1 spec.
  • I am able to serve up the site as PROPER application/xhtml+xml to all modern web browsers (as is REQUIRED when serving it up as XHTML), while still serving it up as text/html to Internet Explorer. Hey, no-one is perfect, much less the 70% of sheeple out there still using that sorry excuse of a web browser… using application/xml is still acceptable, but incurs a massive page load penalty under IE which I am not interested in having my visitors experience, hence the spoon-feeding of text/html to only users of IE.
Granted, I had to break out my copy of Notepad++ in order to save the PHP files without a BOM (MS Notepad repeatedly kept a BOM in a few files for some strange reason), but the simplicity of this act makes me seriously wonder why the developers simply didn’t do this one very easy action early in the development process. I have opened and re-saved the PHP files up on Dreamweaver CS3 as well as a few other editors, and they all continue to exist as properly BOM-less utf-8, so the danger of these files “reverting” away from BOM-less utf-8 is minimal at best.
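
For anyone who wants to check files for a BOM without opening each one in an editor, here is a minimal sketch in Python (not part of phpBB). The helper names has_utf8_bom and strip_utf8_bom are made up for illustration. The BOM matters for PHP files because any bytes before the opening <?php tag are sent to the browser as output.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; unchanged if none is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content: a BOM followed by a PHP line with a literal bullet.
sample = codecs.BOM_UTF8 + "<?php echo '•';".encode("utf-8")
cleaned = strip_utf8_bom(sample)
```

To use it on a real file, read the file in binary mode, pass the bytes through strip_utf8_bom, and write the result back in binary mode.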

So… I’m hoping a team lead will stumble across this and answer my question: when the benefits of going to pure utf-8 are so great, why are the individual PHP files still encoded in ASCII??

Granted, I can understand why things are still served up as text/html… not everyone can handle the problems that can crop up when things go wonky under application/xhtml+xml. But why drop the ball on the entire utf-8 issue? That seems like a rather massive (and amateurish) gaffe.

igorw
Former Team Member
Posts: 8024
Joined: Fri Dec 16, 2005 12:23 pm
Location: {postrow.POSTER_FROM}
Name: Igor Wiedler

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by igorw » Thu May 08, 2008 4:21 pm

What you have done is a waste of time. Sorry to say. phpBB3 does not make use of "fake" utf-8 (if something like that exists).

Most of phpBB3's files are ASCII encoded because it simply doesn't matter. The data is stored in the database, in UTF-8 encoding. The only other source of displayed data (which would require UTF-8) is the language files, and these are in fact UTF-8 encoded.

PHP is a server-side language; it doesn't care about the file encoding (which only matters if the file itself contains special characters, as the language files do). The client does not care what encoding the files on your server use. The only thing it cares about is what it gets. And as you said, it is delivered in UTF-8. So where's the problem? Everything is perfectly fine.

As for the content type: if you look in the page_header() function of includes/functions.php, you will see the reason. It is because of IE6. It simply doesn't support the XML content type, so the page is delivered as text/html. The development team decided not to go through the hassle of checking the user's browser (although they did in fact go through that hassle for the download system).

Conclusion: phpBB3 already does fully support UTF-8, and does not use "fake" utf-8, because that doesn't even exist.


Oh, and did you just call the phpBB development team amateurs? :o
Igor Wiedler | area51 | GitHub | trashbin | Formerly known as evil less than three

Highway of Life
Former Team Member
Posts: 6048
Joined: Wed Feb 02, 2005 5:41 pm
Location: Seattle, WA
Name: David Lewis

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by Highway of Life » Thu May 08, 2008 7:12 pm

Wow, indeed I must agree with eviL<3, you did waste quite a bit of time.

If you look at the phpBB3 Coding Guidelines, you will see where it explicitly states that Language files must be saved as UTF-8 without a BOM. Because so many languages require UTF-8, these language files would not be able to easily operate without being UTF-8 encoded.
It also enables us to use right single quote instead of single quote, among others: » “ ” and …
Crack open /language/en/common.php to see what I’m talking about. It's right up there in the comments in the "DEVELOPERS PLEASE NOTE" section.

Also, you cannot actually deliver your pages in application/xhtml+xml unless your browser requests that the page be delivered in that content type rather than text/html -- aside from IE throwing fits.
And you can use whatever version of XHTML you wish: subsilver2 uses XHTML 1.0 Transitional, prosilver uses XHTML 1.0 Strict. It is really just dependent on your style and has nothing to do with phpBB3 being UTF-8 or not.
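
The point about only sending application/xhtml+xml when the browser asks for it can be sketched roughly as follows. This is an illustration in Python, not phpBB code, and choose_content_type is a hypothetical helper. It assumes the simplest possible policy: send XHTML only if the Accept header lists that type explicitly with a non-zero q value, and fall back to text/html otherwise.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick application/xhtml+xml only when the client lists it explicitly."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a media range without a q parameter defaults to 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are deliberately ignored: clients that send only
    # a wildcard (as IE does) get plain text/html.
    return "text/html"
```

A browser that does not mention the type at all is never handed it, which avoids sniffing the User-Agent string.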

And furthermore, there is no such thing as "fake UTF-8"
The phpBB Weekly Podcast - Discussing the developments of phpBB4 and beyond.

New to phpBB3? Want to learn about programming?
Visit phpBB Academy at StarTrekGuide to learn how.

jimdunn
Registered User
Posts: 1570
Joined: Tue Mar 25, 2008 11:49 am
Location: Australia

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by jimdunn » Fri May 09, 2008 1:47 pm

Oh, damn...

I just opened every file on my website in Notepad++ and resaved it as utf-8 no bom

Surely you're not saying that was a waste of time ?

:twisted:

jimdunn
Registered User
Posts: 1570
Joined: Tue Mar 25, 2008 11:49 am
Location: Australia

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by jimdunn » Fri May 09, 2008 1:53 pm

eviL<3 wrote:Oh, and did you just call the phpBB development team amateurs? :o
Obviously you must be - or you'd be in gainful employment instead of messing around here...

:lol:

ToonArmy
Former Team Member
Posts: 4608
Joined: Sat Mar 06, 2004 5:29 pm
Location: Worcestershire, UK
Name: Chris Smith

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by ToonArmy » Fri May 09, 2008 3:44 pm

jimdunn wrote:
eviL<3 wrote:Oh, and did you just call the phpBB development team amateurs? :o
Obviously you must be - or you'd be in gainful employment instead of messing around here...

:lol:
I seriously hope that was a joke, many of the phpBB developers are employed full time and work on phpBB in their spare time like some people work for charities, do up a classic car, etc.
Chris Smith | GitHub

Raimon
Former Team Member
Posts: 12088
Joined: Tue May 30, 2006 5:31 pm
Location: Netherlands
Name: Raimon Meuldijk

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by Raimon » Fri May 09, 2008 8:13 pm

Highway of Life wrote:Wow, indeed I must agree with eviL<3, you did waste quite a bit of time.

If you look at the phpBB3 Coding Guidelines, you will see where it explicitly states that Language files must be saved as UTF-8 without a BOM. Because so many languages require UTF-8, these language files would not be able to easily operate without being UTF-8 encoded.
In that case, why are some official language packs not saved as UTF-8, using clumsy HTML entities in those files instead of UTF-8 characters?

Code:

title="Board indëüz"
(source view)

With clumsy HTML:

Code:

title="Board indë&uuml;z"
(source view)

A few (official) language packs use those clumsy HTML entities, and that way you do not get proper UTF-8.
I'm a little worried about that. It does not seem right to describe phpBB as UTF-8 if some (official) translations besides the default language don't follow this: you can claim it for the default package, but perhaps not for other languages.
Maybe a better check of the language files would prevent this. I really don't know how the phpBB teams check the language files of other translations, but these things are easy to prevent.
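
A check like that could look roughly like this. It is only a sketch in Python, not an actual phpBB tool, and the names ENTITY_RE, MARKUP_ENTITIES, find_replaceable_entities and entities_to_utf8 are made up for illustration. It assumes the entities for markup characters (&amp;, &lt;, &gt;, &quot;, &apos;) must stay as they are, and that every other named entity can be written as a literal UTF-8 character.

```python
import html
import re

# Named entities such as &uuml; (numeric references are not handled here).
ENTITY_RE = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")

# Assumption: these protect markup-significant characters and must be kept.
MARKUP_ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"}


def find_replaceable_entities(text: str) -> list:
    """List the named entities that could be literal UTF-8 characters instead."""
    return [e for e in ENTITY_RE.findall(text) if e not in MARKUP_ENTITIES]


def entities_to_utf8(text: str) -> str:
    """Replace named entities with their characters, keeping markup entities."""
    def replace(match):
        entity = match.group(0)
        if entity in MARKUP_ENTITIES:
            return entity
        # html.unescape leaves unknown names (e.g. &foo;) unchanged.
        return html.unescape(entity)
    return ENTITY_RE.sub(replace, text)


# The example string from this post.
line = 'title="Board indë&uuml;z"'
found = find_replaceable_entities(line)
converted = entities_to_utf8(line)
```

Running find_replaceable_entities over each string in a language file would flag the entries a translator should convert.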

And no, the translations have nothing to do with phpBB3 itself; only the people who made a translation can be blamed, or advised to do it differently ;)
So it's not fair to claim that phpBB uses fake UTF-8.

And no, the phpBB devs are not amateurs, they are phpBB kings :mrgreen:
Need phpBB installation, extensions, styles, or integration of phpBB with your website?
Contact me for fair prices and good service!

drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by drathbun » Fri May 09, 2008 9:11 pm

Friendly moderator note:

Please feel free to discuss the appropriateness of UTF-8 and its uses as I feel that it is a legitimate question.

Do not feel free to call the relative skill level of the phpBB dev team into question. As you can understand, that is a quick route to a flame war / name calling and is therefore not a legitimate use of the discussion forum, thanks.

Carry on. :)
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details

jimdunn
Registered User
Posts: 1570
Joined: Tue Mar 25, 2008 11:49 am
Location: Australia

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by jimdunn » Sat May 10, 2008 1:02 am

ToonArmy wrote:
jimdunn wrote:
eviL<3 wrote:Oh, and did you just call the phpBB development team amateurs? :o
Obviously you must be - or you'd be in gainful employment instead of messing around here...

:lol:
I seriously hope that was a joke, many of the phpBB developers are employed full time and work on phpBB in their spare time like some people work for charities, do up a classic car, etc.

:o
Well - I'm really shocked and sorry if you misinterpreted my post.

I thought it was obvious that it was a joke, leaning on sarcastic irony towards the original suggestion.
(A suggestion which, incidentally, I found offensive and outrageous on their behalf)

Obviously it wasn't obvious enough that it was a joke - so I apologise again to anyone who missed the point and thought I was being critical.

I assure you nothing was further from my mind.

Jim_UK
Former Team Member
Posts: 18478
Joined: Tue Oct 12, 2004 5:36 pm
Location: Darwen N.West UK

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by Jim_UK » Sun May 11, 2008 6:49 pm

[off topic]
I read your post and understood that it was a joke.
The problem is that different nationalities have different senses of humour. I can watch American comedy shows all day long and find nothing to laugh at. I am sure some could say the same about British humour which tends to lean towards sarcasm. (hence I could see your comment as a joke)
What we need to be aware of on an international site like this is that we can without thinking really offend some folks unintentionally. We need to think before we press the Submit button.
[/off topic]

Jim
The truth is out there.
Unfortunately they will not let you anywhere near it!

SneakySimian
Registered User
Posts: 31
Joined: Fri Apr 11, 2008 12:31 am

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by SneakySimian » Sun May 11, 2008 7:16 pm

jimdunn wrote:
ToonArmy wrote:
jimdunn wrote:
eviL<3 wrote:Oh, and did you just call the phpBB development team amateurs? :o
Obviously you must be - or you'd be in gainful employment instead of messing around here...

:lol:
I seriously hope that was a joke, many of the phpBB developers are employed full time and work on phpBB in their spare time like some people work for charities, do up a classic car, etc.

:o
Well - I'm really shocked and sorry if you misinterpreted my post.

I thought it was obvious that it was a joke, leaning on sarcastic irony towards the original suggestion.
(A suggestion which, incidentally, I found offensive and outrageous on their behalf)

Obviously it wasn't obvious enough that it was a joke - so I apologise again to anyone who missed the point and thought I was being critical.

I assure you nothing was further from my mind.
[offtopic]I read your post as tongue-in-cheek, which apparently I was right in doing. No worries. :)[/offtopic]

Highway of Life
Former Team Member
Posts: 6048
Joined: Wed Feb 02, 2005 5:41 pm
Location: Seattle, WA
Name: David Lewis

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by Highway of Life » Sun May 11, 2008 10:44 pm

Jim_UK wrote:[off topic]
I read your post and understood that it was a joke.
The problem is that different nationalities have different senses of humour. I can watch American comedy shows all day long and find nothing to laugh at. I am sure some could say the same about British humour which tends to lean towards sarcasm. (hence I could see your comment as a joke)
What we need to be aware of on an international site like this is that we can without thinking really offend some folks unintentionally. We need to think before we press the Submit button.
[/off topic]

Jim
[off topic]I can watch US comedy shows all day and find nothing to laugh at as well. ;) ... could be because I like British humour better. ;) [/off topic]

ibrothers
Registered User
Posts: 15
Joined: Mon May 12, 2008 8:29 am

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by ibrothers » Tue May 13, 2008 5:42 am

rekabis wrote: So… I’m hoping a team lead will stumble across this and answer my question: when the benefits of going to pure utf-8 are so great, why are the individual PHP files still encoded in ASCII??

Granted, I can understand why things are still served up as text/html… not everyone can handle the problems that can crop up when things go wonky under application/xhtml+xml. But why drop the ball on the entire utf-8 issue? That seems like a rather massive (and amateurish) gaffe.
Well Dear Rekabis!

This has nothing to do with phpbb!

When making your database in the first place, you should make sure the database is UTF-8;
then when you create tables they all become UTF-8.
Your problem arose because you did not make the DB UTF-8 in the first place.

Now I have the same problem with one of my DBs, where I made a similar mistake.
Can you please explain How I can make my whole DB utf-8 now?

thanks

drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by drathbun » Tue May 13, 2008 5:08 pm

ibrothers wrote:Can you please explain How I can make my whole DB utf-8 now?
This topic was about the source code format, not the database format. If you have an issue with your specific board and are looking for assistance, please post in the Support forum, thanks.

naderman
Consultant
Posts: 3735
Joined: Fri Aug 01, 2003 10:06 pm
Location: Berlin, Germany
Name: Nils Adermann

Re: Why does PHPBB3 make use of “FAKE” utf-8??

Post by naderman » Sat May 17, 2008 9:15 pm

I'm not sure anyone pointed this out: ASCII is a subset of UTF-8. So if the files are ASCII encoded, they are also UTF-8 encoded. "&uuml;" is UTF-8 just as much as "ü" is UTF-8. It's just that "&uuml;" is, without good reason, written as an HTML entity (itself encoded in UTF-8) instead of as the UTF-8 character directly.
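
A quick way to see this at the byte level, as a small Python sketch (the helper name same_bytes_in_ascii_and_utf8 is made up for illustration): pure-ASCII text produces identical bytes under both encodings, while the literal "ü" only exists in UTF-8, where it takes two bytes.

```python
import html


def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True if the text is pure ASCII, so both encodings yield the same bytes."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character that ASCII cannot represent.
        return False


entity_form = "&uuml;"   # six ASCII characters
direct_form = "ü"        # one character outside ASCII

entity_bytes = entity_form.encode("utf-8")   # six bytes, all below 0x80
direct_bytes = direct_form.encode("utf-8")   # two bytes: 0xC3 0xBC
```

Both forms end up as the same character once the browser has decoded the bytes and resolved the entity, which is what html.unescape(entity_form) does on the Python side.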
I appreciate gifts from my Amazon wishlist.
naderman.de twitter: @naderman
