Sure, the fully assembled and parsed web page gets delivered as utf-8, but the original PHP files are still in ASCII format, which forces the developers to make use of character codes to bring in special characters, such as &bull; for bullets and &copy; for copyright symbols.
I have taken the time to convert my entire copy of phpBB 3.0 to “pure” utf-8 (without BOM), where all of the individual PHP files are also utf-8 instead of ASCII. This gave me three very big bonuses:
- I am able to make use of any character in the utf-8 character set, whether or not it actually has a corresponding character code.
- I am able to serve it up as XHTML 1.1. Parsed as XML, the page can only count on the five predefined character codes: &lt; and &gt; for the two angle brackets, &amp; for the & character, and &quot; and &apos; for the quote marks. Anything else, &nbsp; included, only works if the parser actually reads the DTD. By using special characters directly (as opposed to using character codes), I am able to make use of all special characters without depending on that.
- I am able to serve up the site as PROPER application/xhtml+xml to all modern web browsers (as is REQUIRED when serving it up as XHTML), while still serving it up as text/html to Internet Explorer. Hey, no one is perfect, much less the 70% of sheeple out there still using that sorry excuse of a web browser. Using application/xml is still acceptable, but it incurs a massive page load penalty under IE that I am not interested in having my visitors experience, hence the spoon-feeding of text/html to users of IE only.
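
Here is a minimal sketch of the content-type switch described in the last bullet. It is written in Python purely for illustration (the board itself is PHP), and it assumes the decision is made from the request's Accept header rather than by sniffing for IE by user agent, which is a swapped-in technique and not necessarily how any given site does it. The function name pick_content_type is hypothetical.

```python
def pick_content_type(accept_header: str) -> str:
    """Return application/xhtml+xml only if the client explicitly accepts it.

    Assumption: browsers that can handle XHTML list the type in Accept,
    while older IE versions do not, so they fall through to text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a media range with no q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not accepted"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


# Example: a typical Firefox-style Accept header versus one with no XHTML entry.
print(pick_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(pick_content_type("image/gif, image/jpeg, */*"))
```

If the response type depends on Accept like this, it is also worth sending a "Vary: Accept" response header so that caches do not hand the XHTML version to a browser that cannot parse it.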
So… I’m hoping a team lead will stumble across this and answer my question: when the benefits of going to pure utf-8 are so great, why are the individual PHP files still encoded in ASCII??
Granted, I can understand why things are still served up as text/html… not everyone can handle the problems that can crop up when things go wonky under application/xhtml+xml. But why drop the ball on the entire utf-8 issue? That seems like a rather massive (and amateurish) gaffe.
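
For anyone curious what the conversion to “pure” utf-8 involves, here is a rough sketch in Python (again just for illustration, not the tool actually used). It assumes three steps: strip a leading BOM, decode the bytes (falling back to Latin-1 if they are not valid utf-8, which is a guess about the legacy encoding), and swap named character codes for the literal characters while leaving the five XML-predefined ones alone. The function name to_pure_utf8 is hypothetical, and a blind replacement like this could also touch strings that were never meant as markup, so review the results before committing to them.

```python
import codecs
import re
from html.entities import name2codepoint

# These must stay escaped: the literal characters would break the markup.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _replace_entity(match):
    name = match.group(1)
    # Leave XML-predefined and unknown names exactly as they were.
    if name in XML_PREDEFINED or name not in name2codepoint:
        return match.group(0)
    return chr(name2codepoint[name])


def to_pure_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return the file contents as BOM-less utf-8 with literal characters."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")  # pure ASCII is already valid utf-8
    except UnicodeDecodeError:
        text = raw.decode(fallback)  # assumed legacy encoding
    return ENTITY_RE.sub(_replace_entity, text).encode("utf-8")


# Example: the BOM is dropped, &bull; and &copy; become literals, &amp; stays.
print(to_pure_utf8(b"\xef\xbb\xbf&bull; &copy; 2008 &amp; more").decode("utf-8"))
```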