[2.0.11] Page Encoding

Extensions Robot
Posts: 27988
Joined: Sat Aug 16, 2003 7:36 am

[2.0.11] Page Encoding

Post by Extensions Robot » Fri Dec 10, 2004 4:50 pm

MOD Name: Page Encoding
Author: billytcf
MOD Description: Change the default page encoding of phpBB2 to utf-8


MOD Version: 1.0.0

Download File: Default Encoding.mod
mods overview page: View
File Size: 1879 Bytes

Security Score: 0
Last edited by Extensions Robot on Mon Apr 30, 2007 12:28 am, edited 1 time in total.
(this is a non-active account manager for the phpBB Extension Customisations Team)

wGEric
Former Team Member
Posts: 8805
Joined: Sun Oct 13, 2002 3:01 am
Location: Friday
Name: Eric Faerber
Contact:

Post by wGEric » Sun Dec 12, 2004 10:35 pm

MOD Validated/Released

Notes:
This MOD changes the page encoding of phpBB from iso-8859-1, the phpBB default, to UTF-8 so that Chinese characters display better.
Eric

darakhshan
Registered User
Posts: 794
Joined: Fri Apr 30, 2004 7:18 pm

Post by darakhshan » Mon Dec 20, 2004 4:03 pm

I have a Persian (right-to-left) board,
phpBB version 8. When I make the change, the page does not show up; it is all blank.

Dr. dre
Registered User
Posts: 7
Joined: Sat Jan 15, 2005 12:36 pm

Post by Dr. dre » Sun Jan 16, 2005 1:19 pm

So what will the file name be?

sonyboy
Registered User
Posts: 2980
Joined: Thu Oct 07, 2004 2:10 am

Post by sonyboy » Sun Jan 16, 2005 2:07 pm

I just installed this MOD, then went to my forum, and the encoding is not utf-8. Is this because I'm running on localhost?

EDIT - Never mind, I got it.

ScuL
Registered User
Posts: 111
Joined: Tue May 04, 2004 6:01 pm
Location: NZ

Post by ScuL » Fri Feb 18, 2005 1:09 pm

There are 2 issues that this MOD doesn't solve, and they are most probably impossible to solve with a MOD.

What it does is change the character encoding declared for your pages...

First: if your server has a default character set in its configuration files (e.g. apache.conf), that setting overrules the one this MOD hands to the browser, and the page is still displayed in ISO-8859-x (or whatever the server is set to).

Second: if no such default is set, the MOD does change the encoding of the page, but the contents of the database were entered using the previous character set. That means some accented messages (for instance ä ö å ü ÿ ç etc.) will show up garbled, not to mention languages such as Arabic or Japanese.

I strongly recommend that all forum admins install this MOD before opening a forum to users.

If anybody knows how to convert database contents to UTF-8 in an easy manner, please let me know.
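
The second issue can be reproduced in isolation. The following is a minimal PHP sketch, not part of the MOD; it assumes the mbstring extension is loaded and that the old board stored its text as ISO-8859-1. It shows why a byte written under the old charset is not readable once the page is declared as UTF-8, and what a conversion has to produce.

```php
<?php
// Assumption: the old board stored text as ISO-8859-1 (one byte per character).
$stored = "\xE4"; // "ä" as a single ISO-8859-1 byte

// Declared as UTF-8, the lone byte 0xE4 is an incomplete sequence,
// so browsers show a replacement character instead of "ä".
$valid_as_utf8 = mb_check_encoding($stored, 'UTF-8');

// Converting from the original charset yields the two-byte UTF-8 form.
$converted = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');

var_dump($valid_as_utf8);        // bool(false)
echo bin2hex($stored), "\n";     // e4
echo bin2hex($converted), "\n";  // c3a4
```

Plain ASCII text is identical in both encodings, which is why only accented and non-Latin characters are affected.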

ian!
Registered User
Posts: 6
Joined: Thu Sep 02, 2004 10:37 am
Contact:

Post by ian! » Sat Feb 19, 2005 10:41 am

ScuL wrote: If anybody knows how to convert database contents to UTF-8 in an easy manner, please let me know.

I'm working on a script that does this. I'll make it public once it's tested well enough. If you'd like to test it too, drop me a mail and I'll send it to you so that you can try it on a copy of your board.

billytcf
Registered User
Posts: 6
Joined: Sat Jan 10, 2004 4:03 am
Location: Hong Kong

Post by billytcf » Wed May 04, 2005 5:19 pm

Thanks for your replies, guys.

I originally wrote this MOD for forums using Traditional Chinese characters.

Yes, I found the same problem that ScuL described; I install this before opening a forum.

Anyway, my skills aren't that good, so I can't solve many of these problems :?
~ Billy Tse ~

darakhshan
Registered User
Posts: 794
Joined: Fri Apr 30, 2004 7:18 pm

Post by darakhshan » Fri May 20, 2005 7:59 pm

I have the same problem. My board is now well known and has a lot of posts,
but I need to change my charset to UTF-8.
I need to do that to my database. Does anybody have any idea how I can do it,
and what the risk is?

Is this easy to do? Can I undo it if something goes wrong, without having to restore the data from a backup?

I am puzzled :cry:

ScuL
Registered User
Posts: 111
Joined: Tue May 04, 2004 6:01 pm
Location: NZ

Post by ScuL » Sat May 21, 2005 10:52 am

What I did was use the MySQL replace command to replace all the old characters with their Unicode equivalents.

This becomes a problem when you aren't dealing with European (Western) characters but with Asian ones like Chinese or Japanese. Those are very hard to convert, unfortunately.

darakhshan
Registered User
Posts: 794
Joined: Fri Apr 30, 2004 7:18 pm

Post by darakhshan » Sat May 21, 2005 12:27 pm

ScuL wrote: What I did was use the MySQL replace command to replace all the old characters with their Unicode equivalents.

This becomes a problem when you aren't dealing with European (Western) characters but with Asian ones like Chinese or Japanese. Those are very hard to convert, unfortunately.

The MySQL replace command?

Where can I find a tutorial, please? I have no idea :?:

ScuL
Registered User
Posts: 111
Joined: Tue May 04, 2004 6:01 pm
Location: NZ

Post by ScuL » Sat May 21, 2005 12:34 pm

Here is more information on the command:
http://dev.mysql.com/doc/mysql/en/replace.html

But it means you have to manually look up the Unicode equivalent of each character with a Unicode text converter, and then replace every character using a smart script :)

darakhshan
Registered User
Posts: 794
Joined: Fri Apr 30, 2004 7:18 pm

Post by darakhshan » Sat May 21, 2005 6:03 pm

There is a program called

EmEditor
http://www.emeditor.com/

After downloading the database, I opened it and saved it as UTF-8.
It seems all right to me, but I don't have the guts to upload the new one to my server.

Isn't this the solution?
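
Saving the dump as UTF-8 in an editor amounts to transcoding it from the charset it was written in. A rough PHP sketch of that step follows, assuming the mbstring extension is available. The function dump_to_utf8(), the table name and the sample row are hypothetical, and the source charset has to be whatever the board really used (ISO-8859-1 here is only an example). The "already UTF-8" check is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: transcode dump text from a known source charset to UTF-8.
function dump_to_utf8($text, $source_charset)
{
    // Heuristic: text that already validates as UTF-8 (including plain ASCII)
    // is returned unchanged, so a second run does not double-encode it.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $source_charset);
}

// Made-up dump line containing "é" as the single ISO-8859-1 byte 0xE9.
$latin1_row = "INSERT INTO example_table VALUES (1, 'caf\xE9');";
$utf8_row = dump_to_utf8($latin1_row, 'ISO-8859-1');

echo strlen($latin1_row), "\n"; // 45 bytes before
echo strlen($utf8_row), "\n";   // 46 bytes after: "é" now takes two bytes
```

Running this on a copy and keeping the original backup untouched leaves a way back if the result looks wrong after import.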

pichirichi
Registered User
Posts: 83
Joined: Wed Jun 02, 2004 5:34 am
Contact:

IMPORTANT INFO: UTF-8 - not that simple.

Post by pichirichi » Thu Jun 09, 2005 9:39 am

I've just moved to UTF-8 after my host "forced" me to: they upgraded MySQL and migrated all the data.

Here are some tips regarding UTF-8/Unicode:
  1. Edit all the lang files and save them as UTF-8 (UltraEdit's convert option can do this).
  2. Edit all e-mail files located in the lang directory, replace the charset and save them as UTF-8 (UltraEdit works here as well).

    Code: Select all

    Charset: utf-8
    There is an issue with the subject line (some characters disappear; I'm still looking for a fix).
  3. You need to convert the database data to UTF-8.
    This can be done in a few ways:
    1. Lock the forum in the ACP against updates and export the data. Convert the exported file to UTF-8 (using an editor or a conversion program), then re-import the data.
    2. MySQL commands: version 4.1.x comes with several new features for character set handling.
      CHARACTER SET and COLLATE should be defined for each column.

      Code: Select all

      ALTER TABLE `test table` CHANGE `a` `a` VARCHAR( 10 ) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL
      This command will also convert the data already stored in the column; there is a batch command that can be used for this process.
  4. You have to use a phpMyAdmin version that supports UTF-8.
  5. After the connect-to-db command, add the following code:

    Code: Select all

    //
    // Set character set parameters according to the MySQL version.
    //
    $result = mysql_query('SELECT VERSION()') or die('Query failed: ' . mysql_error());
    $mysql_version = mysql_fetch_array($result, MYSQL_ASSOC);
    list($mysql_version_major, $mysql_version_minor) = explode('.', $mysql_version['VERSION()']);
    // Match 4.1 and later; testing "minor >= 1" alone would wrongly skip 5.0.
    if (($mysql_version_major > 4) || (($mysql_version_major == 4) && ($mysql_version_minor >= 1)))
    {
    	$result = mysql_query('SET character_set_client = utf8') or die('Query failed: ' . mysql_error());
    	$result = mysql_query('SET character_set_results = utf8') or die('Query failed: ' . mysql_error());
    	$result = mysql_query('SET character_set_connection = utf8') or die('Query failed: ' . mysql_error());
    }
    
  6. Special characters: some characters used by MOD developers turn into gibberish under UTF-8.
    E.g. "»", which is commonly used, won't be displayed correctly under UTF-8; this sign should be replaced in the tpl files or PHP code. It can be changed to

    Code: Select all

    &raquo;
  7. String manipulation such as the substr command won't work with multi-byte/UTF-8 text.
    Where a string gets cut depends on how many bytes each character occupies; see the user comments on the PHP site about this issue. You should set the internal encoding to the encoding you want to use. The best way would be to use the value in $lang['ENCODING'], but that value is only set at a later stage. In the extension.inc file define:

    Code: Select all

    mb_internal_encoding('UTF-8');
    Note: the mb lib functions differ in some return values.
    E.g. mb_substr() returns an empty _string_ where substr() returns _boolean_ false in the same case.
  8. MB lib: there is a server parameter that automatically overrides all the relevant string functions with multi-byte functions. Read about it on php.net.
    If you set the parameter mbstring.func_overload in php.ini to 7, you won't need to find and replace all the string functions. Those who don't have permission to update php.ini can override the value in the .htaccess file with this line:

    Code: Select all

    php_value mbstring.func_overload 7
    Note: phpMyAdmin is not built to work with "mbstring.func_overload", and some errors occur if it is defined at the global level (I've noticed errors when inserting text strings); therefore I recommend using the .htaccess override.

    Code: Select all

    php_value mbstring.func_overload 7
    The string functions that the override covers automatically are:
    • mail() => mb_send_mail()
    • strlen() => mb_strlen()
    • strpos() => mb_strpos()
    • strrpos() => mb_strrpos()
    • substr() => mb_substr()
    • strtolower() => mb_strtolower()
    • strtoupper() => mb_strtoupper()
    • substr_count() => mb_substr_count()
    • ereg() => mb_ereg()
    • eregi() => mb_eregi()
    • ereg_replace() => mb_ereg_replace()
    • eregi_replace() => mb_eregi_replace()
    • split() => mb_split()
    The following functions won't work with multi-byte text:
    • preg_replace() should be replaced with mb_eregi_replace().
    • htmlspecialchars() should be updated to htmlspecialchars ( string string,[int quote_style],"UTF-8" )
    • preg_split() can be replaced with mb_split(), but if the flag parameter is set a different solution should be applied.
    • htmlentities() should be updated to htmlentities ( string string,[int quote_style],"UTF-8" )
    • explode() should be replaced with mb_split().
  9. Highlight on search: not working with UTF-8; I'm still looking into this issue.
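
The byte-cutting problem from tip 7 can be seen with a three-character string. This is a small illustrative PHP sketch, assuming the mbstring extension is loaded and mbstring.func_overload is not active (otherwise substr() and strlen() would already be replaced); the sample string is made up. The encoding is passed explicitly so the result does not depend on mb_internal_encoding().

```php
<?php
$text = "\xC3\xA9t\xC3\xA9"; // "été" in UTF-8: 3 characters, 5 bytes

echo strlen($text), "\n";             // 5 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n"; // 3 (characters)

// substr() counts bytes, so a cut at 4 lands inside the last "é"
// and leaves a broken sequence at the end of the string.
$byte_cut = substr($text, 0, 4);
var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false)

// mb_substr() counts characters and keeps every character whole.
$char_cut = mb_substr($text, 0, 2, 'UTF-8');
echo bin2hex($char_cut), "\n"; // c3a974, i.e. "ét"
```
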
Last edited by pichirichi on Tue Jun 21, 2005 9:29 am, edited 11 times in total.

hayk
Registered User
Posts: 428
Joined: Tue Feb 04, 2003 10:53 am
Location: exUSSR
Contact:

Post by hayk » Thu Jun 16, 2005 6:37 pm

I would single out 3 basic problems in porting phpBB to Unicode:
1. Replacing all string functions with their counterparts from the mbstring extension (mb_*). That's the main problem. Besides, regular expressions work incorrectly with UTF-8 text.
2. Converting the language files to UTF-8 encoding.
3. Converting and adapting the DB (e.g. adding supplementary fields).
I've done it. You can examine it here.
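
On the regular expression point: PHP's preg_* functions read the pattern and subject byte by byte unless the pattern carries the /u modifier, which makes PCRE treat both as UTF-8. The sketch below is only an illustration of that difference with a made-up string; it is not taken from any of the ports mentioned in this topic.

```php
<?php
$e_acute = "\xC3\xA9"; // "é" in UTF-8: one character, two bytes

// Without /u, "." matches a single byte, so a one-character pattern fails.
$bytewise = preg_match('/^.$/', $e_acute);
echo $bytewise, "\n"; // 0

// With /u, "." matches the whole two-byte character.
$utf8_aware = preg_match('/^.$/u', $e_acute);
echo $utf8_aware, "\n"; // 1

// Splitting a UTF-8 string into characters with /u.
$chars = preg_split('//u', "\xC3\xA9t\xC3\xA9", -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n"; // 3
```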
