Conversion of localized language files from ISO to UTF-8

John Hjorth
Registered User
Posts: 603
Joined: Sun Aug 07, 2005 8:24 am
Location: Odense, Denmark, EU

Post by John Hjorth » Wed Nov 15, 2006 3:26 pm

The Danish translation team is currently in the process of updating the language package from Beta 2 to Beta 3, which includes:

1 Conversion of files from ISO 8859-1 to UTF-8
2 Update of all language files with the changes that have taken place in English language files in Beta 3 compared to Beta 2.

We have tried to do the conversion, but we can't get it to work - obviously we are doing something wrong.

A basic explanation of how to perform this conversion would be much appreciated.
John Hjorth • Official Danish phpBB language package maintainer and translator
The Danish Olympus & Ascraeus translation project • Danish phpBB support site

lurttinen
Translator
Posts: 4670
Joined: Tue Sep 21, 2004 12:05 pm
Location: Tampere, Finland
Name: Martti Lokka

Post by lurttinen » Wed Nov 15, 2006 4:28 pm

Tip: download Notepad++.

In Notepad++, open one of the original English language files and check its encoding and how it is displayed.

Sorry, my Notepad++ is in Finnish, but the menu is the fifth from the left and is called something like "File format" or "Format".

Pull it down and it should say that the file is in Unix format (convert it to Unix format if it is not) and that the encoding is ANSI.
The file is then displayed as UTF-8.

The tricky part was not the conversion to Unix format. It was that when I switched the file to display as UTF-8, the ä, ö, å, etc. were all garbled.

What I did was copy the whole document to the clipboard before switching it to "show as UTF-8", and then paste the whole document back from the clipboard, overwriting the content.

And it worked. Olympus was happy again.
It was a bit of work to do this for all of the files, and remember that there are changes in the email files too.
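
The garbled ä, ö and å come from the difference between re-labelling a file as UTF-8 and actually converting its bytes: in ISO 8859-1 "å" is the single byte E5, while in UTF-8 it is the two bytes C3 A5. A minimal shell sketch of that difference, assuming `iconv` and `od` are installed (`hex_bytes` is a helper name made up for this illustration):

```shell
# hex_bytes: print stdin as space-separated lowercase hex bytes.
# (Hypothetical helper, not part of phpBB or any tool.)
hex_bytes() {
    od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# "å" stored as ISO 8859-1 is the single byte 0xE5 (octal 345).
latin1_hex=$(printf '\345' | hex_bytes)

# After a real conversion the same character occupies two bytes.
utf8_hex=$(printf '\345' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes)

echo "ISO 8859-1: $latin1_hex"
echo "UTF-8:      $utf8_hex"
```

An editor that only switches its display to UTF-8 leaves the E5 byte untouched, which is not valid UTF-8 on its own, so the character shows up broken until the bytes themselves are rewritten.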

The only thing that gave me a headache in the translation was the darn "install.php", which had quite a few new entries added. :lol:
Signature is here


Post by John Hjorth » Wed Nov 15, 2006 4:38 pm

Thank you very much :D

Yes, I am aware that the encoding line (line 2) in every email file has to be removed, and that the email files have to be converted as well.
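
For the email files, removing that second line could be scripted rather than done by hand. This is only a sketch: it assumes the line to remove looks like `Charset: iso-8859-1`, which is not confirmed anywhere in this topic, and `drop_charset_line` is a made-up helper name.

```shell
# drop_charset_line FILE: delete line 2 of FILE, but only if it is a
# charset declaration. ASSUMPTION: the line starts with "Charset:";
# adjust the pattern if your email files use a different form.
drop_charset_line() {
    tmp=$(mktemp) || return 1
    if sed '2{/^[Cc]harset:/d;}' "$1" > "$tmp"; then
        # Write back through cat so the original file keeps its permissions.
        cat "$tmp" > "$1"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

Files whose second line is something else are left unchanged, so the function can be run over a whole email folder, for example `for f in email/*.txt; do drop_charset_line "$f"; done` (the folder path is illustrative).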

We also need to delete the language variable ENCODING and add the language variable USER_LANG in common.php (language root).
Last edited by John Hjorth on Thu Nov 16, 2006 7:36 pm, edited 1 time in total.

Jesper Møller
Registered User
Posts: 239
Joined: Wed Jul 05, 2006 1:00 pm
Location: Copenhagen Denmark
Name: Jesper G.O. Møller

Post by Jesper Møller » Wed Nov 15, 2006 5:27 pm

Hello lurttinen,

I am the one who converted John's translation to UTF-8.

Where things go wrong, I don't know.

First of all, I'm a Mac user and I am using GoLive (and no, that's NOT the problem :-P ).

What I did was open John's files in GoLive using the encoding that shows letters like æ, ø and å correctly (in this case windows-1252). I then converted the files to UTF-8 and saved them (using CR/LF line endings).

This has always worked, but when John uses them something goes wrong: all styles are gone from the site, and that makes no sense to me.

lurttinen wrote: Tip: download Notepad++.

That's no good on a Mac ;-) I'm using TextEdit (txt/rtf).

lurttinen wrote: Pull it down and it should say that the file is in Unix format (convert it to Unix format if it is not) and that the encoding is ANSI.

Those aren't possible in TextEdit. In the preferences I have the options to set "plain text file encoding" for "Opening files" and "Save files", and "HTML saving options" with "Doc type", "Styling" and "Encoding".

If I try to open the English files or my Danish files with a mismatching encoding, I can open them (with some characters messed up, as they should be), but the files from John I can ONLY open as windows-1252. If I try opening them in other encodings I get "The file may have been saved using a different encoding, or it may not be a text file", whereas my own files or the original English files open (with errors) if the wrong encoding is used.

For some reason I can't resave John's documents as UTF-8 even though my settings say to do so, but I can change the encoding of my own files or the English files using the same method.
So what I do is use "Save as" and overwrite John's files, and then I can open his files with whatever encoding I want (with errors, of course).

So to me it looks like John is getting or setting something wrong somewhere when using or uploading the files.

:?:

Jesper
(Please don't ask people to save using Unix line endings. On a Mac the result is ONE line of code, which can be hell to redo. Use CR/LF, which everyone can use :wink: )
"Education is learning what you didn't even know you didn't know!"

"True knowledge exists in knowing that you know nothing."

naderman
Consultant
Posts: 3735
Joined: Fri Aug 01, 2003 10:06 pm
Location: Berlin, Germany
Name: Nils Adermann

Post by naderman » Wed Nov 15, 2006 5:50 pm

You can try SubEthaEdit as an editor on OS X; it does encoding conversion really well for me.
I appreciate gifts from my Amazon wishlist.
naderman.de twitter: @naderman


Post by Jesper Møller » Wed Nov 15, 2006 5:57 pm

Thanks naderman,

I'll look into that (always good to have more than 3 options to try things in ;-) ).
However, I don't think the problem is with the editor I use. I have never had problems like this with it, and what I have converted for other uses works fine.

SHS`
Former Team Member
Posts: 6615
Joined: Wed Jul 04, 2001 9:13 am
Location: Yellow Beach, Nine Dragons, Hong Kong
Name: Jonathan Stanley

Post by SHS` » Wed Nov 15, 2006 6:19 pm

Jesper Møller wrote: (Please don't ask people to save using Unix line endings. On a Mac the result is ONE line of code, which can be hell to redo. Use CR/LF, which everyone can use :wink: )


Any proper text editor will allow any of the three common linefeed types: *NIX, Windows or Mac. However, all files that are packaged for phpBB must use *NIX linefeeds, language packs included; this has been in the Coding Guidelines since year dot:

http://area51.phpbb.com/docs/coding-gui ... orsettings
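
Converting Windows line endings to *NIX linefeeds can also be done from the command line instead of in an editor. A minimal sketch, assuming a POSIX shell with `tr` and `mktemp` (`to_unix_eol` is a hypothetical helper name):

```shell
# to_unix_eol FILE: rewrite FILE in place with LF-only line endings.
to_unix_eol() {
    tmp=$(mktemp) || return 1
    # Drop every carriage return, so each CR/LF pair becomes a plain LF.
    if tr -d '\r' < "$1" > "$tmp"; then
        # Write back through cat so the original file keeps its permissions.
        cat "$tmp" > "$1"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

This only suits CR/LF files. A classic Mac file that uses a lone CR would lose all its line breaks here; for such files `tr '\r' '\n'` is the appropriate translation instead.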

Likewise any decent text editor will also support UTF-8:

http://en.wikipedia.org/wiki/Comparison ... ng_support
Jonathan “SHS`” Stanley • 史德信

Post by lurttinen » Wed Nov 15, 2006 6:31 pm

Sorry, I don't know much about Macs, but check your PMs shortly. ;)

Post by Jesper Møller » Wed Nov 15, 2006 6:35 pm

SHS` wrote: Likewise any decent text editor will also support UTF-8:

Yes :) I didn't say anything else, only that the encoding options don't mention "convert to Unix" or "ANSI", but use terms like "UTF-8", "Windows Latin" and so on ;-)

SHS` wrote: However, all files that are packaged for phpBB must use *NIX linefeeds, language packs included; this has been in the Coding Guidelines since year dot:

Well, Windows line endings also work as Unix line endings: Windows uses CR/LF, which all Unix machines understand since Unix ignores the CR. The problem is that if a document is made with pure LF line endings, the Mac doesn't understand it, since the Mac uses CR.
All styles (including subSilver) that I have downloaded here have had CR/LF line endings.
The style I have submitted, which has been released, also uses CR/LF :wink:

Post by Jesper Møller » Wed Nov 15, 2006 6:36 pm

lurttinen

:D I will :D

Joe User
Registered User
Posts: 71
Joined: Mon Sep 13, 2004 9:56 am
Location: Germany
Name: Markus Kohlmeyer

Post by Joe User » Wed Nov 15, 2006 10:13 pm

Quick&Dirty *NIX/bash:

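Code:

# Group the -name tests so that -type f applies to all of them, and read
# the file names line by line so that paths containing spaces survive.
find phpBB2/ -type f \( -name '*.php' -o -name '*.tpl' -o -name '*.inc' -o -name '*.cfg' -o -name '*.txt' -o -name '*.sql' \) |
while IFS= read -r file; do
    # Update charset declarations, then re-encode from ISO 8859-1 to UTF-8.
    sed -i 's/[iI][sS][oO]-8859-1/UTF-8/g' "$file" &&
    iconv -f LATIN1 -t UTF-8 "$file" > "$file.utf8" &&
    mv "$file.utf8" "$file"
done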
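
Running a loop like the one above twice would convert files that are already UTF-8 a second time and garble them. A hedged sketch of a guard against that, assuming `iconv` exits with an error on invalid input (GNU iconv does); `convert_if_needed` is a hypothetical helper name:

```shell
# convert_if_needed FILE: re-encode FILE from ISO 8859-1 to UTF-8,
# unless it already decodes cleanly as UTF-8.
convert_if_needed() {
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        # Already valid UTF-8 (pure ASCII files also land here): leave it.
        return 0
    fi
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"
}
```

The check is a heuristic: an ISO 8859-1 file whose accented bytes happen to form valid UTF-8 sequences would be skipped, although that is unlikely in ordinary text.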
PayPal.Me/JoeUser • FreeBSD Remote Installation
Wings for Life • Wings for Life World Run

„If there’s more than one possible outcome of a job or task, and one
of those outcomes will result in disaster or an undesirable consequence,
then somebody will do it that way.“ -- Edward Aloysius Murphy Jr.


Post by John Hjorth » Thu Nov 16, 2006 4:00 pm

First of all, thank you to everybody who has written and contributed to this topic. You have been most helpful.

After quite some trial and error, we did get it working, especially thanks to lurttinen for the referral to Notepad++; that did the trick.

Personally, I have been using the following editors the whole time I have been working with phpBB:

ConTEXT
WinMerge
Araxis (I certainly prefer Araxis to WinMerge)

It seems quite evident that ConTEXT has come up short here, lacking the ability to handle and interpret UTF-8 files correctly.

Notepad++ does the job, but I'm not impressed with the software at all.

naderman has, with the best intentions, directed Jesper Møller's attention to a piece of Mac software above; unfortunately, I'm on a Windows machine.

Is there any other "decent" (SHS`'s terminology :wink: ) Windows-based file editor worth mentioning for this task :?: GPL is naturally preferred, but not a requirement ... :wink:

Post by Jesper Møller » Thu Nov 16, 2006 4:12 pm

CR/LF line endings have been usable in phpBB2, but if phpBB3 only accepts LF there is going to be a major problem:
1. Almost every Windows system uses CR/LF as standard, so if a user edits a file it won't work.
2. Old Macs use only CR, and LF won't be recognised.
3. Many servers (Apache) are set to send CR/LF by default regardless of line endings.
Most textual Internet protocols (including HTTP, SMTP, FTP, IRC and many others) mandate the use of ASCII CR+LF (0x0D 0x0A) on the protocol level, but recommend that tolerant applications recognize lone LF as well.

Post by John Hjorth » Sun Nov 19, 2006 10:56 am

I have read the coding guidelines and the contents of the .php files in Beta 3: "All language files should use UTF-8 as their encoding and the files must not contain a BOM."

The .txt files in the email folder of the language folder in Beta 3 are not encoded in UTF-8 without a BOM.

Is that the correct encoding for localized language files in the email folder when translating Beta 3?
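
The "must not contain a BOM" rule quoted above can be checked from the command line. A UTF-8 BOM is the three bytes EF BB BF at the very start of a file. A minimal sketch, assuming a POSIX `od` (`has_bom` is a made-up helper name):

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 byte order mark.
has_bom() {
    # Dump only the first three bytes as hex and strip the whitespace.
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}
```

It can be run over a language folder to list offending files, for example `for f in language/da/*.php; do has_bom "$f" && echo "BOM found: $f"; done` (the path is illustrative only).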

Post by lurttinen » Sun Nov 19, 2006 11:48 am

I have just tested this, and yes: the email files need the same treatment as the rest of the language files.

Firefox shows the mail content as UTF-8, and that is not how the original email files are encoded.

Report it to the bug tracker?