phpBB3 and UTF-8 without BOM
Description: Why and how to edit and save phpBB3 php files in the correct file encoding.
In Categories: Miscellanea
All phpBB3 PHP files need to be saved with the file encoding UTF-8 without BOM. The "without BOM" part is important. This article will explain what the BOM is, how it causes problems, and how to prevent these problems.
What is a BOM?
wikipedia wrote:A byte-order mark (BOM) is the Unicode character at code point U+FEFF ("zero-width no-break space") when that character is used to denote the endianness of a string of UCS/Unicode characters encoded in UTF-16 or UTF-32. It is conventionally used as a marker to indicate that text is encoded in UTF-8, UTF-16 or UTF-32.
What this means is that when a file is saved with a BOM, there is a Unicode character inserted into the start of the file which most text editors will not display.
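To make this concrete, the short Python sketch below shows what the BOM looks like as raw bytes in a UTF-8 file. Python is used here purely for illustration and is not something phpBB requires; the variable names are made up for the example.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex())  # efbbbf

# The standard library exposes the same byte sequence as a constant.
print(bom_bytes == codecs.BOM_UTF8)  # True

# Python's "utf-8-sig" codec writes the BOM; plain "utf-8" does not.
with_bom = "<?php".encode("utf-8-sig")
without_bom = "<?php".encode("utf-8")
print(with_bom)     # b'\xef\xbb\xbf<?php'
print(without_bom)  # b'<?php'
```

Those three extra bytes are what an editor writes at the very start of the file when it saves as "UTF-8 with BOM".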
Why is this a problem?
PHP (not phpBB) does not, by default, recognise or strip a BOM at the start of a php file. The PHP engine treats it as a normal piece of text outside the PHP tags and sends it to the browser as output. In a lot of the phpBB3 files, it is important that there is nothing outside of the PHP tags (<?php and ?>) so that nothing is output to the browser before the headers are sent.
A classic example of this in action is when users have to create a new config.php file and they accidentally leave a space character before the starting PHP tag (<?php). This causes an error such as the following:
[phpBB Debug] PHP Notice: in file /includes/functions.php on line 3729: Cannot modify header information - headers already sent by (output started at /config.php:1)
How does this relate to the BOM? The PHP interpreter does not know how to handle the BOM, so it treats it like any other character that appears before the starting PHP tag. The result is the same error you would get if you had typed some text in front of the starting PHP tag, or left a space or blank line there.
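The comparison between a stray space and a BOM can be sketched in Python. The helper below, `leading_output`, is a hypothetical name invented for this example; it only imitates the idea that everything before the first `<?php` counts as output, and it is not how the PHP engine is implemented.

```python
import codecs


def leading_output(data: bytes) -> bytes:
    """Return the bytes that come before the first '<?php' tag.

    If no tag is found, the whole content is treated as plain output.
    """
    index = data.find(b"<?php")
    if index == -1:
        return data
    return data[:index]


clean = b"<?php\n// settings go here\n"
with_space = b" " + clean
with_bom = codecs.BOM_UTF8 + clean

print(leading_output(clean))       # b''
print(leading_output(with_space))  # b' '
print(leading_output(with_bom))    # b'\xef\xbb\xbf'
```

In both of the broken cases something is left in front of the tag, which is why the BOM produces the same "headers already sent" error as the stray space.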
This problem seems most prevalent when editing the phpBB3 language files and saving them in the file encoding UTF-8 with the BOM. This can result in errors similar to the one posted above or, depending on the files edited, it can cause other problems such as the visual confirmation image not working.
How do I save a file correctly?
This differs from editor to editor. As a rule, you should always use a good plain text editor to edit your files. This does not include Notepad or WordPad! Microsoft designed these to insert the BOM automatically when files are saved as UTF-8, and as such they should not be used!
An editor which is often recommended for editing phpBB3 files is Notepad++. This is a free editor which is released under the GPL.
To edit and save a file correctly using Notepad++ you need to click the word 'Encoding' in the top menu and select 'Encode in UTF-8 without BOM'. Newer releases of Notepad++ may label this option simply 'UTF-8' and list the BOM variant separately as 'UTF-8-BOM'.
Once this option is set you can edit and save your file as normal.
If you are using a different editor, you are likely to find that the method for choosing the file encoding is different. It is very likely you will need to use the "Save As" option when saving the file. In most cases, this will give a pop-up box with a drop-down menu for setting the file encoding upon saving. If this drop-down menu does not have an option "UTF-8 without BOM", you will need to refer to the editor's documentation for instructions on how to save files in the encoding UTF-8 without BOM.
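Whichever editor you use, you can check the result by looking at the first three bytes of the saved file. The following Python sketch is one possible way to do that for a whole directory. The function name `find_files_with_bom` and the default `*.php` pattern are assumptions made for this example and are not part of phpBB.

```python
import codecs
from pathlib import Path
from typing import List


def find_files_with_bom(root: Path, pattern: str = "*.php") -> List[Path]:
    """Return the files under root matching pattern that start with the UTF-8 BOM."""
    found = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # Only the first three bytes are needed to detect the BOM.
        with path.open("rb") as handle:
            if handle.read(3) == codecs.BOM_UTF8:
                found.append(path)
    return found
```

For example, calling `find_files_with_bom(Path("language"))` from the directory that contains your language folder would return the language files that were saved with a BOM. The folder name here is only an example, so point it at wherever your edited files live.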
What if I saved in the wrong encoding?
Some editors provide an option to convert a file to UTF-8 without BOM if it has been saved in the wrong encoding. If your editor does not provide this option, you will have to replace the file with a fresh copy from the phpBB3 download and then reapply any edits you had made to that file. After reapplying the edits, be sure to save the file in the correct file encoding, that is, UTF-8 without BOM.
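If the only problem with a file is the BOM itself, removing those three leading bytes is another possible fix. The Python sketch below shows the idea; `strip_utf8_bom` is a hypothetical helper written for this article, not a phpBB tool. It assumes the file is otherwise valid UTF-8, so it will not repair a file that was saved in a completely different encoding. Make a backup copy before trying anything like this.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file at path.

    Returns True if a BOM was found and removed, False if the file
    was left untouched.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes, unchanged.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running it on a file that has no BOM changes nothing, so it is safe to call more than once on the same file.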