Bug tracker

SQL: "Illegal mix of collations" - When typing Unicode characters (not a bug)

Posting characters in the Unicode range, such as in the example below from http://en.wikipedia.org/wiki/Bengali_language (first sentence), produces the following MySQL error:

===============================

SQL ERROR [ mysql4 ]

Illegal mix of collations for operation ' IN ' [1271]

SQL

SELECT word_id, word_text FROM phpbb_search_wordlist WHERE word_text IN ('bengali', 'bangla', 'বাংলা', 'ipa', 'baŋla', 'grammar', 'articles', 'online')

BACKTRACE

FILE: includes/db/mysql.php
LINE: 132
CALL: dbal_mysql->sql_error()

FILE: includes/search/fulltext_native.php
LINE: 1072
CALL: dbal_mysql->sql_query()

FILE: includes/functions_posting.php
LINE: 1892
CALL: fulltext_native->index()

FILE: posting.php
LINE: 917
CALL: submit_post()

===============================

Regardless, the post / topic gets posted, but all Unicode-range characters appear as question marks. When doing post Preview, the characters appear as they ought to. If one edits the post, the error doesn't appear again when the edited post is submitted with the Unicode characters again in place, but they are regardless again changed to ?'s. They are also stored in the database as nothing but question marks, so the problem is definitely on the way in, not on the way out.

The problem occurs on Linux with MySQL 4.1.20. However, I cannot duplicate this on Windows with MySQL 4.1.14. The server configurations are identical as far as MySQL charset and MySQL connection collation go (set to UTF-8 Unicode (utf8) / utf8_unicode_ci). The only difference is that the working database is setup as InnoDB, the bug-creating one is setup as MyISAM. Tables in both are InnoDB however. I can't change this on a per-database level — I would need to change this on a per-server level in my.ini.

As seen in the below screenshots from PHPMyAdmin, upper works, lower doesn't:

Image

PHP versions are 5.0.5 and 4.3.9 respectively for the one that works and the one that doesn't, should it make a difference.

Comments / History

Posted by Acyd Burn (Server Manager) on Nov 18th 2006, 10:42

There is something really wrong here nils, since this is the third time this problem occurs. Maybe it is easily reproducable by using the text used by the reporter.

Posted by CodeWallah on Nov 18th 2006, 19:00

Testing with my local Windows server, I can confirm that this seems to be an InnoDB vs. MyISAM issue.

When I run the MySQL Server Instance Config Wizard and select Non-Transactional Database Only (Only MyISAM engine activated.) and reload the database, I manage to break the working setup and reproduce the error.

Posted by CodeWallah on Nov 18th 2006, 19:24

However the problem seems to be elsewhere after all. Throwing in a fresh PHPBB3 install, I now manage to have both a broken and a working version side-by-side with InnoDB as the engine.

With some further study, I notice that problems occur on boards where database has been exported and imported via PHPMyAdmin. What happens is that the collations of both tables and fields get messed up in the process.

Table default collations apparently get switched to the MySQL engine default collation (ie. latin1_swedish_c1), and for example phpbb_search_wordlist / word_text has changed from utf8_bin to latin_1.

Changing both the table and the field collations back to the original utf8_bin, I have just repaired a board that exhibited this bug.

The install/schemas/mysql_41_schema.sql file for installation has the collations configured exactly right. Accordingly, when I do a fresh install the schema is as intended. However when I export the database via PHPMyAdmin, the collation data isn't exported along with the data.

This happens only when one selects "SQL export compatibility: MySQL40" (which I seem to be in the habit of doing) -- not when one selects "None". The "None" option includes the proper collation data.

There's our gremlin... This, then, isn't a bug with phpbb3 B3 as such, but it is an issue that may need to be documented somewhere as people will undoubtedly be eventually importing and exporting their databases around.

Posted by CodeWallah on Nov 18th 2006, 20:02

Bah! There's a bug after all!

Namely, the Maintenance > Backup feature does the same goof: It doesn't include collation data for tables or fields in the SQL dumps. Do a fresh install, make a backup and restore it: You'll be able to duplicate the error I reported.

Posted by CodeWallah on Nov 18th 2006, 20:10

For example, using the search_wordlist table we've been looking at:

SQL command from the dump by phpbb3 maintenance module:

Code: Select all
DROP TABLE IF EXISTS phpbb_search_wordlist;
CREATE TABLE phpbb_search_wordlist(
   word_id mediumint(8) unsigned NOT NULL auto_increment,
   word_text varchar(255) DEFAULT '' NOT NULL,
   word_common tinyint(1) unsigned DEFAULT '0' NOT NULL,
   PRIMARY KEY (word_id),
   UNIQUE wrd_txt (word_text)
);

From a dump with PHPMyAdmin that features the full data:

Code: Select all
DROP TABLE IF EXISTS `phpbb_search_wordlist`;
CREATE TABLE IF NOT EXISTS `phpbb_search_wordlist` (
  `word_id` mediumint(8) unsigned NOT NULL auto_increment,
  `word_text` varchar(255) collate utf8_bin NOT NULL default '',
  `word_common` tinyint(1) unsigned NOT NULL default '0',
  PRIMARY KEY  (`word_id`),
  UNIQUE KEY `wrd_txt` (`word_text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1123 ;

Note the last line, which is what makes all the difference here. Note also the lacking collation and charset definitions for the fields.

Aside the charset and collations issue, note that also the auto_increment value is also omitted, which quite possibly is the cause of this bug and other "Duplicate entry" errored bugs (that will likely surface if they aren't there yet).

Posted by DavidMJ (Development Team Member) on Nov 18th 2006, 20:16

we never include auto_increment and we never should, it does not cause the other bug. that bug is caused by a duplicate entry trying to be included into a column.

I will fix that backup issue.

Posted by CodeWallah on Nov 18th 2006, 20:35

All right, just a guess on the other one.

Thanks for getting this one fixed.

Posted by naderman (Development Team Leader) on Nov 19th 2006, 12:42

Well it definately should export charset and collation, apart from that it appears there is no bug here?

Ticket details

Related SVN changesets