Generic message import

Converting from other board software? Good decision! Need help? Have a question about a convertor? Wish to offer a convertor package? Post here.
Scam Warning
Giles314
Registered User
Posts: 51
Joined: Thu Nov 11, 2010 12:12 am

Generic message import

Post by Giles314 »

Presentation
This MOD can be used to import messages (or posts) in a phpBB3 forum from an XML file. Therefore this will allow you to import messages from virtually any forum provided you have been able to save its content in XML.

Of course the input XML schema is predefined and it will be necessary to adapt your saved XML data to the expected input format. For that operation XSLT will be your friend.

Characteristics
The data managed by import_messages-V1.0 are:
  • The topic title*
  • The topic parent forum*
  • The message texts*
  • The message parent topic*
  • The message titles (if absent uses topic title)
  • The message date - many format accepted (if absent uses current import date)
  • The message poster user name (if absent or unknown uses anonymous user)
  • The message poster IP address
  • Flags to indicate whether BBCODE, Smiley, URL must be converted and whether signature must be added (if absent configurable default value used)
import_messages-v1.0, creates all the topics and add messages in then. It also indexes the messages for the search operations. But it does not create the forums. You have to create them manually before running the importation. Note that imported forums are ALWAYS created. This means that if you import twice your data, all the forums will be created twice.

import_messages-v1.0 has a preview mode to verify the data to be imported against the current forum content. It will allow you to know:
  • which users are unknown (and will be replaced by anonymous user if not added in the forum before the actual importation)
  • which forums are missing (and will cause the loss of child topics if not created before actual importation)
  • which dates are incorrectly recognized (and will be replaced by current time if their format is not fixed before actual importation)
  • a bunch of errors that would prevent the import operation
It is recommended to use preview before running the import procedure.
It is yet more recommended that you save all your database before doing any operation.
If you miss a lot of users, you may want to import them before importing the messages. This can be performed using the D¡cky's import users tool that I have provided an updated version here.

The message import operation is a long operation. Depending on the quantity of messages to import and on the maximum execution time of your PHP scripts, you may not be able to import all the messages at once. Even the preview may not be possible for all messages at once. This is the reason why import_messages-V1.0 will allow you to specify a range of topic to import. This will allow you to run the import of your data chunk by chunk. If you select a too large chunk at once the operation will stop before the end and show you which is the actual imported chunk.
Warning: If your topics are too big (more than 2s to import one) there is a risk that the operation fails inside a topic importation. This is an unrecoverable error that will let a incomplete topic in your forum. Therefore you should split too large topics in smaller parts or import them alone.

Installation instructions
The installation of the import_messages-V0.1 module requires the following steps:
  • Extract the files from the import_message-V1.0.zip file keeping the directory structure
  • Upload the files in the root directory to your board root directory
  • Navigate to the Navigate to: Administration Control Panel > System > Module Management > Administration Control Panel > Posting > Messages
  • Select “Import Messages” from the Add Module drop-down menu and click “Add module”.
  • Then click “Enable” on the Module.
If you wish you can add the French language pack by reproducing the first 2 steps with import_message-fr-V1.0.zip

Usage Instructions
After installation, the module can be used as follow:
  • Format the data to import according to the schema provided in includes\acp\import_messages.xsd or described in figure below
  • Upload the XML file to import in the store directory
  • Navigate to Administration Control Panel > Posting > Messages > Import Messages
  • Fill at least the name of the file
  • Click on preview button
  • Fix any errors and resume at previous step
  • If the preview is successful click on Submit button (5 time smaller range of topics is recommended at this step)
Edited to replace attachment import_messages-V1.0.zip by import_messages-V1.0.1.zip
Attachments
Graphical presentation of input XML file schema
Graphical presentation of input XML file schema
Schema.png (20.55 KiB) Viewed 35098 times
import_messages-fr-V1.0.zip
Import Messages V1.0 French language pack
(10.7 KiB) Downloaded 686 times
import_messages-V1.0.1.zip
Import Messages V1.0.1 MOD
(46.65 KiB) Downloaded 911 times
Last edited by Giles314 on Tue Jan 04, 2011 7:15 pm, edited 1 time in total.
Giles314
Registered User
Posts: 51
Joined: Thu Nov 11, 2010 12:12 am

V1.0.1 Update

Post by Giles314 »

The update V1.0.1 fixes the 2 following anomalies:
  • The number of imported messages accounted for each users was much too large unless topics were imported one by one.
  • The order of the statistic messages was wrong so the global statistic message was unclear.
This fix has no impact on the language file so the French language pack V1.0 is still compatible with new V1.0.1.
csunds
Registered User
Posts: 3
Joined: Mon Jan 24, 2011 8:17 pm

Re: Generic message import

Post by csunds »

Hi Gilles

Thank you for a very nice tool!!

First im a XML newbie!

I have data from an Access db, and are using the fully functional trial version (30days) of MapForce, to make the transformation.

Have made an query in access, that takes alle the fields i should use, to make the xsd "happy" 8-)

Im able to load your xsd, and my tables. But i have a problem with my XML output file.

If gives me duplicate :

<topic forum-name="Forum1" title="hi everybody this is a test">
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person1" ip="123.213.123.32">Here is my very first msg on this forum.... bla.bla</message>
</topic>
<topic forum-name="Forum1" title="hi everybody this is a test">
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person3" ip="223.253.123.32">Im on it..this forum.... bla.bla</message>
</topic>
<topic forum-name="Forum1" title="hi everybody this is a test">
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person5" ip="123.213.123.32">hey all, im also new here.. bla.bla</message>
</topic>

The right output XML should be :
<topic forum-name="Forum1" title="hi everybody this is a test">
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person5" ip="123.213.123.32">hey all, im also new here.. bla.bla</message>
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person3" ip="223.253.123.32">Im on it..this forum.... bla.bla</message>
<message title="hi everybody this is a tes" posted="2007-05-03 23:38:32" by="person1" ip="123.213.123.32">Here is my very first msg on this forum.... bla.bla</message>
</topic>

... right ??

as you can see it makes a new <topic forum-name .... for every post.

Can you ... :
1. give me a hint how to solve this. (maybe a group by statement could do it in my query ? but i dont know how to make i fit)
2. maybe put up one XML example file.
csunds
Registered User
Posts: 3
Joined: Mon Jan 24, 2011 8:17 pm

Re: Generic message import

Post by csunds »

Ohhh never mind... i got it ! It was the Group BY function. 8-)

Now i have the correct mapping, and the XML output looks smooth.

But the operation breaks

Maybe its bacause the users does not exsist anymore or maybe because some usernames from the old DB has used characters like - * ´/ ' in it. ?

Hope you are able to help me out here ?

Code: Select all

SQL ERROR [ mysql4 ]

Incorrect integer value: '-' for column 'poster_id' at row 1 [1366]

SQL

INSERT INTO phpbb_posts (topic_id, forum_id, poster_id, poster_ip, post_time, enable_bbcode, enable_smilies, enable_magic_url, enable_sig, post_username, post_subject, post_text, post_checksum, bbcode_bitfield, bbcode_uid, post_postcount) VALUES (278, '14', '-', '62.242.2.27', 1154601592, 1, 1, 1, 0, '-pRoFloW-', 'M&#111;nitors', 'Behringer Truth 2031', 'ffbeb6df60d4b805b3e78e4e0afb6dd6', '', '1bd5kcnw', 0)

BACKTRACE

FILE: includes/db/mysql.php
LINE: 175
CALL: dbal->sql_error()

FILE: includes/acp/acp_import_messages.php
LINE: 770
CALL: dbal_mysql->sql_query()

FILE: includes/acp/acp_import_messages.php
LINE: 406
CALL: acp_import_messages->add_cur_topic()

FILE: includes/functions_module.php
LINE: 507
CALL: acp_import_messages->main()

FILE: adm/index.php
LINE: 74
CALL: p_master->load_active()
User avatar
Mess
Registered User
Posts: 985
Joined: Wed Jul 01, 2009 6:37 am
Name: Kim

Re: Generic message import

Post by Mess »

woa, cool tool! :)
rborsheim
Registered User
Posts: 5
Joined: Fri Jan 28, 2011 9:56 am

Re: Generic message import

Post by rborsheim »

Here's a working sample:

Code: Select all

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<forum-message-list date-format="%Y-%m-%d %H:%M:%S">
	<topic forum-name="General Off Topic Discussion" title="hi everybody this is a test">
		<message posted="2003-10-01 1:14:45" by="admin" ip="123.213.123.75">Here is my very first message on this new forum</message>
		<message posted="2004-11-12 2:13:34" by="rajy" ip="223.253.123.99">My first message too!</message>
		<message posted="2005-12-23 3:12:23" by="admin" ip="123.213.123.75">Misery loves company! :-)</message>
	</topic>
</forum-message-list>
rborsheim
Registered User
Posts: 5
Joined: Fri Jan 28, 2011 9:56 am

Re: Generic message import

Post by rborsheim »

I just successfully converted around 25,000 messages with this. Thank you very much!

I have a few tips, and a question. First the tips.

Some of this is obvious, but I'll say it anyhow:

o) Get the free demo on the program "Altova XMLSpy". It will fix problems like invalid characters in your import file. You can use it to check for "well-formedness", and against the schema (import_messages.xsd)
o) Look through your messages for links to images, smilies etc. and convert them to the phpBB equivalent
o) Replace all < and > characters in your messages (with something like [ and ])
o) Replace all " characters with '
o) replace all & caracters with the word "and"

My only real issue of note is that I replaced all my external image links to "[img]imageurl[/img]", but they don't work. (http://amabilis.com/forum/viewtopic.php?f=28&t=1638) Does anyone have any idea how to fix this?

Edit: I just noticed when I manually updated the message that the bold and italics stopped working as well.
csunds
Registered User
Posts: 3
Joined: Mon Jan 24, 2011 8:17 pm

Re: Generic message import

Post by csunds »

But how do i get the script to accept characters like - _ = in the authors. ? Does anyone have a hint on this ?

I get this error :

SQL ERROR [ mysql4 ]

Incorrect integer value: '-' for column 'poster_id' at row 1 [1366]
rborsheim
Registered User
Posts: 5
Joined: Fri Jan 28, 2011 9:56 am

Re: Generic message import

Post by rborsheim »

The problem I had with images was my own fault. I didn't check the right options on import. I fixed it with this SQL query:

Code: Select all

UPDATE Phpbb_posts
SET enable_bbcode=1, enable_smilies=1
To actually get the images to show up, I had to run the Reparse BBCode function of the Support Toolkit.
sundsdal
Registered User
Posts: 5
Joined: Thu Dec 11, 2008 12:39 am

Re: Generic message import

Post by sundsdal »

rborsheim : looks like you are more into it (php) than me :-)

Did you have any any problems, with the import when it comes to the users ? Seems like the script can't handle deleted users.
rborsheim
Registered User
Posts: 5
Joined: Fri Jan 28, 2011 9:56 am

Re: Generic message import

Post by rborsheim »

No, I didn't have any real problems. Of the 25k I imported only 1 topic failed. Because the import told me how many it successfully imported, I could continue importing the remainder of the messages.

But, I have to admit that I did do a lot of clean up of the database using XMLSpy and some other processing.

If you are having problems with member names, maybe you can look for oddities (strange characters for instance) in names and rename people on your forum, so your import can succeed.
sundsdal
Registered User
Posts: 5
Joined: Thu Dec 11, 2008 12:39 am

Re: Generic message import

Post by sundsdal »

Thanks ... i will try that.

I have over 349k posts to import (so xmlspy will work overtime :)).... and have just imported 28543 users.
sundsdal
Registered User
Posts: 5
Joined: Thu Dec 11, 2008 12:39 am

Re: Generic message import

Post by sundsdal »

Btw.... what function in xmlspy did you use to clean the xml file ?
sundsdal
Registered User
Posts: 5
Joined: Thu Dec 11, 2008 12:39 am

Re: Generic message import

Post by sundsdal »

I found out that is MUST the the function that checks if the username exist. Now i have cleaned up a test file for weird signs lige %^- = ´ and so on.

See this error now :

Code: Select all

Incorrect integer value: 'p' for column 'poster_id' at row 1 [1366]

SQL

INSERT INTO phpbb_posts (topic_id, forum_id, poster_id, poster_ip, post_time, enable_bbcode, enable_smilies, enable_magic_url, enable_sig, post_username, post_subject, post_text, post_checksum, bbcode_bitfield, bbcode_uid, post_postcount) VALUES (293, '14', 'p', '62.242.2.27', 1154601592, 1, 1, 1, 0, 'pRoFloW', 'M&#111;nitors', 'Behringer Truth 2031', 'ffbeb6df60d4b805b3e78e4e0afb6dd6', '', '2sgppqdg', 0)
This user does not exsist anymore, and now its trying to put the username into the poster_id, instead of converting the user to "anonymous".

rborsheim: you did not run into to this issue ... with the anonymous convert thingy ?




if ($poster_data['user_id'] != ANONYMOUS)
rborsheim
Registered User
Posts: 5
Joined: Fri Jan 28, 2011 9:56 am

Re: Generic message import

Post by rborsheim »

In XMLSpy all I used was the XML menu's "Check Well formedness" and "Validate XML". To use "Validate XML", you have to assign the schema, which is in the download.

The rest of my corrections I did via "search and replace". I exported as CSV and imported into Microsoft Access and searched and replaced within the messages for all sorts of things. The critical ones I mentioned.

It looks like you have a much tougher job. I was fortunate in that my member names were clean.
Locked

Return to “[3.0.x] Convertors”