[DEV] mysql fulltext search

A place for MOD Authors to post and receive feedback on MODs still in development. No MODs within this forum should be used within a live environment! No new topics are allowed in this forum.
Forum rules
READ: phpBB.com Board-Wide Rules and Regulations

IMPORTANT: MOD Development Forum rules

On February 1, 2009 this forum will be set to read only as part of retiring of phpBB2.
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

[DEV] mysql fulltext search

Post by arod-1 »

MOD Download:
search_with_mysql_fulltext.zip
Discussion
This MOD replaces phpbb built-in search mechanism with mysql fulltext search.
there are several reasons why this will be useful for people who can use it, ie. those whose back-end database is mysql.
1 - first and foremost, phrase search ("word1 word2 word3")
2- size. the search_wordlist and search_wordmatch tables take more than 50% of the data, and their indices take much more than 50% of the index space. with mysql fulltext, there is no artificial data, which means cutting more than 50% on backup file size, and the index itself is also more efficient in space utilization.
3 - performance. i don't want to state here my opinion on phpbb built-in search, because i will be banned forever, but it is not good. every new message post result in twice as many queries as there are words in the post (yuck!), and the search itself is also problematic.

note that by using this mod, you lose the ability to search in post text only: all searches are on post title+post text.

i was pretty sure that someone have done it in the past, but i was somewhat disappointed to find that there is only one mod that implemented this, and this mod was abandoned in 2003, so i decided to write one myself.

what i did was a minimalistic job, where the goal was to implement the functionality with as little change as possible to the existing code. the change is in 4 files:
  • search.php where the search is actually implemented
  • includes/functions_search.php where the "phpbb native" handling of the search tables is neutralized (here is where you gain performance for posting and post editing)
  • templates/subSilver/search_body.tpl :some changes to the search UI form, specifically removal of 2 radio-button groups: search all/any (this is handled through the keywords), and search in body+title/body only (this functionality is lost if you use this mod)
  • language/english/lang_main.php: changed the explanation string of how to enter the search string
...

here is the main part of mod, i.e. the change to search.php.
Snipette
1) in search.php

Code: Select all

replace (around line 292)

    for($i = 0; $i < count($split_search); $i++)

with:
    $max_retrieve_results = 20000;
    $phrase = mysql_escape_string($search_keywords);
    $phrase = str_replace('\\\"', '"', $phrase);
    $sql = "select post_id from " . POSTS_TEXT_TABLE . 
        " where match (post_subject, post_text) against('$phrase' in boolean mode) 
        limit $max_retrieve_results";
    if ( !($result = $db->sql_query($sql)))
    {
        message_die(GENERAL_ERROR, 'Could not obtain matched posts list', '', __LINE__, __FILE__, $sql);
    }
    $total_posts = 0;
    while($temp_row = $db->sql_fetchrow($result))
    {
        $result_list[$temp_row['post_id']] = 1;
    }
            
    for(; false ;)
 
in includes/functions_search.php, add return; as the first line of the following functions:
add_search_words()
remove_common()
remove_search_post()
this is where the performance gain in posting, modifying and deleting a post are.

finally, add the search index, and truncate the old tables by performing the following queries:
(the first one builds the fulltext index, and with the last two you regain typically more than 50%, sometimes as much as 70% of your db size).

Code: Select all

alter table phpbb_posts_text add fulltext (post_subject, post_text);
truncate phpbb_search_wordlist;
truncate phpbb_search_wordmatch;
the changes to the form and the english language file are in the attached file.

TO DO
test and make some measurements, both of size and performance.
find and fix the bug.
make the required changes to the search form. ******DONE*****

and finally, package it all in a standard MOD format. ****** DONE*******

if anyone volunteer to do any (or all) of these steps, i will gladly give them this whole thing, including responsibility and credit.
otherwise, i might and might not choose to do it myself.
have a good day.
TODO That someone else needs to do
Create similar MOD for other databases, or tell me how to do it.
the difference should only be in the little snipette i posted here, and the sql command to build the fulltext index.

[EDIT]: added MOD file, as per paul999 request [/EDIT]
File in old mod style


[2008 06 07 EDIT: fixed broken attachment link ]
Last edited by arod-1 on Sun Jun 08, 2008 7:11 am, edited 13 times in total.
standard disclaimer:
backup your db and files before you do anything.
absolutely no guarantee.
if you do what i advise and it blows in your face, all you'll hear from me is: "ah... sorry, i guess"
User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [DEV] mysql fulltext search

Post by drathbun »

I would love to test this out, and have a board with plenty of data (300K+ posts) that should provide a good test. It will be a while before I can set something up, so I'll follow this topic and see what updates you might be able to make before I have the time to actually implement it. Thanks for your efforts.
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image
Paul
Infrastructure Team Leader
Infrastructure Team Leader
Posts: 26853
Joined: Sat Dec 04, 2004 3:44 pm
Location: The netherlands.
Name: Paul Sohier
Contact:

Re: [DEV] mysql fulltext search

Post by Paul »

Hello,

We see that you've posted your MOD's contents in your topic starter. Please instead offer your MOD as a textfile download (uploading your MOD's install textfile to a server and giving a direct link, with no registration restrictions), rather than posting the MOD's contents in your post, as you have done.

Thank you for your cooperation,
The phpBB MOD Team
Knock knock
Race condition
Who's there?

My BlogMy Photosmy phpBB Extensionscustom phpBB work & Development
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [DEV] mysql fulltext search

Post by arod-1 »

Thanks.
edited the first post and added a zip-download version of the MOD.
hope it's ok. if it isn't, please let me know and i'll try to rectify.
User avatar
EverettB
Registered User
Posts: 326
Joined: Fri Aug 01, 2003 7:11 pm
Location: North America

Re: [DEV] mysql fulltext search

Post by EverettB »

I am interested in this as well.

Comments:
Your ZIP file has a path in it. This produces an error when I unzip the file. Please remove the path.

You need to account for both radio buttons on the search screen:
Search for any terms or use query as entered
Search for all terms

Search for all should automatically prefix all the entered terms with a + symbol, if needed.
delete from phpbb_search_wordlist;
delete from phpbb_search_wordmatch;

Use truncate.
User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [DEV] mysql fulltext search

Post by drathbun »

EverettB wrote:
delete from phpbb_search_wordlist;
delete from phpbb_search_wordmatch;

Use truncate.

I was going to say the same thing. Is truncate supported by all databases used by phpBB? Reason I ask, is in my "Rebuild Search Index" MOD (not released, don't plan to, don't ask :-P) I use the following code, just to be safe:

Code: Select all

                $do_truncate = TRUE;

                $sql = 'TRUNCATE TABLE ' . SEARCH_WORD_TABLE;
                if ( ! ($result = $db->sql_query($sql) ))
                {
                        $do_truncate = FALSE;
                        $sql = 'DELETE FROM ' . SEARCH_WORD_TABLE;
                        if (! ($result = $db->sql_query($sql) ) )
                        {
                                message_die(GENERAL_ERROR, 'Could not empty search_wordlist table', '', __LINE__, __FILE__, $sql);
                        }
                }

                // Check to see if truncate worked
                if ( $do_truncate )
                {

                        $sql = 'TRUNCATE TABLE ' . SEARCH_MATCH_TABLE;
                }
                else
                {
                        $sql = 'DELETE FROM ' . SEARCH_MATCH_TABLE;
                }
Just wondering... I have not done the research to see how portable the truncate command is. But, then again, this MOD is only for MySQL, right? ;-) So truncate should work.

As a way of explanation, the truncate command is generally faster because it does not store any "undo" or "rollback" information. In my unscientific testing I would not be surprised if "delete from table" without a where clause was mapped to the truncate command anyway, as in my testing there wasn't a lot of difference between the two as far as timings. I created a test copy of my search_wordmatch table from my biggest forum and got the following timings:

Code: Select all

mysql> delete from test;
Query OK, 6598918 rows affected (0.05 sec)

mysql> truncate table test;
Query OK, 0 rows affected (0.09 sec)

mysql> delete from test where word_id > 0;
Query OK, 6598918 rows affected (55.07 sec)
I refilled the table in between each run.
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [DEV] mysql fulltext search

Post by arod-1 »

EverettB wrote: Comments:
Your ZIP file has a path in it. This produces an error when I unzip the file. Please remove the path.

my bad. fixed.
EverettB wrote: You need to account for both radio buttons on the search screen:
Search for any terms or use query as entered
Search for all terms

Search for all should automatically prefix all the entered terms with a + symbol, if needed.
solved this by removing the radio buttons from the form, and changed the explanation accordingly.
EverettB wrote: Use truncate.

although delete will also work, i re-read the docs, and you are right. truncate works much faster. done.

General Comments:
1) MySQL uses a bizarre set of "Stopwords", to eliminate what someone thought are very common words in the english language (such as "themselves", "accordingly" "consequently" and "corresponding").
if you control your host, it is possible to replace this list with a more sensible one or to eliminate it entirely, by changing my.cnf and restarting. recommended.
2) by default, MySQL fulltext indexes only words of length 4 or greater. this too can be changed in my.cnf.
the explanation in the MOD (ie, the changes to lang_main.php) reflects the defaults. if you change the defaults, you might want to change the text also.

Enjoy.
Last edited by arod-1 on Tue Apr 17, 2007 10:59 pm, edited 1 time in total.
standard disclaimer:
backup your db and files before you do anything.
absolutely no guarantee.
if you do what i advise and it blows in your face, all you'll hear from me is: "ah... sorry, i guess"
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [DEV] mysql fulltext search

Post by arod-1 »

drathbun wrote: As a way of explanation, the truncate command is generally faster because it does not store any "undo" or "rollback" information. In my unscientific testing I would not be surprised if "delete from table" without a where clause was mapped to the truncate command anyway, as in my testing there wasn't a lot of difference between the two as far as timings. I created a test copy of my search_wordmatch table from my biggest forum and got the following timings:
iiuc, the main difference is not "undo" or "rollback" information, but the way the index(es) are handled.
regarding your measurements: did you also create the indices of those tables before the "delete from", or only the data? if it's only the data, i wouldn't expect much difference in performance, but if these tables also included indices, i am surprised.
User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [DEV] mysql fulltext search

Post by drathbun »

arod-1 wrote: iiuc, the main difference is not "undo" or "rollback" information, but the way the index(es) are handled.

For Oracle the difference is the rollback. :-) If you truncate a table there is no undo. If you "delete * from table" you can undo. This is done by storing the state of the table in a temporary rollback segment until a commit is issued. Most of my DB experience is with Oracle, so I tend to extend the techniques learned there onto MySQL... sometimes with "interesting" results. :lol:
regarding your measurements: did you also create the indices of those tables before the "delete from", or only the data? if it's only the data, i wouldn't expect much difference in performance, but if these tables also included indices, i am surprised.

Retesting now...
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image
User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [DEV] mysql fulltext search

Post by drathbun »

Code: Select all

mysql> create table test as select * from phpbb_search_wordmatch;
Query OK, 6598956 rows affected (8.40 sec)
Records: 6598956  Duplicates: 0  Warnings: 0

mysql> create index test_wordid on test(word_id);
Query OK, 6598956 rows affected (30.72 sec)
Records: 6598956  Duplicates: 0  Warnings: 0

mysql> create index test_postid on test(post_id);
Query OK, 6598956 rows affected (43.75 sec)
Records: 6598956  Duplicates: 0  Warnings: 0
So that creates the table, then creates the same indexes that exist on the standard table.

Now a truncate, followed by a re-insert, followed by a delete * with no where clause. Note that the first insert shown above took only 8.4 seconds since there were no indexes on the table. The second insert took a lot longer because the indexes had to be updated at the same time the rows were being inserted.

Code: Select all

mysql> truncate table test;
Query OK, 0 rows affected (0.15 sec)

mysql> insert into test select * from phpbb_search_wordmatch;
Query OK, 6598956 rows affected (52.73 sec)
Records: 6598956  Duplicates: 0  Warnings: 0

mysql> delete from test;
Query OK, 6598956 rows affected (0.19 sec)
These tests are being run on a dual cpu xeon box with 4GB of ram on MySQL 4.1. Based on the timings, I believe that "delete from test" is being mapped to "truncate". The delete with a where clause was substantially slower.
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [DEV] mysql fulltext search

Post by arod-1 »

just wondered: did anyone try to use this mod? if so, can you give me some input?
thanks.
Rizzn.
Registered User
Posts: 264
Joined: Wed Dec 28, 2005 9:53 am
Contact:

Re: [DEV] mysql fulltext search

Post by Rizzn. »

I used it in a test environment, but I've been too busy to look into implementing it on an actual forum. On the test site though, it seemed to work pretty well. I'd still need to do more extensive testing before I decided whether or not to add it to any of my working forums. It is a very good concept though.
[Alpha] Store MOD (phpBB3)
- Support site forthcoming -
--------------------------------------
[RC1] wGEric Store MOD (phpBB2)
Additional Usable Items and Store MOD Support Forums (separate site)
User avatar
IPB_Refugee
Registered User
Posts: 1290
Joined: Fri Jul 07, 2006 2:25 pm
Location: Austria
Name: Wolfgang Weber

Re: [DEV] mysql fulltext search

Post by IPB_Refugee »

Hi,

I´ve tried a different MySQL Fulltext Search MOD before which also needed

Code: Select all

alter table phpbb_posts_text add fulltext (post_subject, post_text);
But when I did this command via phpMyAdmin the table phpbb_posts_text grew even larger than the summary of both search tables. :( (At the time I tried the other MOD I had about 4000 posts in my database but a good search_stopwords.txt.) Can someone confirm that with arod-1´s MOD the database will really decrease?

Regards
Wolfgang

BTW: If you use a SEO-MOD which creates dynamical meta tags for your topics you need the search tables - you should consider this before truncating them.
Last edited by IPB_Refugee on Sun Apr 29, 2007 8:01 pm, edited 1 time in total.
User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [DEV] mysql fulltext search

Post by drathbun »

The overall database size should decrease because you can drop the phpbb_search_wordlist and phpbb_search_wordmatch tables, which are often the largest tables in your phpbb database. Yes, there is an increase in the index size for the other tables. But dropping those other two tables can (or should) make up the difference and then some.
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image
User avatar
IPB_Refugee
Registered User
Posts: 1290
Joined: Fri Jul 07, 2006 2:25 pm
Location: Austria
Name: Wolfgang Weber

Re: [DEV] mysql fulltext search

Post by IPB_Refugee »

Hi,
regarding to the similar Fulltext Search MOD I mentioned above: Adding fulltext to the table phpbb_posts_text increased the size of this table more than 6mb. My two search tables had in summary only 4.7mb. So I had no saving of web space but the exact opposite.

Regards
Wolfgang
Post Reply

Return to “[2.0.x] MODs in Development”