[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
Deriel
Registered User
Posts: 164
Joined: Tue Aug 27, 2002 5:16 pm
Location: Curitiba/PR - Brasil
Contact:

Post by Deriel » Wed Oct 15, 2003 9:26 pm

lanzer wrote: I'm very glad that there are people who benefitted from the tweaks. :D I was wondering for a moment if anyone would read this, so I didn't add any more codes/tweaks.

Anyways, here's the class that I snipped off the phpbb 2.2.0 CVS file. I'm glad that it's nice and portable now so we can try out different people approach in solving search table problems. Anyone can slip this code at the end of the functions_search.php file (before the "?>" line) and use it on version 2.0.x. But you must alter the database to have it working properly. The necessary changes are mentioned below.


I applied the "search" tweaks and almort everything work. Editing a post:

Code: Select all

Warning: array_keys(): The first argument should be an array in /www/valinor/forum/www/includes/functions_search.php on line 616

Warning: array_diff(): Argument #2 is not an array in /www/valinor/forum/www/includes/functions_search.php on line 616

Warning: array_keys(): The first argument should be an array in /www/valinor/forum/www/includes/functions_search.php on line 617

Warning: array_diff(): Argument #2 is not an array in /www/valinor/forum/www/includes/functions_search.php on line 617

Warning: array_keys(): The first argument should be an array in /www/valinor/forum/www/includes/functions_search.php on line 618

Warning: array_diff(): Argument #1 is not an array in /www/valinor/forum/www/includes/functions_search.php on line 618

Warning: array_keys(): The first argument should be an array in /www/valinor/forum/www/includes/functions_search.php on line 619

Warning: array_diff(): Argument #1 is not an array in /www/valinor/forum/www/includes/functions_search.php on line 619
lines 616 to 618 are:

Code: Select all

         $words['add']['post'] = array_diff($split_text, array_keys($cur_words['post'])); 
         $words['add']['title'] = array_diff($split_title, array_keys($cur_words['title'])); 
         $words['del']['post'] = array_diff(array_keys($cur_words['post']), $split_text); 
         $words['del']['title'] = array_diff(array_keys($cur_words['title']), $split_title); 

gravyplaya
Registered User
Posts: 15
Joined: Sun Jun 22, 2003 7:29 pm

Post by gravyplaya » Wed Oct 15, 2003 9:33 pm

Ok I resolved the problem by just pruning down the amount of topics in my forums. was over 50k topics/posts now is only 3600.. This can be a problem if the forum grows back to the big size. Surely this may be a bug. Oh changing the memory size in my php.ini does NOT work. as a matter of fact I changed it from 8M to 16M and it crashed my php completely(no php files were parsable, just got a blank page).

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Wed Oct 15, 2003 10:55 pm

Hi Deriel, the warnings came from empty results from the query to search for existing words in the search table. Guess it depends if the message being edited was previously indexed or not.

To avoid the warning message, I changed the following:

Code: Select all

            $words['add']['post'] = array_diff($split_text, array_keys($cur_words['post']));
            $words['add']['title'] = array_diff($split_title, array_keys($cur_words['title']));
            $words['del']['post'] = array_diff(array_keys($cur_words['post']), $split_text);
            $words['del']['title'] = array_diff(array_keys($cur_words['title']), $split_title);
To:

Code: Select all

            if ( count($cur_words) ) {
    			$words['add']['post'] = array_diff($split_text, array_keys($cur_words['post']));
    			$words['add']['title'] = array_diff($split_title, array_keys($cur_words['title']));
    			$words['del']['post'] = array_diff(array_keys($cur_words['post']), $split_text);
    			$words['del']['title'] = array_diff(array_keys($cur_words['title']), $split_title);
            } else {
                $words['add']['post'] = $split_text;
                $words['add']['title'] = $split_title;
                $words['del']['post'] = array();
                $words['del']['title'] = array();
            }
That should do the trick.

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Wed Oct 15, 2003 11:06 pm

gravyplaya wrote: Ok I resolved the problem by just pruning down the amount of topics in my forums. was over 50k topics/posts now is only 3600.. This can be a problem if the forum grows back to the big size. Surely this may be a bug. Oh changing the memory size in my php.ini does NOT work. as a matter of fact I changed it from 8M to 16M and it crashed my php completely(no php files were parsable, just got a blank page).


Hello Gravyplaya, did you got to do the tweak I mentioned on the first page for viewtopic.php? That should speed up page views with large threads.

As for the pagination code, just do a text search for "function pagination" under the functions.php file and you should see it. First delete the original function, then paste the new one in. Remember to make a backup first!

As for the error you mentioned, is it related to the pagination code or is it something you've been having? And which function is line 568 inside your functions.php file? Installed any mods lately?

romans1423
Registered User
Posts: 1552
Joined: Sat Nov 02, 2002 4:44 pm
Location: Connersville, IN
Name: Rick Beckman
Contact:

Post by romans1423 » Thu Oct 16, 2003 4:31 am

Put these tweaks in MOD format and submit 'em to the database, and you'll probably get a lot more people testin' them out for you, not to mention the MOD team. The instructions will be easier to follow too; of course, you'll have to make use of language files and such. :)

cyphernutt
Registered User
Posts: 14
Joined: Thu Oct 16, 2003 6:41 am

details.......

Post by cyphernutt » Thu Oct 16, 2003 7:45 am

Could you expand on the details for altering the script?

*Outofmymindyo says stop spamming*

User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Post by drathbun » Thu Oct 16, 2003 12:08 pm

I want to say that I've read this topic with great interest. While I'm nowhere near the size of the Gaia forums, anyone striving to get the best performance from possibly "meager" hardware should be interested in this tips. Keep 'em coming. 8)
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image

Deriel
Registered User
Posts: 164
Joined: Tue Aug 27, 2002 5:16 pm
Location: Curitiba/PR - Brasil
Contact:

Post by Deriel » Thu Oct 16, 2003 1:29 pm

The tweaks are incredible. I've only a 650 k posts, 1300 posts/day Forum and the search really help. I'm studying the others (and implanting it).

Very kindly from lanzer, a very busy Admin (I read the interview :D ), to give us so many attention.

:D

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Thu Oct 16, 2003 9:01 pm

romans1423 wrote: Put these tweaks in MOD format and submit 'em to the database, and you'll probably get a lot more people testin' them out for you, not to mention the MOD team. The instructions will be easier to follow too; of course, you'll have to make use of language files and such. :)


Hello Romans1423, thank you for your suggestion. Unfortunately at this time I'm too swampped with work and posting little threads like these is all I can afford to do. Image

If anyone is interested in making mods out of any of the tweaks mentioned, please feel free to do so. Whoever submit them as mods can take full credit, for I just wanted to share tips and ideas with the community. Whichever way get the point across I'll be happy with. :)

Though please contact me before making mod versions so we can co-ordinate efforts between authors doing the same thing.

Cheers,
Lanzer

BartVB
Consultant
Consultant
Posts: 1288
Joined: Thu Aug 02, 2001 1:32 pm
Location: The Netherlands
Contact:

Post by BartVB » Sun Nov 02, 2003 1:22 pm

Nice tweaks, Lanzer!

I also ran into the problem with long threads (>3000 replies etc). At the moment my mods are just locking topics with more than 2500 replies (100 pages) ;) Works for me.
The main problem with the viewtopic query is that it sorts in ascending order, so MySQL produces a resultset with all 3000 replies, skips 2975 posts and fetches the last 25 posts. If you just change 'ORDER BY post_id ASC' to 'DESC' they you already achieve a huge performance gain. Note that you will have to output the results from the query in reverse order!

99.95% of the time people read the lastest pages in a topic, people reading the first post in a large topic will have to face the large delay (but indeed, you could solve that by using 'ASC' when start < topic_replies/2. My solution does approx the same as your solution but it's easier to implement ;) IIRC this is will also included in 2.2 (but don't pin be down on that because that code has been very much in flux lately).

Regarding the search.. IMO SQL based search is really nice for small boards, you don't need extra software, not writeable files, etc, etc but it's very inefficient :/

Anyway, at the moment I'm still running with the SQL search system but I've modified it a bit. The largest problem with the search system occurs when you edit/delete a post, this can be a _very_ timeconsuming process.

So I've removed the code that alters the search tables from the phpBB code. Instead of altering the search the posting.php script just marks a posting as new/edited/deleted. Then at night (in a cronjob) a PHP script runs that deletes all posts that are new/edited in one go, this can be done pretty swiftly. After that it adds all posts that are new/edited. But this solution only works if your board is close to idle during the night (mine is because it's not an international board).

But this is only a temporary solution, the search results that are produced by the search system are not complete most of the time, the users seem to be having quite a bit of trouble with finding the right topics. And search queries can take quite a bit of time to run, especially when doing silly searches with a lot of OR which can really hamper performance of the site...

Because of all that I'm looking at http://www.xapian.org/ which should scale a LOT better than the SQL solution. Only problem is that I currently don't have the time to actually implement Xapian as a search engine for the board ;( Pretty busy with the portal, other phpBB 2.2 developement, work, school and my girlfriend kind of appreciates it if I sometimes talk to her face to face instead of through Jabber ;) So if someone is interested in implementing a Xapian search for phpBB: Please! I would really appreciate it :D

BTW, Lanzer, are you working with MyISAM or InnoDB tables?
I Hate oversized sigs and Love Penguins :D

User avatar
SHS`
Former Team Member
Posts: 6615
Joined: Wed Jul 04, 2001 9:13 am
Location: Yellow Beach, Nine Dragons, Hong Kong
Name: Jonathan Stanley
Contact:

Post by SHS` » Sun Nov 02, 2003 5:03 pm

BartVB wrote: BTW, Lanzer, are you working with MyISAM or InnoDB tables?


I think I can answer that one, if memory serves from one of the old posts... it's InnoDB, hence why there wasn't the total post count on the index as the count function made index.php very slow. I believe it's show now as it's just a static value in phpbb_config (assumption) that get's updated as and when needed, not too frequently.
Jonathan “SHS`” Stanley • 史德信
Image

davidh44
Registered User
Posts: 386
Joined: Sat Mar 09, 2002 5:56 am

Post by davidh44 » Mon Nov 03, 2003 8:42 am

lanzer wrote: I actually haven't written anything to prune the searchword tables yet. I'll post my purning scripts once I have them installed and tested.


Very interesting modifications lanzer! :D Thanks for sharing them with us. You really know your php and mysql.

My main issue with the search table has been that it isn't pruned when the posts are pruned. So i end up with a search table that is way bigger than the post table itself, and once ended up simply deleting the search table and trying to reindex (which ran forever and I finally gave up on that).

If I'm reading correctly, it sounds like you may be working on a solution for this. Which would be great. :D

jbay
Registered User
Posts: 18
Joined: Tue Apr 02, 2002 2:27 pm
Contact:

Post by jbay » Mon Nov 03, 2003 11:47 am

just wanted to add my thanks, as well, for the tweaks; I've got a few of them running.

I'm running innodb on a few selected tables on our system, and it seems to help a lot (though on other tables it seems to cause some interesting errors...like not actually inserting new users into the usertable when they sign up).

Taking out the sum() functions in get_db_stats and changing it to just give me the last post/topic # seemed to speed things up quite a bit (actually I don't see where the count of topics in that function are used anywhere in the code...looks like needless overhead from what I can tell)

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Sat Nov 08, 2003 1:20 pm

Hello BartVB:

Thanks for the compliment, first of all. :oops:
BartVB wrote: I also ran into the problem with long threads (>3000 replies etc). At the moment my mods are just locking topics with more than 2500 replies (100 pages) ;) Works for me.
The main problem with the viewtopic query is that it sorts in ascending order, so MySQL produces a resultset with all 3000 replies, skips 2975 posts and fetches the last 25 posts. If you just change 'ORDER BY post_id ASC' to 'DESC' they you already achieve a huge performance gain. Note that you will have to output the results from the query in reverse order!


Yeah, after the first hack for gathering a post_id index before fetching post text, I made an additional index for the post table which consist of both the topic id and post_id. (page 1 of this thread) That way MySQL can search the posts table fast and doesn't scroll through tons of posts to find the latest messages. Worked like a champ.

As I personally believe that technical issues can be conquered by optimizations, so we haven't limited our users in any activities due to technical reasons. Our largest topic now has more than 150,000 messages. Thank god for reverse index search. :D
Regarding the search.. IMO SQL based search is really nice for small boards, you don't need extra software, not writeable files, etc, etc but it's very inefficient :/


Yeah. What I'll be doing next week is to start a new DB server and have that machine handle long and slow queries like searching. The main server will only be executing queries that takes less than 0.02 seconds to execute.
So I've removed the code that alters the search tables from the phpBB code. Instead of altering the search the posting.php script just marks a posting as new/edited/deleted. Then at night (in a cronjob) a PHP script runs that deletes all posts that are new/edited in one go, this can be done pretty swiftly. After that it adds all posts that are new/edited. But this solution only works if your board is close to idle during the night (mine is because it's not an international board).


Unfortunately I don't get that luxury. It's 4:40am right now and we have 700 users on-line. Not to mention that we get about 200k posts a day now so it would be a rather daunting task to try to squeeze that task into one job. :( I have yet to go deep into finding any solution that is both comprehensive and quick. Had to end up crippling most of the functionality to make this work... I'm in the same shoe where we have much more pressing functionalities and features to implement, and so the search feature will just have to be placed in the lower part of the feature request list. I will look at xapian for personal interest though. Never heard of it before.
BTW, Lanzer, are you working with MyISAM or InnoDB tables?


Right now basically anything that gets updated in less than a second is running InnoDB:

users table
topics table
search wordlist
search results
privmsgs table and text
posts table and text
forums table

The rests are still MyISAM since they are so much faster to scan.

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Sat Nov 08, 2003 2:00 pm

Hello jbay, I'm surprised that innodb gave you problems with adding new users to the users table. Hopefully you guys have that problem taken care of.

In terms of handling anything with count(*) or sum(), I either went with storing a counter that gets counted every minute in a cron job. I used cheat with using "explain select" to find a relatively close value. :)
jbay wrote: just wanted to add my thanks, as well, for the tweaks; I've got a few of them running.

I'm running innodb on a few selected tables on our system, and it seems to help a lot (though on other tables it seems to cause some interesting errors...like not actually inserting new users into the usertable when they sign up).

Taking out the sum() functions in get_db_stats and changing it to just give me the last post/topic # seemed to speed things up quite a bit (actually I don't see where the count of topics in that function are used anywhere in the code...looks like needless overhead from what I can tell)

Locked

Return to “2.0.x Discussion”