[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
joe7
Registered User
Posts: 9
Joined: Wed Jan 19, 2005 4:08 pm

indexes

Post by joe7 » Wed Jun 29, 2005 2:46 am

Just read all the pages of this thread...

Would anyone please help me sum up which indexes would help me if my site is just a small one?
(I guess it is, as it has only 500 registered users, no more than 50 active at the same time, about 35,000 posts in total, and 15,000+ page accesses and 100,000+ hits per day.)
The main problem is probably not on the DB side, since a typical page generation time looks like this:
(index) [ Time: 0.3472s ][ Queries: 9 (0.047s) ][ GZIP enabled ]
I have quite a lot of mods installed...
Or maybe I should adjust something in the PHP config?

Still, phpBB sometimes appears in my mysql-slow.log, as can be seen here: http://www.phpbb.com/phpBB/viewtopic.ph ... 74#1638874

Thanks for anything in advance

Letus2001
Registered User
Posts: 22
Joined: Thu Nov 04, 2004 8:22 pm

Post by Letus2001 » Fri Jul 01, 2005 1:48 pm

Hi all,
this is an amazing topic, I have to say! You have all done amazing work!

I run phpBB basically as a wiki: a place where people build a kind of encyclopaedia about history. Not so many posts (100k), not so many users (4k, pruned regularly), but a huge number of categories (20k). And this causes me a big headache (I have already implemented some of the great tweaks mentioned here, and they helped me a lot!). Most of the topic view pages are quick, but the category tree is huge and the index page takes its time to load (the resulting page is about 300 KB). I wonder if anyone has experience with optimising Pthriik's subcategory code to speed up his forum, or if Lanzer can share his ideas (I know you have been talking about this, but no one else seemed to be interested. I am :D). The site is under heavy "read" traffic, serving about 5,000,000 pages a month, so I really need to tune this up.

Thanks!

Saoshyant
Registered User
Posts: 77
Joined: Thu Feb 03, 2005 3:35 pm
Location: Portugal

Post by Saoshyant » Sun Jul 10, 2005 10:38 pm

I know I'm being silly for bringing this up, but I would recommend that anyone still looking into optimizing and tweaking phpBB2 take a look at includes/smtp.php. This script makes our forum time out every time more than one person tries to register simultaneously, making it impossible to even know whether the account was registered successfully, because it times out before reaching the success screen. This is quite annoying, and the way it overloads PHP execution must mean, as far as I can see, that the function is not as optimized as it should be. So hey, it can be a fun exercise for those looking to tweak every little aspect of their huge forums.

EverettB
Registered User
Posts: 326
Joined: Fri Aug 01, 2003 7:11 pm
Location: North America

Post by EverettB » Tue Jul 12, 2005 5:57 pm

Since no one has responded yet...

I think you have something wrong with your server or your SMTP configuration. My site sends out up to 2,000 emails/hour using smtp.php with no troubles.

xkevinx
Registered User
Posts: 132
Joined: Tue Nov 05, 2002 8:45 pm
Location: California

Post by xkevinx » Wed Jul 13, 2005 12:04 am

Hi lanzer,

I found this neat script that would be really interesting to see running on your system. Heck, you may be running it already; it's a stats program. If not this one, what are you using, if any?

http://cacti.net/

And how many servers are you up to now? lol

EDIT: Any luck finding a solution for the search feature?

Kevin (Fusionx at gaia)
Do you Believe?

Serberus
Registered User
Posts: 3
Joined: Thu Jul 14, 2005 8:58 pm

Re: Searching improvements for larger forums.

Post by Serberus » Thu Jul 14, 2005 9:22 pm

barbos wrote:

Code:

	if ( !empty($stopword_list) )
	{
		for ($j = 0; $j < count($stopword_list); $j++)
		{
			$stopword = trim($stopword_list[$j]);

			if ( $mode == 'post' || ( $stopword != 'not' && $stopword != 'and' && $stopword != 'or' ) )
			{
				$entry = str_replace(' ' . trim($stopword) . ' ', ' ', $entry);
			}
		}
	}

	if ( !empty($synonym_list) )
	{
		for ($j = 0; $j < count($synonym_list); $j++)
		{
			list($replace_synonym, $match_synonym) = split(' ', trim(strtolower($synonym_list[$j])));
			if ( $mode == 'post' || ( $match_synonym != 'not' && $match_synonym != 'and' && $match_synonym != 'or' ) )
			{
				$entry =  str_replace(' ' . trim($match_synonym) . ' ', ' ' . trim($replace_synonym) . ' ', $entry);
			}
		}
	}

After (replace with, in includes/functions_search.php):

Code:

	if ( !empty($stopword_list) )
	{
		for ($j = 0; $j < count($stopword_list); $j++)
		{
			$stopword = trim($stopword_list[$j]);

			if ( $mode == 'post' || ( $stopword != 'not' && $stopword != 'and' && $stopword != 'or' ) )
			{
				$entry = str_replace(' ' . trim(strtolower($stopword)) . ' ', ' ', $entry);
			}
		}
	}

	if ( !empty($synonym_list) )
	{
		for ($j = 0; $j < count($synonym_list); $j++)
		{
			list($replace_synonym, $match_synonym) = split(' ', trim(strtolower($synonym_list[$j])));
			if ( $mode == 'post' || ( $match_synonym != 'not' && $match_synonym != 'and' && $match_synonym != 'or' ) )
			{
				$entry =  str_replace(' ' . trim(strtolower($match_synonym)) . ' ', ' ' . trim(strtolower($replace_synonym)) . ' ', $entry);
			}
		}
	}


Nice work, barbos. Does the code below offer any performance improvement?

Code:

	if ( !empty($stopword_list) ) {
		foreach ($stopword_list as $stopword) {
			$stopword = strtolower(trim($stopword));

			if ( $mode == 'post' || !in_array($stopword, array('not','and','or')) )
				$entry = str_replace(" $stopword ",' ',$entry);
		}
	}

	if ( !empty($synonym_list) ) {
		foreach ($synonym_list as $synonym) {
			list($replace_synonym, $match_synonym) = explode(' ', trim(strtolower($synonym)));

			if ( $mode == 'post' || !in_array($match_synonym, array('not','and','or')) )
				$entry = str_replace(" $match_synonym ", " $replace_synonym ", $entry);
		}
	}

Serberus
Registered User
Posts: 3
Joined: Thu Jul 14, 2005 8:58 pm

Post by Serberus » Fri Jul 15, 2005 10:01 am

ms2scale wrote: My relatively large board starts to crawl during some queries.
The processlist shows that the slow queries hang in
"Copying to tmp tables".
I'm using MySQL 3.23.49 with MyISAM tables.
I already checked lanzer's MySQL config but am starting to get confused by
the MySQL server variables, especially the ones regarding tmp tables.
Any clues or example setups?

Cheers,

Michael


The tmp_table_size variable in MySQL dictates the maximum size of an in-memory temporary table. These temporary tables are typically generated by queries using GROUP BY and/or ORDER BY clauses.

MySQL seems to be very 'hungry' when it comes to this buffer (on the site I work on it requires more memory than the key buffer).

I'm looking at monitoring the temporary tables MySQL produces ('SQL_*' files in the specific TMPDIR) to gauge what size I should make this buffer.

This can be done on Linux using dnotify and execing something like "ls -l >> tmp_table_size.log" (very simplistic monitoring).

You can normally tell if this buffer is too small by checking the SHOW STATUS counters "Created_tmp_disk_tables" and "Created_tmp_tables": divide the disk tables by the total tmp tables to see what percentage of temporary tables are going to disk because this buffer is too small.

One thing to note is that MySQL always reverts to disk based tmp tables for queries which select BLOB (and TEXT I think) columns. So if you see no change in your ratio of disk based tmp tables after substantially increasing your tmp_table_size it's likely the queries going to disk use columns of this type.
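To make the ratio check concrete, here is a minimal sketch (the counter values are invented purely for illustration):

```sql
-- Check what fraction of temporary tables spill to disk.
SHOW STATUS LIKE 'Created_tmp%';
-- If, say, Created_tmp_disk_tables = 1200 and Created_tmp_tables = 24000,
-- the ratio is 1200 / 24000 = 5%: one in twenty temporary tables hits disk.
```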

Hope this helps.

EverettB
Registered User
Posts: 326
Joined: Fri Aug 01, 2003 7:11 pm
Location: North America

Post by EverettB » Fri Jul 15, 2005 7:11 pm

Serberus wrote: You can normally tell if this buffer is too small by checking the SHOW STATUS counters "Created_tmp_disk_tables" and "Created_tmp_tables": divide the disk tables by the total tmp tables to see what percentage of temporary tables are going to disk because this buffer is too small.

If it helps you compare, the ratio on my site is 5%.

I don't know if that is optimal but after some Google searches, it seems people were having trouble when the ratio was 40-50%.

I found a recommendation to increase the tmp_table_size if the ratio was over 10%.

You can change this value by editing your my.cnf file for MySQL and entering

Code:

set-variable = tmp_table_size = 32M
Restart MySQL afterward.

I believe 32M is the default.

KobYY
Registered User
Posts: 46
Joined: Thu Jun 23, 2005 8:50 pm

Suggestion

Post by KobYY » Sat Jul 16, 2005 11:28 am

I have a suggestion: why didn't you make a single file, a script pack with all the improvements and tweaks?
Because for a newbie this is too hard!

Lady Serena
Registered User
Posts: 76
Joined: Thu May 26, 2005 12:31 am
Location: Smithsburg, MD

Post by Lady Serena » Mon Jul 18, 2005 11:37 pm

I'm building and installing the Xapian search system (posted a while ago, on a page toward the beginning of this thread). I'm going to do some testing on it, and I'll post the results. I will also report on its performance and how it may perform on larger boards.
Varus Online :: Discover Imagination
http://www.varusonline.com/

alphamonkey
Registered User
Posts: 146
Joined: Sat Mar 01, 2003 8:26 am
Location: 0x00

Post by alphamonkey » Tue Jul 19, 2005 7:21 pm

Lady Serena wrote: I'm building and installing the Xapian search system (posted a while ago, on a page toward the beginning of this thread). I'm going to do some testing on it, and I'll post the results. I will also report on its performance and how it may perform on larger boards.
I'd actually be really interested to see how this performs. I hope the test goes well.

Serberus
Registered User
Posts: 3
Joined: Thu Jul 14, 2005 8:58 pm

Post by Serberus » Wed Jul 20, 2005 10:16 pm

EverettB wrote:
You can normally tell if this buffer is too small by checking the SHOW STATUS variables "Created_tmp_disk_tables" and "Created_tmp_tables" - divide the disk tables into the tmp tables to see what percentage of temporary tables are going to disk because this buffer is too small.

If it helps you compare, the ratio on my site is 5%.

I don't know if that is optimal but after some Google searches, it seems people were having trouble when the ratio was 40-50%.

I found a recommendation to increase the tmp_table_size if the ratio was over 10%.

You can change this value by editing your my.cnf file for MySQL and entering

Code:

set-variable = tmp_table_size = 32M
Restart MySQL afterward.

I believe 32M is the default.


5% isn't bad; only one in twenty queries requiring a temp table is going to disk.

I can easily see why people are having trouble at 40-50%!

At the moment the ratio on my site is 25%, and I've increased the tmp_table_size from 256MB to 512MB with no change in this ratio.

I'm a little lost as to whether this is due to too many queries using BLOB/TEXT columns (images are currently served from the DB, though this is going to change soon!) or whether some heavy queries are generating huge temp tables.

The other problem is that the MySQL server is running on Solaris 8 and I've yet to find a dnotify equivalent.

Going slightly OT...

Call me a nerd, but reading one of alphamonkey's posts about his current setup (which looks superb! :wink:) and his use of memcache caught my attention; this technology certainly looks exciting. This is exactly what I've been looking for. I've been caching some frequently executed query results in the user's session, but memcache looks a whole lot better. I can't believe I haven't come across this before!

The only problem I have is convincing the server guys to set this up. As an interim solution, one of my work colleagues suggested using a heap table to hold my cache. I couldn't see any disadvantages to this (apart from the fact that it still uses MySQL to service cached requests).

The setup my site runs on is 4 web servers (load balanced) and 1 DB server; moving this cache to the DB server allows all 4 web servers to use the same cache and, more importantly:

1) This cache doesn't have the limited life span of a user session.

2) This cache is pooled between all web servers and doesn't duplicate the same results for each client.

Have I missed any critical disadvantages to this caching system?
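For what it's worth, a heap-table cache along the lines described might be sketched like this (table and column names are hypothetical; TYPE=HEAP is the MySQL 3.23/4.0 syntax). One caveat worth knowing: HEAP tables cannot hold TEXT/BLOB columns and use fixed-length rows, so large serialized results won't fit.

```sql
-- Hypothetical shared query-result cache in an in-memory HEAP table.
CREATE TABLE query_cache (
	cache_key   VARCHAR(32)  NOT NULL,  -- e.g. MD5 of the query text
	cache_value VARCHAR(255) NOT NULL,  -- small serialized results only
	expires     INT          NOT NULL,  -- unix timestamp
	PRIMARY KEY (cache_key)
) TYPE=HEAP;

-- Read a fresh entry:
SELECT cache_value FROM query_cache
WHERE cache_key = MD5('some query') AND expires > UNIX_TIMESTAMP();

-- Purge stale entries from time to time:
DELETE FROM query_cache WHERE expires < UNIX_TIMESTAMP();
```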

gulson
Registered User
Posts: 15
Joined: Mon May 13, 2002 8:45 pm

Post by gulson » Thu Jul 21, 2005 9:25 pm

I have read a lot about the problems (heavy load) with the phpBB search engine. I've seen many seconds wasted updating the search engine tables. This happens when someone posts a new message or reply, and also when messages are removed. So when a user posts a new message, he has to wait a few seconds, sometimes more, or the confirmation that the message was stored is never displayed... and after that we end up with duplicates of the same post. Lanzer had an idea about making a tmp table. This table would record what changes should be made to the phpbb_search tables, so that the search tables are only updated twice per day, when not many users are online. Maybe someone can say more about this idea, or release the code changes: which files should be edited and what rows created... Thank you.
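For anyone wanting to experiment, the tmp-table idea could be sketched roughly as follows. The queue table name and columns are hypothetical; phpbb_search_wordlist, phpbb_search_wordmatch, and add_search_words() are the actual phpBB2 search structures.

```sql
-- Hypothetical holding table for deferred search-index updates.
CREATE TABLE phpbb_search_queue (
	post_id      MEDIUMINT NOT NULL,
	delete_entry TINYINT   NOT NULL DEFAULT 0,  -- 1 if the post was removed
	PRIMARY KEY (post_id)
);
-- posting.php would INSERT into this table instead of calling
-- add_search_words(); an off-peak cron job then drains the queue,
-- updating phpbb_search_wordlist / phpbb_search_wordmatch in bulk.
```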

KobYY
Registered User
Posts: 46
Joined: Thu Jun 23, 2005 8:50 pm

Post by KobYY » Thu Jul 21, 2005 11:39 pm

Lady Serena wrote: I'm building and installing the Xapian search system (posted a while ago, on a page toward the beginning of this thread). I'm going to do some testing on it, and I'll post the results. I will also report on its performance and how it may perform on larger boards.

ok thnx

R45
Registered User
Posts: 2830
Joined: Tue Nov 27, 2001 10:42 pm

Post by R45 » Sun Aug 07, 2005 2:46 pm

gulson wrote: I have read a lot about the problems (heavy load) with the phpBB search engine. I've seen many seconds wasted updating the search engine tables. This happens when someone posts a new message or reply, and also when messages are removed. So when a user posts a new message, he has to wait a few seconds, sometimes more, or the confirmation that the message was stored is never displayed... and after that we end up with duplicates of the same post. Lanzer had an idea about making a tmp table. This table would record what changes should be made to the phpbb_search tables, so that the search tables are only updated twice per day, when not many users are online. Maybe someone can say more about this idea, or release the code changes: which files should be edited and what rows created... Thank you.

I wrote up something a long while ago to do this and Darth_Wong provided an instruction file.

See http://www.stardestroyer.net/Mike/GeekT ... _1_0_0.zip

It requires you to set up a cronjob to process the pending search-index updates at whatever interval you please.
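For reference, such a cronjob entry might look something like this in the crontab (the script path is hypothetical; the instruction file in the zip will have the real one):

```
# Drain the pending search-index updates twice a day, off-peak.
0 4,16 * * * php /path/to/phpBB2/search_cron.php
```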
