[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
Locked
alphamonkey
Registered User
Posts: 146
Joined: Sat Mar 01, 2003 8:26 am
Location: 0x00

Post by alphamonkey » Sat May 15, 2004 7:46 pm

Darn you and your logic. :wink:

I was thinking more of converting it explicitly to a left join. Unfortunatly people are having a hard time reading my mind today :D If the query is still slow you could split it in two. Though I think the overhead of a second call might be worse in this situation.

Daijoubu
Registered User
Posts: 64
Joined: Wed Aug 21, 2002 4:31 pm
Location: Montreal QC Canada
Contact:

Post by Daijoubu » Sat May 15, 2004 9:23 pm

Or add the right indexes :)
Image

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Sun May 16, 2004 9:53 am

gulson wrote: anyone knows how to tweak attachment mod?
especially this query:


Yup, I'm betting that the slowness is caused by a lack of indexing too. :?

Just run mysql from the command line and run:

create index attach_id on phpbb_attachments_desc (attach_id);

Just finished another site update. Starting work on a message archiving system for phpbb which will move old posts into another database that's transparent to the user. Will post results when I'm finished with it. Hopefully it'll take less than a week...

alphamonkey
Registered User
Posts: 146
Joined: Sat Mar 01, 2003 8:26 am
Location: 0x00

Post by alphamonkey » Sun May 16, 2004 4:12 pm

lanzer wrote: Just finished another site update. Starting work on a message archiving system for phpbb which will move old posts into another database that's transparent to the user. Will post results when I'm finished with it. Hopefully it'll take less than a week...
Hmmm. Im looking forward to that :P

Fearless
Registered User
Posts: 664
Joined: Mon Mar 31, 2003 6:16 am

Post by Fearless » Tue May 18, 2004 1:00 am

All these tweaks are nice. Are there any optimisations you can carry out on a board even if it doesn't have many users/posts ? I heard it's possible to rebuild the search table while removing some words using an exclusion list.

DavG
Registered User
Posts: 10
Joined: Tue Nov 18, 2003 3:39 pm
Location: Québec
Contact:

Re: phpBB tweaks for large forums

Post by DavG » Wed May 19, 2004 2:34 pm

lanzer wrote: - Viewtopic.php

Code: Select all

$sql = "SELECT p.post_id FROM " . POSTS_TABLE . " p WHERE p.topic_id = $topic_id $limit_posts_time LIMIT $start, " . $board_config['posts_per_page'];
if ( !($result = $db->sql_query($sql)) )
{
	message_die(GENERAL_ERROR, "Could not obtain post index information.", '', __LINE__, __FILE__, $sql);
}
while (list($p_id) = $db->sql_fetchrow($result)) {
    $p_array[] = $p_id;
}
$post_index = implode(",",$p_array);
A list of post_id are obtained, now I slip it into the large query

Code: Select all

$sql = "SELECT u.username, u.user_id, u.user_posts, u.user_from, u.user_website, u.user_email, u.user_icq, u.user_aim, u.user_yim, u.user_regdate, u.user_msnm, u.user_viewemail, u.user_rank, u.user_sig, u.user_sig_bbcode_uid, u.user_avatar, u.user_session_time, u.user_allow_viewonline, u.user_allowsmile, p.*,  pt.post_text, pt.post_subject, pt.bbcode_uid
	FROM " . POSTS_TABLE . " p, " . USERS_TABLE . " u, " . POSTS_TEXT_TABLE . " pt
	WHERE p.post_id in ($post_index)
		AND pt.post_id = p.post_id
		AND u.user_id = p.poster_id
	ORDER BY p.post_time $post_time_order";

Running this on my forum gives the following error :

Code: Select all

SQL Error : 1064 You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near ') AND pt.post_id = p.post_id AND u.user_id = p.poster_id 

SELECT u.username, u.user_id, u.user_posts, u.user_from, u.user_website, u.user_email, u.user_icq, u.user_aim, u.user_yim, u.user_regdate, u.user_msnm, u.user_viewemail, u.user_rank, u.user_sig, u.user_sig_bbcode_uid, u.user_avatar, u.user_session_time, u.user_allow_viewonline, u.user_allowsmile, p.*, pt.post_text, pt.post_subject, pt.bbcode_uid FROM phpbb_posts p, phpbb_users u, phpbb_posts_text pt WHERE p.post_id in (,) AND pt.post_id = p.post_id AND u.user_id = p.poster_id ORDER BY p.post_time ASC

Line : 412
File : phpBB2\viewtopic.php
DB is mysql 4.0.15-log, any idea why $post_index seems to be empty ?
Knowledge is acquired by the experiment, all the remainder is only of information. [Albert Einstein]

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Wed May 19, 2004 10:07 pm

Hi DavG:

Can be a lot of things, most likely something to do with the code itself. Feel free to PM me with the part of the code which you added in your viewtopic.php file and I can definitely help you take a look at it.

For debugging's sake, you can slip in a line like "print "$sql<BR>"; just so you see what your statement looks like, that should help you out also.

Good luck!

da_badtz_one
Registered User
Posts: 376
Joined: Thu Jan 29, 2004 8:25 pm

Post by da_badtz_one » Wed May 19, 2004 10:36 pm

Fearless wrote: All these tweaks are nice. Are there any optimisations you can carry out on a board even if it doesn't have many users/posts ? I heard it's possible to rebuild the search table while removing some words using an exclusion list.


Well there's some mods out there that can lower the queries (server load) of the forum. Like the gzip mod or removing some useless functions in phpBB.

Normally the phpBB is already very fast but your server needs to be fast in order to have a fast board. Say your in dial up. You can only get speeds of about 3KB/s max for uploading. :)

Florian_dvp
Registered User
Posts: 10
Joined: Tue Feb 25, 2003 3:24 pm
Location: France
Contact:

Post by Florian_dvp » Tue May 25, 2004 8:51 am

such a huge site, I'm really impressed 8O

I was too lazy to read all the thread, but i've something to share about the search part.
I had trouble whith search locking posting. So to avoid locking, i copied the search tables : the "original" tables are used for insert/suppress, and the copy for the search. The copy is updated by a cron job at night times. It introduces a "lag" of one day before changes are avalaible, but i really gained from this.
The mod is easy: duplicating tables with a different name and changing it in search.php
The drawback is that it consumes space, but my board is much smaller than gaia :wink: .
I have a dedicated server for my forum.
"In omnite, there's no single kick, only combos exist".

R45
Registered User
Posts: 2830
Joined: Tue Nov 27, 2001 10:42 pm

Post by R45 » Wed Jun 02, 2004 3:45 am

Found a small annoying issue in the Topic Splitting (affects there atleast), which has to do with the way the censored words are retrieved. In the split post screen, it calls the obtain_word_list function for each post, which executes the query

Code: Select all

SELECT word, replacement FROM phpbb_words;
each time. Therefore, splitting a topic with 100 posts would have that query executed 100 times, a tad useless. I just did a cheap hack to the obtain_word_list() function to cache the data. Open functions.php, find the whole function obtain_word_list declaration, replace it with below

Code: Select all

function obtain_word_list(&$orig_word, &$replacement_word)
{
	global $db;
	static $s_orig_word, $s_replacement_word;
	
	if(!defined('PHPBB_WORDS_SELECTED'))
	{
		//
		// Define censored word matches
		//
		$sql = "SELECT word, replacement
			FROM  " . WORDS_TABLE;
		if( !($result = $db->sql_query($sql)) )
		{
			message_die(GENERAL_ERROR, 'Could not get censored words from database', '', __LINE__, __FILE__, $sql);
		}
	
		if ( $row = $db->sql_fetchrow($result) )
		{
			do 
			{
				$s_orig_word[] = '#\b(' . str_replace('\*', '\w*?', phpbb_preg_quote($row['word'], '#')) . ')\b#i';
				$s_replacement_word[] = $row['replacement'];
			}
			while ( $row = $db->sql_fetchrow($result) );
		}
		
		define('PHPBB_WORDS_SELECTED', true);
	}
	
	$orig_word = $s_orig_word;
	$replacement_word = $s_replacement_word;

	return true;
}
This is a minor issue, but annoying nonetheless.

Florian_dvp
Registered User
Posts: 10
Joined: Tue Feb 25, 2003 3:24 pm
Location: France
Contact:

Post by Florian_dvp » Thu Jun 10, 2004 1:37 pm

The tweak for the index page is very easy to use, and gives a really efficient result...
Thank you for these tips :D
"In omnite, there's no single kick, only combos exist".

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Message archiving

Post by lanzer » Fri Jun 11, 2004 11:26 am

As our board kept growing in size our forum started having a problem with the size of our table. Though pages does seek quite fast with the hacks made, ultimately if a user tries to pull old messages out of a really long thread we're still looking at long seek times. Another problem is that the messages are taking up too much disk space. I think that indexing the large posts and posts text tables must take up a good chunk of memory also.

My solution to which is an archiving server solution. A seperate server built with slower SATA drives that can hold a terrabyte of info as opposed to our current 64GB logical drive. The archiving server also has unique indexing option to make its row seeks much faster on fetching long threads. The theory behind this system will be explained below for those who are interested.

First of all, one flexibility which will be lost on the archiving server is the ability to delete individual posts that are archived. Reason for which will be explained later, but doing so will also keep the table more optimized.

To start archiving first, 2 columns are added to the topic table:

Archive_id - This value indicates how many posts in this thread is archived
Archive_time - When was the post last archived

When posts are moved to the archiving database, within every thread, an index by the name of "page_count" which increments every 300 posts is created. For example, if a thread has 10000 posts, the first three hundred posts made will have a page count of 1, the 301 to 600 posts will have a page count of 2, etc.

What end up happening is that when a user looks for post # 300,000 in a thread that is 600,000 posts long, typically the query goes:

Code: Select all

SELECT post_id FROM phpbb_posts WHERE topic_id = blah LIMIT 300000, 15
The database will need to count 300,000 rows before finding the post indexes.

Now with the introduction of the page number, the script will first find out which page number your post is located in by dividing the post number by 300... which is 1000. Then the query will go something like:

Code: Select all

SELECT post_id FROM phpbb_posts WHERE topic_id = blah AND page_number = 1000 LIMIT  15
This works because there is an compound index that contains both the topic_id and the page number. The result is a query that will only do row counts of less than 300 rows.

There are tons of details to sort out, such as the fact the archiving script should not archive posts that are new, and other scripts such as thread splitting, deletion, syncing, all have to support the archiving server. The details are too involved that I can't list them all out in a page. :? Though this is pretty much the way my archiving system is setup at the moment.

Hope this is an interesting read to all. :) My next project is to make use of a memory cache server to hold session information instead of using a heap table, but that's pretty much way off topic for phpbb. If there are any other interesting projects relating to forums I'll be sure to let everyone know. Until then!

dprichard
Registered User
Posts: 6
Joined: Sun Jun 27, 2004 1:08 pm
Location: Tampa Florida
Contact:

Post by dprichard » Mon Jun 28, 2004 12:39 pm

Has anyone used any search engine friendly mods? Do you find these will slow down the sites? Is there a list anywhere of safe mods to use for PHPBB?

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Fun with compound indexes

Post by lanzer » Wed Jun 30, 2004 11:06 am

Doing the usual slow query sweep today when I noticed something that's running slow:

(from doing a show processlist command, I spotted a query that's been running for 5 seconds)

Code: Select all

| 158188363 | anihq | 192.168.1.2:44763                         | anihq | Query   | 5    | Sending data   | SELECT vote_id FROM bb_vote_voters WHERE vote_id = 586879 AND vote_user_id = 248036        |
Anyways, on that particular command:

Code: Select all

SELECT vote_id FROM bb_vote_voters  WHERE vote_id = 586879 AND vote_user_id = 248036;
+---------+
| vote_id |
+---------+
|  586879 |
+---------+
1 row in set (1.67 sec)
No queries should take more than a second to execute, especially under moderate load.

An explain shows that MySQL is using the key user_id to look for all votes made by the user,
then seeing if the vote_id (poll) had been cast for that user:

Code: Select all

explain  SELECT vote_id FROM bb_vote_voters  WHERE vote_id = 586879 AND vote_user_id = 248036;
+----------------+------+----------------------+--------------+---------+-------+------+-------------+
| table          | type | possible_keys        | key          | key_len | ref   | rows | Extra       |
+----------------+------+----------------------+--------------+---------+-------+------+-------------+
| bb_vote_voters | ref  | vote_id,vote_user_id | vote_user_id |       3 | const | 2020 | Using where |
+----------------+------+----------------------+--------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
In this case the user had made 2020 votes (as stated in the rows column), and it took 1.67 seconds to
check and see if vote_id of 586879 exist within the 2020 vote_id's found.

So a compound index is made:

Code: Select all

create index vote_n_user on bb_vote_voters (vote_user_id, vote_id);
Note that MySQL is using the compound index and the same query can pinpoint to only one row now. :)

Code: Select all

explain  SELECT vote_id FROM bb_vote_voters  WHERE vote_id = 586879 AND vote_user_id = 248036;
+----------------+------+----------------------------------+-------------+---------+-------------+------+--------------------------+
| table          | type | possible_keys                    | key         | key_len | ref         | rows | Extra                    |
+----------------+------+----------------------------------+-------------+---------+-------------+------+--------------------------+
| bb_vote_voters | ref  | vote_id,vote_user_id,vote_n_user | vote_n_user |       6 | const,const |    1 | Using where; Using index |
+----------------+------+----------------------------------+-------------+---------+-------------+------+--------------------------+
1 row in set (0.01 sec)

+---------+
| vote_id |
+---------+
|  586879 |
+---------+
1 row in set (0.00 sec)
Hurray.

Daijoubu
Registered User
Posts: 64
Joined: Wed Aug 21, 2002 4:31 pm
Location: Montreal QC Canada
Contact:

Post by Daijoubu » Wed Jun 30, 2004 12:36 pm

Someone should really write a SQL debug mode that IPB have :lol:
Ex: http://ptprophecy.com/forum/?debug=1

Much easier to find the culprit, slow queries log is still useful though ;)

Edit: I see, seems like only the Oracle drive have this 8O
Image

Locked

Return to “2.0.x Discussion”