[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
Locked
InFamous
Registered User
Posts: 2
Joined: Wed Nov 26, 2003 10:00 pm

....

Post by InFamous » Thu Nov 27, 2003 1:01 am

Well He told me if I was interesting in being a mod in go-gaia.com to come here.. Sorry If this isnt the right spot

Zemog
Registered User
Posts: 22
Joined: Thu Apr 18, 2002 11:47 am
Location: Mexicali, B.C. México

Thank you and some additions

Post by Zemog » Fri Nov 28, 2003 2:53 pm

Lanzer: Thank you very much for your tips.

Here are my two cents as an addition to your code for the functions_search.php

You said something about ignoring words with more than three repeated chars on them, but couldn't find that part on the code you posted, :? so I gave it a try with the regex and I've came up with the following:

On the first piece of code that Lanzer posted find:

Code: Select all

$match[] = '#\b([a-z0-9]{1,' . $config['min_search_chars'] . '}|[a-z0-9]{' . $config['max_search_chars'] . ',})\b#is';
To ignore words with three or more repeated consecutive letters (e.g. arrrgh) add:

Code: Select all

  // Ignore words with three or more repeated consecutive letters
      $match[] = '#\b([a-z])*\1{2,}[a-z]*\b#is'; 
I've also added the following to ignore all numbers "words" (e.g. 123 -123 123.4 123,456):

Code: Select all

  // Ignore all numbers "words"
      $match[] = '#\b([0-9\-\.\,]+)\b#is';
Please note that I know next to nothing about regular expressions :roll: , so there's a big chance that the expressions I posted could be improved.
But they did worked on my tests as they are, and my forum is very small, so I believe they are more than enough for me.

Any comments/improvements are welcomed :wink:

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Fri Nov 28, 2003 7:03 pm

InFamous> Please post at Gaia Online or PM me, do not listen to rumours.

Zemog > Thanks for the input. I remember running a test code to catch repeated characters but I don't know what happened to it. Shows you how organized I am. :) The number hack is cool too, something I never thought about.

Anyways, here's another tweak. A couple of them actually.

As the board reaches over 3500 users during the day, I'm once again starting to see degregation caused by queries that uses swap space or temp tables on the database server. Turns out those queries were all residing in the search.php script, so I've made a few adjustments in it to avoid those queries from being bottlenecks. Ideally, if I can start replicating a second database and have all the search related functions running there it would be great. But until the hardware is in place a lot of corners have to be cut to make ends meet.

Here is a list of what's been done:

- Removed wildcard support for author name search. Entering one astrisk as search parameter will yield all members on the forum (meaning it'll start a search for all messages in the forum) and that should not be allowed.
- All searches has "ORDER BY post_id DESC LIMIT 400" until I can move the search funtion to another database server

Here are some changes that are more useful:

There are two queries that uses the GROUP BY statement:

Code: Select all

						$sql = "SELECT topic_id 
							FROM " . POSTS_TABLE . "
							WHERE post_id IN (" . implode(", ", $search_id_chunks[$i]) . ") 
							$where_sql 
							GROUP BY topic_id";
Removing the GROUP BY statement if you're running MySQL avoids the creation of temporary space. Queries went from 30 seconds to 1. :)

Then starting from:

Code: Select all

//
// Look up data
//
		if ( $show_results == 'posts' )
		{
			$sql = "SELECT p.post_id 
				FROM " . FORUMS_TABLE . " f, " . TOPICS_TABLE . " t, " . USERS_TABLE . " u, " . POSTS_TABLE . " p WHERE p.post_id IN ($search_results)
					AND f.forum_id = p.forum_id
					AND p.topic_id = t.topic_id
					AND p.poster_id = u.user_id";
		}
		else
		{
			$sql = "SELECT t.topic_id 
				FROM " . TOPICS_TABLE . " t, " . FORUMS_TABLE . " f, " . USERS_TABLE . " u, " . POSTS_TABLE . " p, " . POSTS_TABLE . " p2, " . USERS_TABLE . " u2
				WHERE t.topic_id IN ($search_results) 
					AND t.topic_poster = u.user_id
					AND f.forum_id = t.forum_id 
					AND p.post_id = t.topic_first_post_id
					AND p2.post_id = t.topic_last_post_id
					AND u2.user_id = p2.poster_id";
		}

The whole portion above plus the portion with the case statements had been changed to the following to allow for more optimized queries and to avoid creation of any temp tables and avoid un-necessary table joins:

Code: Select all

		switch ( $sort_by )
		{
			case 1:
			case 2:
                if ( $show_results == 'posts' )
                {
                    $sql = "SELECT p.post_id FROM " . TOPICS_TABLE . " t, " . POSTS_TABLE . " p WHERE p.post_id IN ($search_results) AND p.topic_id = t.topic_id ";
                }
                else
                {
                    $sql = "SELECT t.topic_id FROM " . TOPICS_TABLE . " t WHERE t.topic_id IN ($search_results) ";
                }
                $sql .= " ORDER BY ";
				$sql .= 't.topic_title';
				break;
			case 3:
                if ( $show_results == 'posts' )
                {
                    $sql = "SELECT p.post_id FROM " . USERS_TABLE . " u, " . POSTS_TABLE . " p WHERE p.post_id IN ($search_results) AND p.poster_id = u.user_id";
                }
                else
                {
                    $sql = "SELECT t.topic_id FROM " . TOPICS_TABLE . " t, " . USERS_TABLE . " u WHERE t.topic_id IN ($search_results) AND t.topic_poster = u.user_id ";
                }
                $sql .= " ORDER BY ";
				$sql .= 'u.username';
				break;
			case 4:
                if ( $show_results == 'posts' )
                {
                    $sql = "SELECT p.post_id FROM " . POSTS_TABLE . " p WHERE p.post_id IN ($search_results) ORDER BY p.forum_id ";
                }
                else
                {
                    $sql = "SELECT t.topic_id FROM " . TOPICS_TABLE . " t WHERE t.topic_id IN ($search_results)  ORDER BY t.forum_id ";
                }
				break;
			default:
                if ( $show_results == 'posts' )
                {
                    $sql = "SELECT post_id FROM " . POSTS_TABLE . " WHERE post_id IN ($search_results) ORDER BY post_id ";
                }
                else
                {
                    $sql = "SELECT topic_id FROM " . TOPICS_TABLE . " WHERE topic_id IN ($search_results) ORDER BY topic_last_post_id ";
                }
				break;
		}
I'm using post_id instead of post_time to sort posts in chronological order. Hope it's okay on other database systems.

Lastly in my quest to remove every single query in phpBB that causes creation of tmperory space, the last query was modified to not sort the entries. Rather, they're sorted by PHP instead. It's rather silly to worry about sorting on a page full of data but it's just nice to know that no single query in phpBB makes tmp space now. :)

Basically, this portion:

Code: Select all

		switch ( $sort_by )
		{
			case 1:
                $sql .= " ORDER BY ";
				$sql .= ( $show_results == 'posts' ) ? 'pt.post_subject' : 't.topic_title';
				break;
			case 2:
                $sql .= " ORDER BY ";
				$sql .= 't.topic_title';
				break;
			case 3:
                $sql .= " ORDER BY ";
				$sql .= 'u.username';
				break;
			case 4:
                $sql .= " ORDER BY ";
				$sql .= 'f.forum_id';
				break;
			default:
                $sql .= " ";
                $sql .= " ORDER BY ";
				$sql .= ( $show_results == 'posts' ) ? 'p.post_time' : 'p2.post_time';
				break;
		}
		$sql .= " $sort_dir ";
        if ( !$result = $db->sql_query($sql) )
		{
			message_die(GENERAL_ERROR, 'Could not obtain search results', '', __LINE__, __FILE__, $sql);
		}
		$searchset = array();
		while( $row = $db->sql_fetchrow($result) )
		{
			$searchset[] = $row;
		}
		$db->sql_freeresult($result);
Which slips in the ORDER BY statements had been removed and replaced with:

Code: Select all

        if ( !$result = $db->sql_query($sql) )
		{
			message_die(GENERAL_ERROR, 'Could not obtain search results', '', __LINE__, __FILE__, $sql);
		}
		switch ( $sort_by )
		{
			case 1:
				$sort_category = ( $show_results == 'posts' ) ? 'post_subject' : 'topic_title';
				break;
			case 2:
                $sort_category = 'topic_title';
				break;
			case 3:
                $sort_category = 'username';
				break;
			case 4:
                $sort_category = 'forum_id';
				break;
			default:
                $sort_category = 'post_time';
				break;
		}


        $sort_key = array();
		$rawset = array();
		while( $row = $db->sql_fetchrow($result) )
		{
            $sort_key[] = $row[$sort_category];
			$rawset[] = $row;
		}
		
		$db->sql_freeresult($result);		
		
        if ( $sort_dir == 'ASC' ) 
        {
            asort($sort_key);
        } 
        else 
        {
            arsort($sort_key);
        }

		$searchset = array();
		while( list($k,$v) = each($sort_key) )
		{
			$searchset[] = $rawset[$k];
		}

        unset($rawset);
Life is good. :)

User avatar
bboombotz
Registered User
Posts: 9
Joined: Fri Mar 21, 2003 2:43 pm
Location: Florida, USA
Contact:

Post by bboombotz » Mon Dec 01, 2003 2:21 pm

I had a question since you all have talked about search functionality. I am curious if it were possible to use SQL driven statements to do the logic in searching. Now mind you I am no DB expert nor am I a PHP expert, though I do program in PHP for my job.

Anyways, what I have noticed is due to how phpbb uses the wordlist and wordmatch for searches, even posting new topics can be slow if those tables are quite large, as the code has to search through them and add new words, etc.

Instead of having these word matching tables, why not perform a SQL query like the following:

Code: Select all

select * from phpbb_posts_text where post_text like '%researchers%';
You can get even fancier with the SQL query, linking the post_id from posts_text to posts, replacing that column with the data from posts (hope that makes some sense).

Anyways I think it might be possible to use SQL driven queries to perform searching, rather than using those word tables. Also I am just curious if this has been attempted and is slower or not. Again I am no DB expert, but the guy at my job is and he has helped me in the past with some pretty slick queries.

Thanks!

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Silly performance hacks

Post by lanzer » Mon Dec 01, 2003 2:29 pm

One thing that I did a while back was a silly threadview update hack. Basically, everytime someone views a thread, a counter goes up by one. To avoid so many database updates on such a trivial data, I made a quick mod on viewtopic.php:

Code: Select all

        $sql = "UPDATE " . TOPICS_TABLE . " SET topic_views = topic_views + 1 WHERE topic_id = $topic_id";
        if ( !$db->sql_query($sql) )
        {
            message_die(GENERAL_ERROR, "Could not update topic views.", '', __LINE__, __FILE__, $sql);
        }
to:

Code: Select all

    if (mt_rand(1,3) == 1) {
        $sql = "UPDATE " . TOPICS_TABLE . " SET topic_views = topic_views + 3 WHERE topic_id = $topic_id";
        if ( !$db->sql_query($sql) )
        {
            message_die(GENERAL_ERROR, "Could not update topic views.", '', __LINE__, __FILE__, $sql);
        }
    }
Yes, that means the view count will only update by 3 views 1/3 of the time. (told you it's silly!) It's not a desirable thing to do unless most threads are extremely active, but even setting the ratio to 50% do mean you'll cut off 1/2 the writes to the topics table. It's a fun little hack especially if you're running MyISAM tables. And I'm sure nobody would complain about the board being off by one thread view every now and then. :)

A more solid solution is to have a memory buffer such as a heap table storing the thread views befor dumping them, though I'm not really into the idea of using more memory on the database server. Instead, I'll be installing a memory cache server (memcache) next month to buffer all these constant updates.

Second problem I came across is the updating of the forum table. As there are 250,000+ posts a day, the forum table gets updated every time a post is made. That's not a big deal, the problem is that since the forum table is updated constantly, all the forum permissions cannot be cached by MySQL through the MYSQL_CACHE feature, as contents within a cache will drop every time the table is updated. As forum permission is something that's queried on every single pageview. Having it cached will give terrific performance increase. Also, with less table updates I can move the forum table back to MyISAM which will be much faster on reads than InnoDB.

The first step is to disable the updating of forum_last_post_id. The code is in functions_post.php

disable:

Code: Select all

        $sql = "UPDATE " . FORUMS_TABLE . " SET $forum_update_sql WHERE forum_id = $forum_id";
        if (!$db->sql_query($sql))
        {
            message_die(GENERAL_ERROR, 'Error in posting', '', __LINE__, __FILE__, $sql);
        }
Next, slip in this function in a cron job:

Code: Select all

$sql = "SELECT forum_id, forum_last_post_id from " . FORUMS_TABLE;
$result = $db->sql_query($sql);
while (list($forum_id,$forum_last_post_id) = $db->sql_fetchrow($result) ) {
    $sql2 = "SELECT topic_last_post_id AS last_post FROM ".TOPICS_TABLE." WHERE forum_id = $forum_id ORDER BY topic_last_post_id DESC LIMIT 1";
    $result2 = $db->sql_query($sql2);
    $last_post = $db->sql_fetchfield("last_post", 0, $result2);
    if ( $last_post > $forum_last_post_id ) {
        $sql3 = "UPDATE " . FORUMS_TABLE . " SET forum_last_post_id = $last_post WHERE forum_id = $forum_id";
        $db->sql_query($sql3);
    }
}
I now update the forum table every minute. Works great. (note the lack of error detection, I need to tidy up my code later :)

Was tempted to make a single query to grab all the latest post_id, but anything involving the GROUP BY statment takes forever to run.
Last edited by lanzer on Tue Dec 02, 2003 12:37 am, edited 1 time in total.

aharris
Registered User
Posts: 9
Joined: Fri Sep 19, 2003 10:18 am
Location: London, UK
Contact:

Post by aharris » Mon Dec 01, 2003 3:50 pm

bboombotz wrote: I had a question since you all have talked about search functionality. I am curious if it were possible to use SQL driven statements to do the logic in searching. Now mind you I am no DB expert nor am I a PHP expert, though I do program in PHP for my job.

Anyways, what I have noticed is due to how phpbb uses the wordlist and wordmatch for searches, even posting new topics can be slow if those tables are quite large, as the code has to search through them and add new words, etc.

Instead of having these word matching tables, why not perform a SQL query like the following:

Code: Select all

select * from phpbb_posts_text where post_text like '%researchers%';
You can get even fancier with the SQL query, linking the post_id from posts_text to posts, replacing that column with the data from posts (hope that makes some sense).

Anyways I think it might be possible to use SQL driven queries to perform searching, rather than using those word tables. Also I am just curious if this has been attempted and is slower or not. Again I am no DB expert, but the guy at my job is and he has helped me in the past with some pretty slick queries.

Thanks!


Hi - I'm a myphp/php/sql n00bie who has hacked and sweated his way to a search functionalilty which makes sense on my forum (similar to Google, most importantly supporting full phrase searching). So, I have looked at a number of the issues you have looked at/will be looking at. Here are a few thoughts:

1) There is only one fundamental and inescapable disadvantage to using phpbb's word matching tables, and that is the size they get to, if you are size sensitive. I am now convinced that 'match words' is by far the quickest methodology and if your forum is of any size will leave any phpbb_posts_text type of modified search standing in the dust - I considered your proposed method but it became clear my searches would slow to the speed of cooling lava. What I came up with is a bizarre but seemingly quite fast and efficient overlay to the built-in phpbb search, which is to get back the results from the normal sql query, do a full phrase type search on them using php code, get a new post_id results set and then allow the phpbb function to continue as normal with the new, slimmed down results set. You might think the double search is inefficient, but in fact it only does the pre-search if it finds a phrase-based type of search requested and these types of searches tend to reduce the number of results substantially so the second search returns much fewer results. My hack is in this topic at Fri Nov 21, 2003 7:00 pm Post subject: sooper-stoopid question about searching medium size foru.. I can't really spend much time supporting it I'm afraid, but if you get really stuck by all means PM me..


2) All the other disadvantages of using phpbb's word matching tables can normally be tweaked in the php code without too much trouble - for example, I have disabled the stopwords file (I just have one entry in it which is 'zxzxzxzx' (ie nothing)). I have also disabled the common words marking in phpbb_search_wordlists - I just got rid of the line in search.php which looks at it. I have seen various others in this topic, much better versed than I, suggest a variety of other cleaver tweaks which can have substantial benefits in terms of time and speed. For example, there seem to be a whole number of queries performed on many pages which you may well not need to execute.

3) If you are intent on going down the phpbb_posts_text type route, AND IF YOU ARE USING MYSQL, then you want to look at the fulltext index functionality in mysql. Basically, mysql makes a fulltext index on your phpbb_posts_text and post_subject fields and you search on the index rather than on the field itself. This is apparantly a fair bit quicker than just searching on the fields - 2 problems however; 1) mysql 3 doesn't support boolean AND OR NOT etc searching on a FULLTEXT index, 2) I think the minimum number of characters in a search word in mysql 3 is only 4 characters which may be a problem for many. In mysql 3, I don't think you can easily change this, whereas I think you can in mysql 4, and also I think mysql 4 may allow a minimum of 3 characters.

The reason the folks who wrote phpbb don't use this functionality amongst other reasons is because fulltext indices are not available in other database environments and phpbb is designed to be db-independent.

Hope the above helps, good luck :D

Atari
Registered User
Posts: 20
Joined: Mon Dec 01, 2003 10:31 pm

Re: Silly performance hacks

Post by Atari » Tue Dec 02, 2003 12:31 am

lanzer wrote:


Hey Lanzer,

GREAT STUFF... we're going to be starting a pretty huge forum as well (subject matter unrealted to yours) and I'm sure your hacks will be a big time saver.

I had a question though.. would you mind sharing your system / network specs? What kind of box its on (CPU/s, mem, disk types/size etc)

Thanks!

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Tue Dec 02, 2003 2:50 am

Hello bboombotz:

Well I'm no PHP/MySQL expert neither, just somone thrown in the lion's den with a lot of hungry lions. :) Right now our POST_TEXT_TABLE size is 4GB. Imagine if everytime someone request a search, that the server needs to seek through 4GB worth of data from the HD, that would be too much. Even if all the information is on RAM, shuffling through 10 million entries takes a few good minutes. Imagine if the operation is based on reading from disks, then multiply that by how many users might be searching at one time (3800 people are online right now). That's just too much. Basically, any tasks that takes more than 0.1 second to complete is bad news.

The sweet part about having a word search table is that for starters, all the information that's needed is in memory, second, the searchable info is more relavent (no need to search pass bad or useless words), thirdly, information in the word table is sorted as they're indexed, and forth, after a comprehensive word search table exist, even with more messages added to the forum the word search table doesn't get much bigger, as the time for searching grows in linear if you're doing a full text search. These are just some key points on the pros of a word search table, and we havne't looked into the inefficiency of full text searches yet. :?

At times, I'm curious if there is a median that we can choose. For example, have a fixed word match table consisting of words in the english dictionary, that way the table won't keep expanding and take up more time/RAM to operate. . . I'm sure there are a lot of more innovative tweaks out there to make an index search more efficient, but most likely people or companies who devote the time to developing such techniques aren't working on an open source software. :)

Until then... our wordlist table is at 150k in entries. Ouch. Time to do some house cleaning. :) Not to mention the size of the wordmatch table which is at 8.6 million entries. Things are gonna be tough all over...

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Re: Silly performance hacks

Post by lanzer » Tue Dec 02, 2003 3:18 am

Hi Atari, here's a list of our current setup:

4 Forum front end, 1 graphics front end, 1 misc. service frontend, 1 avatar server, and one database backend.

Of the 4 forums:
- 3 are P4 2.6Ghz machines with 2G of RAM, one IDE HD (taking 50 page/second, setup to take ~250 simultaneous connections)
- 1 dual Xeon 2.6G with 4G RAM, 3 IDE HD (taking 80 page/second, taking ~500 connections)

Graphics frontend and avatar server are P4 2.8 with 2G RAM, with a stripped down Apache server that can't handle any dynamic content but can take 350 page/second and can take ~1500 connections.

Database backend is a dual Opteron 240 with 4GB RAM, one IDE HD, 2 pairs of 15,000RPM drives running RAID-0 with the help of a RAID card with 64MB cache.

Right now the database server is running happily as we have 3800 people on-line. Though our stats doesn't reflect a regular forum as we include services such as a link list and an art voting page, but at the same time, we also handle a ton of additional queries which adds virtual gold/credit to users as they surf our site. A regular forum with 3800 users on-line would definitely have less queries and updates than ours. That's something to keep in mind. :o

Network setup is nothing to write home about. Each machine has two network cards. One private network is for chatting with the database server, and a memory cache server soon to be installed. The other is for internet traffic.

Hope that helps, good luck with your site!
Atari wrote: Hey Lanzer,

GREAT STUFF... we're going to be starting a pretty huge forum as well (subject matter unrealted to yours) and I'm sure your hacks will be a big time saver.

I had a question though.. would you mind sharing your system / network specs? What kind of box its on (CPU/s, mem, disk types/size etc)

Thanks!

Atari
Registered User
Posts: 20
Joined: Mon Dec 01, 2003 10:31 pm

Re: Silly performance hacks

Post by Atari » Tue Dec 02, 2003 9:38 pm

lanzer wrote: Hi Atari, here's a list of our current setup:

Of the 4 forums:
- 3 are P4 2.6Ghz machines with 2G of RAM, one IDE HD (taking 50 page/second, setup to take ~250 simultaneous connections)
- 1 dual Xeon 2.6G with 4G RAM, 3 IDE HD (taking 80 page/second, taking ~500 connections)

Database backend is a dual Opteron 240 with 4GB RAM, one IDE HD, 2 pairs of 15,000RPM drives running RAID-0 with the help of a RAID card with 64MB cache.


Hey Lanzer, thanks for the reply! I found your (earlier) hardware setup in the interview Big-Boards did on you. But this info does help.

I was checking out Opteron (since you use it on your big-board)... and it looks great. I guess MySQL made some tweaks to take advantage of it on Linux (But only linux??)

Here's a few questions... probably a few too many :)
I'm just in awe of your system and I need to be able to push similar numbers (for a board completely unrelated to anything you're doing).

I understand completely if you don't want to share all the intimate details of your setup. But it doesn't hurt to ask.. and you seem very helpful and willing to share your knowledge on all this (which is much appreciated by many many phpbb users I'm sure :))


Anway could you share some info on that Opteron box?

Are you using aurel42's replication patch now?
It looked like during the big-boards interview you were only using a single front end... or maybe you only had dedicated mysql server for both read/write back then?


What operating system is it running?
I assume its linux, but I'm a BSD guy.. I'd like to try FreeBSD if I wouldn't take a big performance hit for doing so :) Not sure if you know anything about that...

What motherboard did you go with?
I was looking at the TYAN S2885 (supports up to 8gigs memory)

What raid card are you're using?
(I noticed you said Adaptec + Opeteron is no fun)

How big are the drives in the datbase server & how did you stripe them and partition them for your operating system & forum use?
You have 10mil+ posts in the forum alone... how full is that raid?

--------

About the application / front end servers:

Basically the same questions as above...

----------

Are there any weaknesses, hardware wise, in your current system that you have noticed that you would do differently if you could do it over again? Or unused performance that you paid for didn't see the expected results?



Sorry for so many questions here :(

User avatar
bboombotz
Registered User
Posts: 9
Joined: Fri Mar 21, 2003 2:43 pm
Location: Florida, USA
Contact:

Post by bboombotz » Tue Dec 02, 2003 10:33 pm

Just wanted to say thanks to aharris and Lanzer for your comments. You two definitely are more versed in SQL than I... now that I think about my statement I can see how ridiculously slow it would be.

If I ever come across anything that might aid the phpbb group in anyway, I will make sure to post it.

Thanks again!

d-ArkAngel
Registered User
Posts: 64
Joined: Sun Jun 02, 2002 11:30 am
Location: England, Redcar
Contact:

Post by d-ArkAngel » Tue Dec 02, 2003 10:33 pm

Hi lanzer

could you post a list of what indexes you have on your board?

just because I've tried makeing a couple of additions to reduce the number of file scans but it seems that many of the querys don't want to use my new indexes, did you replace your old indexes?

I'm in much need of assistance with this since my site is sailing very close to the limit of my avalaible hardware. (at this moment, it's looking more and more like I'm going to need hardware upgrades all the time.)
Robert Laverick (dArkAngel)
Your just jealous that the voices talk to me!
Living the thin line between Inspiration and Insomnia
Image

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Speeding up memberlist.

Post by lanzer » Wed Dec 03, 2003 1:51 am

Just made another silly modification earlier:

Saw a few queries taking a few minutes to complete, turns out the memberlist is slowing the site down today. By default it sorts by registration date, and the SQL statements also sorts by registration date. Though reg date is not indexed, so instead if you change all occurances inside memberlist.php from $order_by = "user_regdate to $order_by = "user_id, the forum's happy again since user_id is indexed. The option to sort by web site and e-mail address are also taken away since they're not indexed, and I can't really see why anyone would want to sort by those two parameters...

Just cutting corners as the site grows. :)

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Can't get enough of query enhancements

Post by lanzer » Wed Dec 03, 2003 2:19 am

In the corner of my eye I spotted a few queries that were taking a few seconds to load, and so a few changes were made:

file: functions_post.php

Code: Select all

$sql = "SELECT MAX(post_id) AS last_post_id FROM " . POSTS_TABLE . " WHERE topic_id = $topic_id";
had been changed to

Code: Select all

$sql = "SELECT post_id AS last_post_id FROM " . POSTS_TABLE . "  use INDEX(topic_n_id) WHERE topic_id = $topic_id ORDER BY post_id LIMIT 1";
* you get even better result if the topic_n_id index is used. It's the index mentioned on page one of this thread that speed up message search on large threads.

result:

Code: Select all

mysql> SELECT MAX(post_id) AS last_post_id FROM bb_posts where topic_id = 323000;
+--------------+
| last_post_id |
+--------------+
|     12229812 |
+--------------+
1 row in set (0.12 sec)

mysql> SELECT post_id AS last_post_id FROM bb_posts where topic_id = 323000 ORDER BY post_id DESC LIMIT 1;
+--------------+
| last_post_id |
+--------------+
|     12229812 |
+--------------+
1 row in set (0.01 sec)
Hey don't laugh, it is 12 times faster you know. :lol:

d-ArkAngel
Registered User
Posts: 64
Joined: Sun Jun 02, 2002 11:30 am
Location: England, Redcar
Contact:

Post by d-ArkAngel » Wed Dec 03, 2003 9:23 pm

I had to add a use index statment very much like that in view topics to convince mysql that there was a way of doing things that didn't need file scans, which was why I asked you about which indexs you had since I was wondering if you had a different setup that allowed your to select the proper index.

I'm going to write some debugging code to do a more complete analysis of phpBB's qry usage. I think that there's a few places that it's not using indexes because I'm still seeing some filescans going on when I look at the mysqld stats.
How have you been debugging your qry usage? just testing them one by one? or is there a way to turn some kind of logging on that is exaustive for a short "burst" of information?

I've played with a little app called mytop, but it dosn't update fast enough for me to see all the qrys that happen (and I miss a great many :-) )

since the index it is using could be "altered" to use to have the dual coulum index would that cause slowdown where one of the other indexes would be faster? or would it be better to keep the number of indexes low to keep insert and update faster (since the difference between the two is probably minisqule on each qry how do we measure such things? :-) )

I don't seem to be able to find a good explanation of why mysql choses the index it does, which would be helpful in these circumstances (or is it faster for us to do it's work by telling it explicitaly which index to use I wonder? another increadably minor pref tweak? :-) )

wow I just can't seem to submit this post I just keep getting dragged away from the computer by my pointless mind wandering :-) sorry it this is at all disjointed :-)
Robert Laverick (dArkAngel)
Your just jealous that the voices talk to me!
Living the thin line between Inspiration and Insomnia
Image

Locked

Return to “2.0.x Discussion”