[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [2.0.x] Tweaks for large forums

Post by arod-1 » Fri Aug 29, 2008 12:17 am

just couple of small comments:
-- you mentioned "no fulltext search" as a limitation of mysql 5.1 split table feature. i sure hope your implementation does not try to still have the mysql fulltext feature enabled...
-- if one goes as far as to dissect the code to support partitioning, one might want to consider partitioning to several databases rather than just several tables in the same database - much more simple to do load-sharing among several server without the hassle of clusters.
-- you might want to give a second look to the outline i posted several posts ago about dividing the content between "live" and "archive" rather than symmetrical splitting of the posts table.

good luck with the project!
standard disclaimer:
backup your db and files before you do anything.
absolutely no guarantee.
if you do what i advise and it blows in your face, all you'll hear from me is: "ah... sorry, i guess"

User avatar
Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow » Fri Aug 29, 2008 12:22 am

arod-1 wrote:just couple of small comments:
-- you mentioned "no fulltext search" as a limitation of mysql 5.1 split table feature. i sure hope your implementation does not try to still have the mysql fulltext feature enabled...
-- if one goes as far as to dissect the code to support partitioning, one might want to consider partitioning to several databases rather than just several tables in the same database - much more simple to do load-sharing among several server without the hassle of clusters.
-- you might want to give a second look to the outline i posted several posts ago about dividing the content between "live" and "archive" rather than symmetrical splitting of the posts table.

good luck with the project!
1.) Umm.... actually, it does. Via the UNION clause. :oops: I just thought I'd throw it in there.

2.) Yep, that's covered in the first or second paragraph. A good idea.

3.) Yeah, but that's losing even more functionality, I think. Yes, you could argue that topics whose last post dates from 2004 will never again be discussed, but on the other hand, if you could have all your posts live, I think that's the best solution. (Though maybe not always the most practical!)
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton bookMac GUIMac 512K Blog

arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [2.0.x] Tweaks for large forums

Post by arod-1 » Fri Aug 29, 2008 1:43 am

(regarding mysql fulltext search:)
Dog Cow wrote:1.) Umm.... actually, it does. Via the UNION clause. :oops: I just thought I'd throw it in there.
my point was/is that for a board large enough to warrant splitting the post table, mysql fulltext search is inadequate.

i did not check recently, but according to the information i gathered when writing the mysql fulltext search mod for phpbb2, mysql fulltext search does not gracefully scale to multi-gigabyte tables.
they may have improved it since, or maybe my data was not high-quality, but this is what i remember.
standard disclaimer:
backup your db and files before you do anything.
absolutely no guarantee.
if you do what i advise and it blows in your face, all you'll hear from me is: "ah... sorry, i guess"

User avatar
Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow » Fri Aug 29, 2008 8:23 pm

Yep, I've heard those same things. Fortunately, I'm using the very latest version of Mysql 5.0.67 with a modern Mac (1GB Ram,Intel Core 2 Duo 1,83GHz) , so I can test a multi-gigabyte database fulltext index with the latest gear. Now of course, this testing is far from comprehensive, but I can still have a look.

Really, though, I think that a fulltext index, if you've researched it and believe it can work for your needs, should be at least considered or tried. Things have improved over the years. Otherwise, there are many other possibilities out there, such as having Google handle the search (which is what the 1.4 billion post forum does), or using the Sphinx system.
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton bookMac GUIMac 512K Blog

arod-1
Registered User
Posts: 1327
Joined: Mon Sep 20, 2004 1:33 pm

Re: [2.0.x] Tweaks for large forums

Post by arod-1 » Mon Sep 01, 2008 2:21 pm

Dog Cow wrote:Yep, I've heard those same things. Fortunately, I'm using the very latest version of Mysql 5.0.67 with a modern Mac (1GB Ram,Intel Core 2 Duo 1,83GHz) , so I can test a multi-gigabyte database fulltext index with the latest gear. Now of course, this testing is far from comprehensive, but I can still have a look.

Really, though, I think that a fulltext index, if you've researched it and believe it can work for your needs, should be at least considered or tried. Things have improved over the years. Otherwise, there are many other possibilities out there, such as having Google handle the search (which is what the 1.4 billion post forum does), or using the Sphinx system.
i agree 100%.
i was actually advocating the google solution on several occasions. the only drawback i can think of is the slight time-lag, because you can't guarantee that the google index is up-to-the-latest-post updated.
as to mysql fuultext performance: after i published my mod, i asked for feedback/measurements from anyone who uses it, and received none, so i had to assume the negative things i heard about it are actually correct, at least until i hear otherwise.
i kind of assumed that even if it's not that great, it will improve as time goes by, but did not revisit the issue.
standard disclaimer:
backup your db and files before you do anything.
absolutely no guarantee.
if you do what i advise and it blows in your face, all you'll hear from me is: "ah... sorry, i guess"

User avatar
drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE
Contact:

Re: [2.0.x] Tweaks for large forums

Post by drathbun » Mon Sep 01, 2008 3:36 pm

I much prefer a locally hosted search over google. For one thing, you can't rely on google to be present, nor can you rely on them to continue indexing your board at some specific level. You can't use google to search hidden / secured / private forums. You can't use google for "search this topic" or "search this forum" types of functionality.
I blog about phpBB: phpBBDoctor blog
Still using phpbb2? So am I! Click below for details
Image

Amodinos
Registered User
Posts: 20
Joined: Sun Jan 20, 2008 8:22 am

Re: [2.0.x] Tweaks for large forums

Post by Amodinos » Tue Sep 02, 2008 12:55 am

I feel once a Sphinx engine is developed with full functionality, a lot of our problems will be solved for the time being. :) It's probably one of the biggest problem solvers that could be developed as of yet in terms of server load for large databases.

User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl » Thu Oct 02, 2008 2:42 am

After using sphinx around ~1 month now, we notice 11% more possible online users on our machine.

Last query in slow-queries.log, which appear every couple minutes. I was testing it and still not developed any optimization so i put it here:
# Query_time: 5 Lock_time: 0 Rows_sent: 16 Rows_examined: 442
SELECT pt.post_text, pt.bbcode_uid, pt.post_subject, p.*, f.forum_id, f.forum_name, t.*, u.username, u.user_id, u.user_rank, u.user_posts, u.user_sig, u.user_sig_bbcode_uid
FROM phpbb_forums f, phpbb_topics t, phpbb_users u, phpbb_posts p, phpbb_posts_text pt
WHERE p.post_id IN (41, 154, 374, 417, 442, 564, 696, 735, 774, 777, 820, 892, 899, 960, 1030, 1033, 1042, 1111, 1122, 1294, 1298, 1300, 1406, 2121, 2223, 2293, 2422, 2538, 3049, 3100, 3724, 3742, 4032, 4122, 4130, 4143, 4152, 4221, 4271, 4286, 4298, 4754, 4774, 4778, 4917, 4942, 5054, 5106, 5423, 5455, 5465, 5843, 5850, 6243, 6617, 6618, 6627, 6679, 6770, 6772, 7188, 7212, 7332, 7455, 7485, 7890, 7993, 7994, 7995, 7996, 7997, 7998, 7999, 8000, 8001, 8002, 8003, 8005, 8006, 8007, 8008, 8009, 8087, 8506, 8508, 8532, 8533, 8535, 8675, 8830, 8939, 8982, 9021, 9057, 9078, 9861, 10632, 10686, 10687, 10802, 11569, 11913, 11914, 12402, 13249, 13251, 14195, 14454, 14800, 14835, 14841, 15670, 17586, 17760, 19061, 19416, 20388, 20403, 20437, 20449, 20464, 20854, 21149, 21472, 21791, 22049, 22085, 22202, 22935, 23055, 23529, 23688, 24460, 24820, 25466, 25495, 25769, 25779, 26087, 26096, 26168, 26346, 26356, 27144, 27146, 27874, 27959, 28023, 28027, 28034, 28289, 28319, 28354, 28364, 28737, 29110, 29562, 29563, 29657, 30195, 30253, 31143, 31839, 32048, 32049, 32127, 32556, 32617, 32799, 32983, 33118, 33414, 33878, 34371, 34547, 34954, 35436, 35536, 35544, 35798, 36021, 36217, 37566, 38531, 38540, 39450, 39571, 39765, 39919, 40000, 40826, 40867, 40874, 41208, 41227, 41303, 41482, 41543, 42687, 43130, 43801, 44447, 44542, 44670, 44906, 46043, 46097, 46190, 46618, 46943, 47734, 48180, 49416, 49967, 50429, 52387, 54089, 54891, 55142, 55564, 56868, 58896, 60154, 61597, 61607, 61700, 63546, 63746, 63874, 64141, 65051, 65482, 65488, 67911, 68298, 68449, 68842, 71930, 72171, 73584, 74949, 75882, 76681, 76864, 77852, 79695, 81502, 82347, 82721, 86148, 86183, 86185, 86186, 86596, 86787, 88858, 89032, 89275, 90306, 91375, 91775, 92730, 94397, 94400, 95025, 95448, 99009, 99650, 102365, 103471, 103481, 105666, 111963, 114398, 117132, 118647, 118908, 123252, 124576, 128575, 128742, 132167, 133758, 135378, 138575, 142691, 143079, 146003, 146686, 153755, 158019, 158716, 168437, 168836, 170090, 170197, 170286, 172804, 175058, 175856, 176726, 179889, 184470, 185471, 186408, 186751, 188225, 191523, 192947, 197733, 198399, 201998, 203406, 207369, 208169, 210672, 211281, 211986, 213622, 214781, 216952, 217583, 220946, 224052, 225761, 229402, 231767, 234644, 236315, 236927, 243594, 246955, 248197, 250894, 255443, 260560, 260628, 276253, 280963, 297169, 301025, 305387, 307178, 307591, 307754, 308621, 311672, 314019, 315486, 323900, 332030, 337153, 341329, 348472, 348580, 361364, 363253, 365113, 365221, 371786, 377231, 394973, 412128, 413312, 417216, 418451, 420022, 429796, 431207, 438367, 448410, 456448, 458673, 458913, 462470, 467066, 474513, 475246, 476572, 487019, 488709, 489419, 490296, 490401, 494080, 495576, 499693, 506102, 512708, 521189, 522767, 528464, 533892, 536524, 545307, 549054, 549825, 549827, 553347, 558939, 566995, 575815, 578688, 584365, 586356, 586522, 589949, 590585, 604949, 609632, 609656, 609676, 611747, 612196, 612346, 612612, 615195, 618487, 622400, 639690, 657980, 670523, 675966, 692273, 706685, 707210, 714968, 722688, 749071, 760788, 763793, 771669, 775132, 775485, 781390, 789016, 790900, 801102, 830722, 858350, 860247, 862051)
AND pt.post_id = p.post_id
AND f.forum_id = p.forum_id
AND p.topic_id = t.topic_id
AND p.poster_id = u.user_id ORDER BY p.post_time DESC LIMIT 0, 16;
What we should notice in below query explain. We scanned every row from 'post_id IN' part. There are exactly 442 post_id's in brackets above, same value scanned which is
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t range PRIMARY,forum_id,topic_last_post_id,fid_type_lp PRIMARY 3 NULL 442 Using where; Using temporary; Using filesort
1 SIMPLE f eq_ref PRIMARY PRIMARY 2 t.forum_id 1
1 SIMPLE u eq_ref PRIMARY PRIMARY 3 t.topic_poster 1
1 SIMPLE p eq_ref PRIMARY PRIMARY 3 t.topic_first_post_id 1
1 SIMPLE p2 eq_ref PRIMARY,poster_id PRIMARY 3 t.topic_last_post_id 1
1 SIMPLE u2 eq_ref PRIMARY PRIMARY 3 p2.poster_id 1
Anyone ideas how to optimize this? Fun starts when query looks post_id IN ( 5000 posts_id's :P ). Then its really slow on high load.

Query is executed while getting search results showed by posts.
if ( $show_results == 'posts' )
{

$sql = "SELECT pt.post_text, pt.bbcode_uid, pt.post_subject, p.*, f.forum_id, f.forum_name, t.*, u.username, u.user_id, u.user_rank, u.user_posts, u.user_sig, u.user_sig_bbcode_uid
FROM " . FORUMS_TABLE . " f, " . TOPICS_TABLE . " t, " . USERS_TABLE . " u, " . POSTS_TABLE . " p, " . POSTS_TEXT_TABLE . " pt
WHERE p.post_id IN ($search_results)
AND pt.post_id = p.post_id
AND f.forum_id = p.forum_id
AND p.topic_id = t.topic_id
AND p.poster_id = u.user_id";
//echo "test";
}
btw. splitting huge tables to smaller ones (posts/posts_text) is still actual idea ( i mean sooner or later 'must do' ), it would speed up alter table, repair table, backup table, query'ing in some cases etc.

update:
Notice in logs another strange query which took every time more than 3 seconds:
DELETE FROM phpbb_confirm
WHERE session_id NOT IN ('0004eaa18bc281b1507ab8dcef97a41f', '0006cb66d0ef9da9b56b4b9b8ebc6271', '0007f695b0dff6f8addb8b9d345c0566', '00080aec9ddbd4a5b893e4e374d2315c', '0018bdd7194402d5022f3ae9915904fa', '001d8048dfd67b80aa0fb0bcc0ac98ab', '0021bb62fe2fa670916370cbdb20861f', '0023a8bcfc884415a4705a6ffe180be4', '002f2da25216a732d935d08b6d4c46ae', '0034d4f79a78cc3985829380ace4afc0', '00370ffa197d8d8741c526da6565bc51', '003ce16e661307011e5932f9c5bafd85', '003d5bbdb7d2ab90d4968f0a34171787', '0046016b6991f029d474e76b969d6faa', '00572345f03102dc05bb3bde5600f135', '0058db93946f80f99fb02569b1f7c7fa', '005dc595202f9b38b5bba7e804912673', '006c8249cfa305d2ac86c8f08aca3491', '0072a8f271955f523a60fe6481bf4f95',
... lots of data like this here
'ffe062eb5733680f39178e71a948a718', 'fff35d1527fbfc18d2fcea1e7ffac5ff', 'fff7b7329d68de4539b833809c4222fd');
Idea of using NOT IN is kinda stupid to begin with :P.

User avatar
Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow » Mon Oct 06, 2008 9:37 pm

qspypl wrote: update:
Notice in logs another strange query which took every time more than 3 seconds:
DELETE FROM phpbb_confirm
WHERE session_id NOT IN ('0004eaa18bc281b1507ab8dcef97a41f', '0006cb66d0ef9da9b56b4b9b8ebc6271', '0007f695b0dff6f8addb8b9d345c0566', '00080aec9ddbd4a5b893e4e374d2315c', '0018bdd7194402d5022f3ae9915904fa', '001d8048dfd67b80aa0fb0bcc0ac98ab', '0021bb62fe2fa670916370cbdb20861f', '0023a8bcfc884415a4705a6ffe180be4', '002f2da25216a732d935d08b6d4c46ae', '0034d4f79a78cc3985829380ace4afc0', '00370ffa197d8d8741c526da6565bc51', '003ce16e661307011e5932f9c5bafd85', '003d5bbdb7d2ab90d4968f0a34171787', '0046016b6991f029d474e76b969d6faa', '00572345f03102dc05bb3bde5600f135', '0058db93946f80f99fb02569b1f7c7fa', '005dc595202f9b38b5bba7e804912673', '006c8249cfa305d2ac86c8f08aca3491', '0072a8f271955f523a60fe6481bf4f95',
... lots of data like this here
'ffe062eb5733680f39178e71a948a718', 'fff35d1527fbfc18d2fcea1e7ffac5ff', 'fff7b7329d68de4539b833809c4222fd');
Idea of using NOT IN is kinda stupid to begin with :P.
I just read recently that someone had made a modification to change that to count by time instead, since it relies on currently active sessions and thus gets unweildly on an active forum. Here's the topic: http://www.phpbb.com/community/viewtopi ... 5#p7257265
But I can't find the code changes at all, so I'll work on that tonight.
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton bookMac GUIMac 512K Blog

User avatar
JLA
Registered User
Posts: 479
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS
Contact:

Re: [2.0.x] Tweaks for large forums

Post by JLA » Thu Oct 09, 2008 5:49 pm

da_badtz_one wrote:There are some great comments getting thrown around here, thanks to Dog Cow especially for writing up that guide.

I'd like to share a tweak I made a while ago which I forgot to share.

The following query is a query you'll find when trying to find the total count of every user in your database to provide statistical information. To avoid having to rewrite the whole user counting process to feature what phpBB3 uses I used this nifty tweak I found.

So this query:

Code: Select all

	$sql = 'SELECT COUNT(user_id) AS total
		FROM ' . USERS_TABLE . '
		WHERE user_id > ' . ANONYMOUS;
becomes:

Code: Select all

	$sql = 'SELECT COUNT(user_id) -1 AS total
		FROM ' . USERS_TABLE;
The difference is noticeable as shown as the result:

Code: Select all

mysql> SELECT COUNT(user_id) -1 AS total FROM driaw_users ;
+--------+
| total  |
+--------+
| 452979 |
+--------+
1 row in set (0.38 sec)

mysql> SELECT COUNT(user_id) AS total FROM driaw_users WHERE `user_id` > -1 ;
+--------+
| total  |
+--------+
| 452979 |
+--------+
1 row in set (2.26 sec)
I still do not know the reason why it is faster, but if it returns the same result then hey why not? :)

One of the things we learn from using COUNT is that it is slow when put against large tables. The best way to keep statistical information for anything we want to count is to keep a record of it as a number in the config table instead. However this requires accuracy in making sure that the count is kept consistent with the changes made. It will be initially hard to convert everything over to use this statistical count method but atleast there is some indication to tell you whether or not you continue to accurately make these count changes. And that is to compare this method to the old method and see if there are any differences.

For example, if I deleted a user, the user count will decrease by 1.
What file is this in?

User avatar
Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow » Thu Oct 09, 2008 7:23 pm

JLA wrote:
da_badtz_one wrote:
The following query is a query you'll find when trying to find the total count of every user in your database to provide statistical information. To avoid having to rewrite the whole user counting process to feature what phpBB3 uses I used this nifty tweak I found.
What file is this in?
includes/functions.php

Right at the top of the file (or very near the top).
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton bookMac GUIMac 512K Blog

User avatar
JLA
Registered User
Posts: 479
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS
Contact:

Re: [2.0.x] Tweaks for large forums

Post by JLA » Thu Oct 09, 2008 8:47 pm

As I was reading through this thread, there were some mentions about caching of static data (not speaking of templates - we already are using XS) such as smilies, ranks, etc.

We found the smilies and ranks code but nothing on the other types of static data. I guess the only factor that would affect this would be we are using simple subforums. Is there somewhere this code for caching of other status data is located?

On another note - big thanks to PHPBB Doctor on his writeup regarding the $nav_links and problems with tons of requests to viewforum.php by certain users. We have dealt with this problem for a long time and hope to see this kills it once and for all.

User avatar
JLA
Registered User
Posts: 479
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS
Contact:

Re: [2.0.x] Tweaks for large forums

Post by JLA » Thu Oct 09, 2008 8:47 pm

Dog Cow wrote:
JLA wrote:
da_badtz_one wrote:
The following query is a query you'll find when trying to find the total count of every user in your database to provide statistical information. To avoid having to rewrite the whole user counting process to feature what phpBB3 uses I used this nifty tweak I found.
What file is this in?
includes/functions.php

Right at the top of the file (or very near the top).
Thank You

User avatar
JLA
Registered User
Posts: 479
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS
Contact:

Re: [2.0.x] Tweaks for large forums

Post by JLA » Thu Oct 09, 2008 11:37 pm

Memberlist.php
-- Instead of the select statement that selects all users info, an index of user_id is first built. The hack is very similar to the view topic hack on page 1

Viewforums.php
-- Same case, a list of topic id is built before grabbing all the topic information in the big select statment with all the table joins
-- Similar to the viewtopic hack, an index with two keys (forum id and post time) is build to speed up the search when building the list of topic id index

Posting.php
-- There is a statement there to grab the most recent messages within the thread, it needs to have the same hack for viewtopic.php applied to it
-- I might be wrong, but the statement for grabbing vote id and vote description does not require the GROUP BY statement. Taking it away sped up the query from 1 minute to 1 second
Does anyone know where Lanzer actually put the code for this?

He said
Sorry for the crudeness of the information. I'll try to find time to better list all the changes once I find some free time. ^^;
Thanks

User avatar
Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow » Fri Oct 10, 2008 3:04 am

He didn't give code for forums and memberlist, because you can just adapt the viewtopic code. For posting, he's referring to the Topic Review which shows the latest posts. The voting thing is easy to change as well.

If you need some code, look at phpBB 3. It is using Lanzer's idea for memberlist and forum.
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton bookMac GUIMac 512K Blog

Locked

Return to “2.0.x Discussion”

Who is online

Users browsing this forum: No registered users and 5 guests