[2.0.x] Tweaks for large forums

The 2.0.x discussion forum has been locked; this will remain read-only. The 3.0.x discussion forum has been renamed phpBB Discussion.
User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl »

Thx Dog Cow for this session NOT IN hack.
Last edited by qspypl on Thu Dec 11, 2008 12:41 am, edited 1 time in total.
da_badtz_one
Registered User
Posts: 376
Joined: Thu Jan 29, 2004 8:25 pm

Re: [2.0.x] Tweaks for large forums

Post by da_badtz_one »

@qspypl: I'm surprised adding distinct will actually speed things up. DISTINCT is infact a GROUP BY in most cases. I guess one way you can avoid using DISTINCT or GROUP BY all together is to take a 100 max result limit instead and find the first 30 unique topic ids. You can use array_unqiue or even in_array to check if the topic ids are in there. And if not simply query another 100 to find more unique topic ids.
User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl »

da_badtz_one i read couple articles about group by vs distinct after ur post and it looks like there should be no diffirence like you said but there is in filesort.

Code: Select all

mysql> explain SELECT DISTINCT sql_no_cache topic_id                      FROM phpbb_posts                      WHERE post_id IN ( 5204899, 5204820, 5156469, 5204521);
+----+-------------+-------------+-------+---------------+---------+---------+------+------+------------------------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra                        |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+------------------------------+
|  1 | SIMPLE      | phpbb_posts | range | PRIMARY       | PRIMARY | 3       | NULL |    4 | Using where; Using temporary |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+------------------------------+
vs

Code: Select all

mysql> explain SELECT sql_no_cache topic_id                      FROM phpbb_posts                      WHERE post_id IN ( 5204899, 5204820, 5156469, 5204521) group by topic_id;
+----+-------------+-------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra                                        |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | phpbb_posts | range | PRIMARY       | PRIMARY | 3       | NULL |    4 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
Group by involving filesort, distinct not.

update5:
finally rewrited 'move' function from my earlier concept in this topic, working and its hundreds time faster :)
in mark topics function removed query which was getting last post_time
update6:
also rewrited 'split' function same concept like in 'move', XX times faster.
User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl »

Ok this is conception only now, but result would be instant access to every topic page/posts page on forum instead of slow pagination by LIMIT. We know that if limit become 10000, 40 its extremely slow especially on high load.

How change LIMIT? Lets take for example topics. We must add two fields to topics_table, topic_page and topic_count (which is described later). Now run simple script which filling topic_page field like this:

Code: Select all

topic_id:      1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
topic_page:    1,1,1,1,1,2,2,2,2, 2, 3, 3,3,3, 3, 4
This topic_page numeration is correct when you have board_config['topics_per_page'] = 5.

Then instead of main viewforum query which is:

Code: Select all

        WHERE t.forum_id = $forum_id
                AND t.topic_poster = u.user_id
                AND p.post_id = t.topic_first_post_id
                AND p2.post_id = t.topic_last_post_id
                AND u2.user_id = p2.poster_id
                AND t.topic_type <> " . POST_ANNOUNCE . "
        ORDER BY t.topic_type DESC, t.topic_last_post_id DESC
        LIMIT $start, ".$board_config['topics_per_page'];
we have

Code: Select all

        WHERE t.forum_id = $forum_id
                AND t.topic_page = $page_id   <-- change here
                AND t.topic_poster = u.user_id
                AND p.post_id = t.topic_first_post_id
                AND p2.post_id = t.topic_last_post_id
                AND u2.user_id = p2.poster_id
                AND t.topic_type <> " . POST_ANNOUNCE . "
        ORDER BY t.topic_type DESC, t.topic_last_post_id DESC
          <-- change here, cutted out LIMIT -->
With compound index on forum_id and page_id grabbing all topics have same speed on page 1 and page 999999.
But... not so simple to achieve it, first of all, new topics must somehow know which page_id write to self when created. This is when topic_count come to play, instead of page_id stored in it, it has always numbers from 1 to $board_config['topics_per_page'] which is our 5 right now. So if new topic is created it grabs max(topic_id) and if topic_count is 4 it write same page_id to self like grabbed record, if it is 1 it write same page_id to self but if it is 5 it write page_id +1.
Then ofcourse update topic_count.

So we have 2 queries more on topic create (select and update) but extremely fast viewing even each of 10.000.000 topics in one specific forum. Great on forums where SELECT rate is around 70-80%.

Some problems could arise when we want move topics from forum to forum. Then topic should wait to global resync of page_id at night or find other solution.

Im finding new problems when im writing it, because what if someone want post in topic which is on page 2, then topic would not come to page 1 (refreshed normally would go to top of first page), it would be just on top of page 2.

Whole idea comes from Lanzer, but he was using similiar method to archive posts to separate table (yes we missed that when we discussed spliting tables). Too bad he not mentioned more about it, its called paritioning or sharding which is not simple task involving much code and database schema changes, but its a must later on.
User avatar
Dog Cow
Registered User
Posts: 2507
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow »

I thought of this a few months ago, but I think it's impractical. Posts in a topic never change position, but topics in a forum do, since you want the ones with the latest replies always on top. Every time you posted a reply, or a new topic, near-on every single topic in that forum would have to be updated. I think Gaia is using some sort of LIMIT on their viewforum page.

Compare this:
http://www.gaiaonline.com/f.10/?sequence=1
to this
http://www.gaiaonline.com/f.10/?sequence=2

Just one little difference between the two! The first link is showing the newest topics, up to 40. The second is showing topics from the 2nd newest onward, plus one more topic (#41) which would have been on page two of the first link I posted. You can also see by watching the "Topics 4 - 43 of 694" text, near the pagination links.

(Gaia missed some input validation, though. Try specifying a negative number. It just adds 39 to it.)
User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl »

http://www.phpbb.com/community/viewtopi ... 4#p1114684

Above orginal lanzer post about archiving posts by this method.
I can't reproduce sequence=1/sequence=2. You could think this is 2nd new topics but there every refresh page is whole page of new topics (XX topics refreshed per second).

I have in viewforum:
@ 1st. page: http://www.gaiaonline.com/forum/announc ... sequence=1
@ 2st. page: http://www.gaiaonline.com/forum/announc ... equence=41
So probably normal LIMIT usage with 40 topics per page.

And in viewtopic:
@ 1st. page: http://www.gaiaonline.com/forum/nameoft ... 1159569_1/
@ 2st. page: http://www.gaiaonline.com/forum/nameoft ... 159569_16/
So probably normal LIMIT usage with 15 posts per page.

Then something else must give so fast every page viewing. Maybe hes cache'ing by Memcache old posts/topics. When they are rarely refreshed cache could last long and allowing instant reads (love memcache). But it can be various things, like another database engine, MyISAM so diffirent from InnoDB but in this cache MyISAM is even faster from what i tested.

My only method for now is throwing queries with LIMIT > XXX to slave but they are still slow there ;).
If someone from here using memcache can he share what queries and for how long he cache? We can cache index main query refreshed every 20 seconds for example, viewforum main query refreshed every 10 seconds, get_db_stat in user_viewprofile for user percentage posts for 24 hours, but what else? Authorization is user_id based so a lot wasted memory to cache for ex. 500.000 queries (each for every user) especially when this query is instant. What we cant cache is viewtopic newest pages because user can complain why his posts isn't showing up after submit (cache refresh every 10 sec. for ex.). Advices welcome.

Memcache'ing userdata grab would give great benefits because its grabbed on every page but im not sure yet if it would not let to data corruptions. If somewhere is smth like:
UPDATE phpbb_users set user_posts = '.$userdata['user_posts'].' +1 where user_id = XXX <--- DATA CORRUPTION because we have old $userdata['user_posts'] from cache
UPDATE phpbb_users set user_posts = user_posts +1 where user_id = XXX <--- no coruption
User avatar
Dog Cow
Registered User
Posts: 2507
Joined: Fri Jan 28, 2005 12:14 am
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Dog Cow »

They're definitely using caching, I know that, as it's been stated by the developers. Lanzer no longer does any in-depth programming work for Gaia, so he's kind of "useless" at this point. I do know that Gaia's forum before used a page ID for posts in a topic, but now they're using a LIMIT again (or something related). One developer (or maybe Lanzer, I don't remember) mentioned that they wanted to give the option to users to set the number of posts per page, and perhpaps even topics per page in a forum.

What I imagine, is that they have a finite number of replicated database servers, but many more web servers. The last figure given for web servers, I believe was around 300. What they could be doing is using a flat-file cache on each web server. Right, you can't guarantee successive requests are going to go to the same server, but if they all have roughly the same cache (and keep in mind they're all being accessed more or less simultaneously) then that would take off a lot of queries from the database server(s). If not that, then memcached is in use. They also use a distrbuted filesystem called MogleFS.
User avatar
qspypl
Registered User
Posts: 149
Joined: Wed Dec 20, 2006 7:15 pm

Re: [2.0.x] Tweaks for large forums

Post by qspypl »

Dog Cow great informations, especially MogleFS.

Well i know why they are still using limit, its all about proper cache indeed, but not same cache for every page. Other solutions are too complicated and this is simpliest which need better cache only.

Memcache is distributed cache which mean we can run deamon on 10 machines with 2 GB ram each and we have access to 20 GB memory cache from memcache.

Depending on $start value we can make bigger $cached_seconds on further pages in memcache functions. So if this is page 1 (start = 0) its cached only for 10 seconds, same for lets say $start <= 300 because its fast retrieval without cache anyways.
Fun part starts if $start > 300 && $start <= 800 then $cached_seconds = 3600, else $cached_seconds = 86400.

What effect it will have for sql server? No load for viewing any further pages because all results are stored in memory already.
What effect in will have on users? Not any noticable, because if user write in topic on page 500 it will come to first page and stay in same time on page 500 (already cached for 24 hours). So till cache will expire topic will be in two places, refreshed on first pages and on page before refresh which is 500. So what? ;) after 24 hours all will be reordered properly and users which visits further pages are happy anyways.

Finally we are ready to allow viewing every page in forums.

Simpliest solutions are best.
some.lin
Registered User
Posts: 42
Joined: Thu Dec 28, 2006 2:46 pm
Location: Taiwan
Contact:

Re: [2.0.x] Tweaks for large forums

Post by some.lin »

Here, I also did a lot of tweaks for my website,
If you want to put online status in ervery pages, my suggestion is to modify page_header.php

Use this code section to replace original code for calculating online people.

You need to creat an Index for session_time before you use it.

Code: Select all

	
$sql = "SELECT u.username, u.user_id, u.user_allow_viewonline, u.user_level, s.session_logged_in, s.session_ip FROM ".USERS_TABLE." u, ".SESSIONS_TABLE." s WHERE u.user_id = s.session_user_id AND s.session_time BETWEEN " . ( time() - 300 ) . " AND " . (time()) . " ORDER BY u.username ASC, s.session_ip ASC";
religion holy bible - blog pcdiy
User avatar
Marshalrusty
Project Manager
Project Manager
Posts: 29334
Joined: Mon Nov 22, 2004 10:45 pm
Location: New York City
Name: Yuriy Rusko
Contact:

Re: [2.0.x] Tweaks for large forums

Post by Marshalrusty »

Moved to the 2.0.x Discussion forum as phpBB2 is no longer supported.
🇺🇦 Made in Ukraine, exported to the USA 🇺🇸

Have comments/praise/complaints/suggestions? Please feel free to PM me.

Need private help? Hire me for all your phpBB and web development needs
Locked

Return to “2.0.x Discussion”