[2.0.x] Tweaks for large forums

R45
Registered User
Posts: 2830
Joined: Tue Nov 27, 2001 10:42 pm

Post by R45 »

cifroes wrote: @ R45

Here is the explain:

Code: Select all

mysql> explain select t.forum_id, t.topic_id, p.post_time FROM phpbb_topics t, phpbb_posts p WHERE p.post_id = t.topic_last_post_id AND p.post_time > 1109174573 AND t.topic_moved_id = 0;
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
| table | type   | possible_keys     | key     | key_len | ref                  | rows  | Extra      |
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
| t     | ALL    | topic_moved_id    | NULL    |    NULL | NULL                 | 11819 | where used |
| p     | eq_ref | PRIMARY,post_time | PRIMARY |       3 | t.topic_last_post_id |     1 | where used |
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
2 rows in set (0.00 sec)
Right now this query takes 0.2~0.5s... This is not much but it's my slowest query in index.php and when the DB is under more load it can take seconds...

The reason your query is running so slowly is that MySQL isn't using an index for the topic_moved_id match. I'd recommend creating an index on that column, as it is heavily referenced in various queries.
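For example (a sketch against the stock phpBB 2 table names, run once from the mysql client):

Code: Select all

ALTER TABLE phpbb_topics ADD INDEX topic_moved_id (topic_moved_id);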
cifroes
Registered User
Posts: 108
Joined: Wed Mar 27, 2002 4:09 pm

Post by cifroes »

@R45

Hi, my phpbb_topics indexes:

Code: Select all

Indexes:

Keyname         Type     Cardinality  Field
PRIMARY         PRIMARY  11939        topic_id
forum_id        INDEX    35           forum_id
topic_moved_id  INDEX    118          topic_moved_id
topic_status    INDEX    3            topic_status
topic_type      INDEX    3            topic_type
I don't think I changed anything in my indexes from the default phpBB, and that index is there... But looking at its cardinality and thinking a bit, I don't think that is the problem, but I don't know :)

Anyway, I do have topic_moved_id as an index...
lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am

Post by lanzer »

Hello Darkangel:

I can hardly remember what the limitations were around the amount of traffic you mentioned, but as things approached thousands of users online, database performance seemed to be the dominating factor in performance issues.

When the database has queries that take more than a few seconds to complete, fewer than ten of them can bring your entire site to a crawl. While web and CPU related performance degrade gradually, the degradation from slow queries can hit really quickly and hard.

The simplest way I troubleshoot performance issues is with "show full processlist" under MySQL. Any query that takes too long to run needs to be optimized, or the feature needs to be dropped. You mentioned locked threads also. If you have any tables running on MyISAM, then seeing locked threads is a sign that it's time to convert to InnoDB. Make sure you have enough RAM (more than a gig), add a hard drive to store the InnoDB data, and upgrade MySQL if you're running below 4.0.11. That'll be a guaranteed performance increase.
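For example (a sketch; the table name is just the stock phpBB posts table, and on MySQL 4.0-era servers the older TYPE= syntax is the one to use):

Code: Select all

-- watch for long-running or locked queries
SHOW FULL PROCESSLIST;

-- convert a MyISAM table to InnoDB, one table at a time
-- (newer MySQL versions also accept ENGINE = InnoDB)
ALTER TABLE phpbb_posts TYPE = InnoDB;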

I remember listing my SQL config file somewhere in this thread. Basically, you never need to give a lot of RAM per thread, because there will never be enough RAM if the queries are not optimized. A dual Opteron machine can handle around 700 MySQL connections before starting to slow down. My assumption is that a single-CPU server can handle less than half of that, especially with 32-bit processors.
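For orientation, these are the usual knobs (a sketch with illustrative values for a box with a couple of gigs of RAM; this is not the actual config listed earlier in the thread). The global buffers get the bulk of the RAM, while the per-thread buffers stay small because they're multiplied by max_connections:

Code: Select all

[mysqld]
# global buffers - give these the bulk of the RAM
innodb_buffer_pool_size = 1024M
key_buffer_size         = 128M
query_cache_size        = 32M

# per-thread buffers - keep these small, one copy per connection
sort_buffer_size        = 1M
read_buffer_size        = 1M

max_connections         = 700
thread_cache_size       = 64
table_cache             = 512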

That's all that I can think of for now. Good luck! :D
d-ArkAngel wrote: It's been a while since I needed to visit this topic, but it seems I'm back again. I've started to hit a wall again: I'm at 1,800,000 posts and around 118,000 users. The wall seems to be at around 650 active users, though turning off search has sorted it out. From what I've seen, even though I've got the super sneaky rand() % 4 cheat, it's the updates of the views in the topics table that trip things up, locking the tables until everything sorts itself out.

This may or may not be my only problem; I suspect that running out of RAM might also be an issue, though I'm looking at a DB server upgrade as well. I could do with some ideas on where I need to allocate RAM in the my.cnf file for the server.

I've got two mods in the pipeline to try to sort the limitations out. One is to archive off everything but the last 100,000 active posts/threads/users to a near-line backup, in the hope that I can improve the joins and other such performance issues, and the second is to remove the updates for topic views and put them in a cron batch job instead.

Anything people have run into with ~600 users and ~400,000 page views would be a great help. No doubt I'll find my way; just thought I'd touch base and see what new things you've all discovered :-)
lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am

Post by lanzer »

Hi Cifroes:

There's a hack for the query you mentioned that takes the functionality away and replaces it with a query that tells the user which forums have new messages, as opposed to listing all the new topics since the user last logged in. It was mentioned as one of the first hacks in this thread.

Otherwise, you can always do two things:

Compare $userdata['user_lastvisit'] with time() to see how long it has been since the user last logged in. If it's more than a week, you might as well disable the feature, as there will be too many topics to report (there's a sketch of this below).

You can also run a query to count the number of topics, such as

Code: Select all

select count(t.topic_id) as count FROM phpbb_topics t, phpbb_posts p WHERE p.post_id = t.topic_last_post_id AND p.post_time > 1109174573 AND t.topic_moved_id = 0
This query returns the number of topics that would be listed, but it's still a big query to run. It's something you can do, but realistically it's better just to hack it so that the query doesn't run at all. :)
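Putting the first suggestion into code, a sketch against the standard phpBB 2 $userdata globals (the one-week cutoff is just an example):

Code: Select all

// Skip the expensive new-topics query for users who haven't
// visited in over a week - there'd be too many topics to report.
$cutoff = time() - ( 7 * 24 * 3600 );

if ( $userdata['session_logged_in'] && $userdata['user_lastvisit'] > $cutoff )
{
	// run the new-topics query here
}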

PS - Each SQL query can only use one index per table, so adding indexes will not help unless you're adding a compound index. But if you really, really want this feature, I would index the column 'topic_last_post_id' and run two queries: one to grab the first post number that's bigger than the last visit date, then a second query whose where clause compares topic_last_post_id with the post_id obtained from the first query. Hey, having two queries that take 0.02 seconds to run is always better than one 0.2-second query. But again, for such a small feature I wouldn't go this far. :)
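In SQL terms, that two-query idea looks roughly like this (a sketch; the index name and the timestamp are illustrative, and the 12345 in the second query stands in for the post_id returned by the first):

Code: Select all

ALTER TABLE phpbb_topics ADD INDEX topic_last_post_id (topic_last_post_id);

-- query 1: cheap lookup on the existing post_time index
SELECT post_id FROM phpbb_posts
WHERE post_time > 1109174573
ORDER BY post_time ASC LIMIT 1;

-- query 2: range scan on the new index, using the post_id from query 1
SELECT forum_id, topic_id FROM phpbb_topics
WHERE topic_last_post_id >= 12345 AND topic_moved_id = 0;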
cifroes wrote: @ R45

Here is the explain:

Code: Select all

mysql> explain select t.forum_id, t.topic_id, p.post_time FROM phpbb_topics t, phpbb_posts p WHERE p.post_id = t.topic_last_post_id AND p.post_time > 1109174573 AND t.topic_moved_id = 0;
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
| table | type   | possible_keys     | key     | key_len | ref                  | rows  | Extra      |
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
| t     | ALL    | topic_moved_id    | NULL    |    NULL | NULL                 | 11819 | where used |
| p     | eq_ref | PRIMARY,post_time | PRIMARY |       3 | t.topic_last_post_id |     1 | where used |
+-------+--------+-------------------+---------+---------+----------------------+-------+------------+
2 rows in set (0.00 sec)
Right now this query takes 0.2~0.5s... This is not much but it's my slowest query in index.php and when the DB is under more load it can take seconds...


@ alphamonkey

There are some pages in phpBB that have queries with long lists of IDs and such, so now I'm not exactly sure that is a big problem...

Anyway, we could add a limit to the first query, or check num_rows after it and a) just count the last X posts, or b) if the count is very big, just mark every forum as having new messages or no new messages... Maybe this is still an approximation of the correct results with speed improvements; I don't know how the phpBB team feels about that...

More feedback is welcomed :)
Last edited by lanzer on Sun Feb 27, 2005 1:07 pm, edited 1 time in total.
lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am

Post by lanzer »

EverettB wrote: Has anyone tried this? I installed it a while back and just today noticed that the unserialize of the smilies data does not work. I get FALSE returned from the unserialize command.

I also noticed that there is no $board_config['time_now'] variable so I just substituted time() in there.


Hi EverettB:

Sorry, $board_config['time_now'] should be time() on regular phpBB installations. If you're having trouble storing the data, you can take a look at the cache_smilies.php file to see if the data is stored properly. Just scroll to the end of the line and see if you've reached a line limit and things are being cut off.

Also, you can slip in a line that reads "var_dump($smilies);" right after $smilies is populated by the SQL statement, just to verify the data.

If the cache file has all the data in it yet unserialize doesn't restore the info, then you might be hitting a line limit. I now store the variables individually with a function instead of using serialize.

Code: Select all

$file_text = "<?\n \$savedata = " . var_export($savedata, true) . ";\n ?>";
The line that starts with $save_string can be omitted.
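Reading that kind of cache back in is then just an include, with no unserialize() involved. A sketch, using the cache_smilies.php file name mentioned above:

Code: Select all

// write the cache file built with var_export()
$fp = fopen('cache_smilies.php', 'w');
fwrite($fp, $file_text);
fclose($fp);

// later requests just include it, which defines $savedata directly
include('cache_smilies.php');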
EverettB
Registered User
Posts: 326
Joined: Fri Aug 01, 2003 7:11 pm
Location: North America

Post by EverettB »

lanzer wrote: If the cache file has all the data in it yet unserialize doesn't restore the info, then you might be hitting a line limit. I now store the variables individually with a function instead of using serialize.

Code: Select all

$file_text = "<?\n \$savedata = " . var_export($savedata, true) . ";\n ?>";
The line that starts with $save_string can be omitted.

Ok, I didn't think about a line limit. I do have a bunch of extra smilies.

Thanks for the feedback, I ended up caching the data as individual array elements instead of serializing it. Easy change.

I see var_export is easier than what I did though, thanks.
Selven
Registered User
Posts: 291
Joined: Fri Nov 08, 2002 7:28 am

Post by Selven »

Is there a clear mod with all the code changes to apply for optimizing the search function?
No-more supporting phpBB
kjcdude
Registered User
Posts: 154
Joined: Mon Sep 27, 2004 10:23 pm
Location: Southern California

Post by kjcdude »

Damn, I can't imagine how many hours upon hours it took you to modify your board...
At first I didn't even think it was a phpBB board.

Good job, man!
alphamonkey
Registered User
Posts: 146
Joined: Sat Mar 01, 2003 8:26 am
Location: 0x00

Post by alphamonkey »

Selven wrote: Is there a clear mod with all the code changes to apply for optimizing the search function?
Not that I'm aware of.
gulson
Registered User
Posts: 15
Joined: Mon May 13, 2002 8:45 pm

Post by gulson »

Does anyone know a way to stop these large queries generated by the search engine:
SELECT p.topic_id
FROM phpbb_posts p, phpbb_forums f
WHERE p.post_id
IN ( 319212, 350828, 319337, 319351, 319401, 319487, 319505, 319535, 406875, 319618, 350117, 319789, 319822, 319828, 319832, 349383, 319883, 319899, 319922, 319934, 319951, 319952, 322980, 320002, 320007, 320037, 320030, 320063, 320384, 320266, 320298, 320309, 320358, 320371, 320390, 320408, 320409, 326419, 320425, 320437, 328133, 320451, 320508, 320512, 320538, 355645, 320557, 320633, 320651, 320665, 320701, 320707, 380301, 326020, 320774, 320846, 320857, 320898, 321419, 320920, 320994, 325367, 321014, 321032, 321068, 349326, 321091, 323668, 326105, 322037, 349393, 322133, 322136, 322140, 322141, 322194, 322196, 322202, 322211, 322303, 322234, 322332, 322339, 322352, 322359, 325221, 322391, 322491, 322506, 322546, 322572, 322581, 322636, 322669, 322734, 325149, 325521, 323047, 323059, 323087, 323090, 323111, 323121, 323196, 323204, 323207, 323261, 323281, 323501, 324142, 323597, 323690,


These can cause timeout pages and server overload. Is there anything that can stop these queries?

Maybe:

Code: Select all

if ( count($search_id_chunks) > 1000 )
{
	// message_die() already exits, so nothing past this point runs
	message_die(GENERAL_MESSAGE, 'Too large query');
}
xkevinx
Registered User
Posts: 132
Joined: Tue Nov 05, 2002 8:45 pm
Location: California

Post by xkevinx »

Hi again Lanzer,
I was wondering how your new session handling works and what changes you made. Did you ever get memcache working with your system?
Do you Believe?
DmcMan
Registered User
Posts: 48
Joined: Sat Jan 01, 2005 5:23 am

backing up your database

Post by DmcMan »

How do you guys with these huge databases back them up? For security, aren't you supposed to back them up daily? And obviously you can't download a 1+ gig database every day. Do you do daily tape backups or something of the sort?

Just curious.
alphamonkey
Registered User
Posts: 146
Joined: Sat Mar 01, 2003 8:26 am
Location: 0x00

Re: backing up your database

Post by alphamonkey »

DmcMan wrote: How do you guys with these huge databases back them up? For security, aren't you supposed to back them up daily? And obviously you can't download a 1+ gig database every day. Do you do daily tape backups or something of the sort?

Just curious.
I back up off the server, not off my client. So, I back up to a tape drive.

If you don't have that luxury, 1GB is not too bad of a transfer if you schedule it at a down time (if your forum has one) :wink:
lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am

Post by lanzer »

Hi Kevin:

We tried using memcache, and although memcache is fast at handling single connections, it actually didn't show a great improvement as a session handler. Under a high amount of load we actually started having connection errors, even with the load distributed among servers.

Another problem with memcache was that it couldn't prune by a certain field value in one single command, nor could you select multiple rows of data for, say, showing how many users you have on-line by counting the number of rows in the memcache server.

We ended up using 3 different MySQL servers with nothing but a HEAP table to carry the session information, distributed evenly based on user_id. (We don't keep sessions for anonymous users anymore, as there is no need for it.) The session tables have also been changed a lot because of our site's special needs, such as keeping track of gold granting while users browse our site.
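The general shape of that setup, as a sketch (not the actual schema; the column list is trimmed down, and HEAP tables need fixed-length columns):

Code: Select all

CREATE TABLE phpbb_sessions (
	session_id      char(32)     NOT NULL default '',
	session_user_id mediumint(8) NOT NULL default '0',
	session_time    int(11)      NOT NULL default '0',
	PRIMARY KEY (session_id),
	KEY session_user_id (session_user_id)
) TYPE=HEAP;

-- in PHP, the session server could then be picked with something like
-- $server = $userdata['user_id'] % 3;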
xkevinx wrote: Hi again Lanzer,
I was wondering how your new session handling works and what changes you made. Did you ever get memcache working with your system?
lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Re: backing up your database

Post by lanzer »

We do nightly backups with mysqldump for most of our databases. A script lists all the tables within the database, then dumps each table into a backup file of the same name. (You can also just tell mysqldump to dump all tables into one big file.) This happens on all our database servers. Another script then compresses the files into tar archives and saves multiple copies in a backup directory. Every week they're copied to an off-site location. This is an inexpensive way of backing up that will work for almost all sites out there.
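A sketch of that per-table dump in PHP (the original script's language isn't stated; the database name, credentials, and paths here are placeholders):

Code: Select all

<?php
// connect and list every table in the forum database
$link = mysql_connect('localhost', 'backup_user', 'secret');
$result = mysql_query('SHOW TABLES FROM phpbb');

while ( $row = mysql_fetch_row($result) )
{
	$table = $row[0];
	// one dump file per table, named after the table
	system("mysqldump -u backup_user -psecret phpbb $table > /backup/$table.sql");
}

// compress the night's dumps into a single dated tar file
system("tar -czf /backup/phpbb-" . date('Ymd') . ".tar.gz /backup/*.sql");
?>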

The exception is the forum database, which is simply too big to be backed up. The forum database is at 100G and the archive database is just as big. They rely on their slave servers as the backup, but we're still prone to human error: if someone commits a delete on the forum, both the master and the slave will lose all their data! 8O

We're in the middle of installing a storage area network this week, which will give us plenty of space and the ability to actually take snapshots for backups.
DmcMan wrote: How do you guys with these huge databases back them up? For security, aren't you supposed to back them up daily? And obviously you can't download a 1+ gig database every day. Do you do daily tape backups or something of the sort?

Just curious.