Scaling phpbb beyond two or three servers

This is an archive of the phpBB 2.0.x support forum. Support for phpBB2 has now ended.
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

Hi there,

I'm currently evaluating free and almost-free web based discussion boards and phpBB is among the top contenders.

I have to set up a system for boards that do 10 million page impressions or more per month, which might be beyond the usual setup with one database backend and a couple of web server frontends.

I'm considering a setup with a MySQL backend and webservers with replicated MySQLs on the frontend servers. For this to work efficiently, the application needs to direct all read-only SQL requests to the replicated database on localhost (using sockets), while all other requests go to the master database backend (that keeps the number of networked connections low).

For applications with a read/write ratio similar to a BB, I have up to a dozen high-load, high-traffic frontends powered by one master database, without any load or traffic issues in the backend.

I have *not* checked the source of phpBB or phpBB2 yet, but perhaps you have some insights to share about how difficult this would be to implement in either version, how it should be done so the changes can be contributed to the project, or on alternative methods of scaling.

I'm open for all suggestions.

Thanks for your help,
Marc
Last edited by aurel42 on Thu Jan 03, 2002 8:48 pm, edited 1 time in total.
hsim
Registered User
Posts: 1554
Joined: Tue Oct 23, 2001 9:39 pm

Post by hsim »

maybe you could redirect SELECT queries by hacking the DBAL. Before executing a query check whether it starts with SELECT and do some dirty coding to connect to localhost in this case.
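A minimal sketch of that check, written in Python rather than phpBB's PHP purely for illustration. The names `is_read_only` and `pick_server` and the labels "localhost" and "master" are made up for this example; they are not part of the phpBB DBAL.

```python
def is_read_only(sql: str) -> bool:
    """Return True if the statement starts with SELECT (case-insensitive)."""
    # Leading whitespace is ignored so that indented queries are still recognised.
    return sql.lstrip().upper().startswith("SELECT")


def pick_server(sql: str) -> str:
    """Choose 'localhost' for reads and 'master' for everything else."""
    return "localhost" if is_read_only(sql) else "master"
```

Anything the check does not recognise goes to the master, which is the safe default. A prefix test like this is crude: a query that begins with a comment or a parenthesis would be sent to the master even though it only reads.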
email me: hsim at gmx.li
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

hsim wrote: maybe you could redirect SELECT queries by hacking the DBAL. Before executing a query check whether it starts with SELECT and do some dirty coding to connect to localhost in this case.


If possible, I would like to avoid code that would be too dirty to go back to the main source tree... as I said, I'm open for alternative scaling methods, but if none can be found, I hope I can encourage the phpBB developers to consider adding support for my kind of setup to the official code base.

Thanks for your comment,
Marc
hsim
Registered User
Posts: 1554
Joined: Tue Oct 23, 2001 9:39 pm

Post by hsim »

maybe simply create your own DB driver which does the same as suggested above but with clean code :)
email me: hsim at gmx.li
xshredx
Registered User
Posts: 51
Joined: Fri Nov 09, 2001 3:05 pm

Post by xshredx »

aurel42 wrote: I have to set up a system for boards that do 10 million page impressions or more per month, which might be beyond the usual setup with one database backend and a couple of web server frontends.



For applications with a read/write ratio similar to a BB, I have up to a dozen high-load, high-traffic frontends powered by one master database, without any load or traffic issues in the backend.


what, you're doing the new forum for slashdot or something??? :wink:

best is to forget about using multiple database servers with the current code, or you'll have to code some stuff yourself... anyway, that's no biggie, because i know no php forum that lets you do that with the default package...
use a database that scales well under impressive load, like sybase or oracle or informix or something...
and run the database server on a big machine,... replication is nice, but think first about fallback when main machine fails...
front-end servers is no problem... go for big memory... there are multiple load-balancing solutions that will fit this software and whatever other php software...
but bear in mind that phpbb is heavy on the server... the template-system is all nice and stuff, but heavy... maybe you can use some of your money to rewrite pieces of phpbb to use at least compiled templates... some more caching in the app itself would be cool too...

anyway... tell us what site it is when you're ready... then i'm gonna take a look at how you handled it.. sounds like a nice job
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

Thanks for your input, xshredx, I wish it were /. or something cool, but it's just the usual game and anime sites. They can grow pretty big.

I don't have a problem coding what I need myself, but I'm asking for advice how to do it without damaging the database abstraction layer beyond recognition.

Regarding your suggestion to use a single big database backend: that's not an option for several reasons, among them being company logistics and cost efficiency (I can get a dozen of the high-end SMP Pentium servers that we're usually using with less hassle and cheaper than one big Sun or IBM machine). Another reason is my experience with those database monsters: a) I have too little first hand experience, b) my second hand experience (ie. watching the Sybase and Oracle admins I work with) tells me not to touch those beasts, because they promise a lot, cost more, and deliver only on Tuesdays and full moon.

I don't know the history of phpBB, but wasn't it written with MySQL as primary database backend? If so, it doesn't use any of the features that MySQL is lacking. Those lacking features are the reason why MySQL is faster than "real" (feature-complete) SQL databases. So why should I use a database backend that is slower because it has features that are not needed for the application I'm going to run?

Sorry, but you got me ranting there. I'm working in an environment that hosts a couple of different database backend solutions, some of them commercial, some of them open source. And every time I see that our flagship service is down again, because the big, ugly, non-scaling commercial database backend couldn't take the load anymore, I'm happy that all my projects are driven by open source databases and that the application developers are willing to do a little extra work to make my life easier and my services more stable.

Re: Fallback... replication is a nice way to have redundancy (at least in the frontends), share load, minimize network load and connections (they're expensive compared to sockets) ... and when it comes to a failure in the backend, it's only a couple of steps to make one of the frontends the new backend... if needed, I could use replication for a hot standby machine, but actually my application doesn't need that (remember, game and anime fan sites). The system doesn't have to be up 100% of the time. It just has to rock (ie. be fast) for 98% of the time.

Re: frontend CPU load, frontend load balancing etc. - the backend is my only worry, I'm kinda experienced in setting up frontends with multiple servers.

Re: money? All I can offer are my humble services, or rather, a percentage of my services. It's kind of a pet project of mine - the owners of the hosted sites will hate me for forcing them to give up their boards and my boss would hate me for wasting my time with this if he knew, but what's the use of being BOFH if you can't force people to do what's good for them?

Thanks again for your advice, I hope I managed to clear things up a bit,
Marc
Last edited by aurel42 on Mon Jan 21, 2002 2:27 am, edited 1 time in total.
hsim
Registered User
Posts: 1554
Joined: Tue Oct 23, 2001 9:39 pm

Post by hsim »

you should really try writing your own database class, maybe that's everything you need to do.
email me: hsim at gmx.li
theFinn
Founder and ex-Contributor
Posts: 1767
Joined: Tue Jul 03, 2001 7:58 pm
Location: Edmonton, AB, Canada

Post by theFinn »

Using multiple DB servers on the backend could be accomplished, as others have said, by creating your own DBAL class. This way you would avoid 'damaging' any of our code and still get what you want. I don't know if this would be an addition to phpBB that we would want to make, as it would be used only in a few cases. However, it would be something we would like to have in our mods database.

I, personally, would go with the multiple webserver/single database server idea (with a failover backup server, of course). One monster DB server and a highly tweaked MySQL process should be all you need; I believe it is all Slashdot uses. With this setup no modifications to phpBB are needed.

You might also want to look into PostgreSQL, as it seems to be quite good at handling large numbers of transactions, and phpBB on Postgres uses fewer queries at some points because, wherever possible, we've taken advantage of features the other DBMSs have that MySQL does not.

Hope this helps.
James 'theFinn' Atkinson
Founder & ex-Contributor
http://www.thefinn.net
xshredx
Registered User
Posts: 51
Joined: Fri Nov 09, 2001 3:05 pm

Post by xshredx »

well, as a start you could maybe use different database servers for different tasks... for instance one (or more in mysql replication build-up) for your main website database content, one or more for your forum content, one for your webshop for instance, one for... and so on... that could already spread some of the stuff.
now about replication... the finn suggests postgresql... that's cool, and yes it's better at handling large amounts of transactions, but replication in postgresql is even less far along than in mysql (in mysql it works well now we can say), and there is no replication in sapdb (these are the three main free open-source database servers?)

mysql is best at one-way replication... so you can make a cluster of database servers. all updates go to one master server, which passes all the updates on to the slave servers. it's also possible to daisy-chain the database servers (a slave is master for another slave), and to have circular replication (every server in the ring is a slave as well as a master)... but this last one only works if you do no inserts on tables with auto-increment columns, since mysql has no logic to hand out auto-increments on more than one master... so this one falls off for phpbb i think.
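To make the auto-increment problem concrete, here is a toy simulation in Python. It does not talk to MySQL at all; the `Master` class is a hypothetical stand-in that only models a table with one auto-increment key and a very simplified replication step.

```python
class Master:
    """Toy stand-in for a MySQL master with one AUTO_INCREMENT column."""

    def __init__(self, name):
        self.name = name
        self.rows = {}       # primary key -> row payload
        self.next_id = 1     # local auto-increment counter

    def insert(self, payload):
        """Insert locally and return the id this master handed out."""
        row_id = self.next_id
        self.rows[row_id] = payload
        self.next_id += 1
        return row_id

    def apply_replicated(self, row_id, payload):
        """Apply a row replicated from the other master; False on key clash."""
        if row_id in self.rows:
            return False     # duplicate primary key
        self.rows[row_id] = payload
        self.next_id = max(self.next_id, row_id + 1)
        return True


a, b = Master("a"), Master("b")
# Both masters accept an insert before either has seen the other's update.
id_a = a.insert("post written on a")
id_b = b.insert("post written on b")
# Replicating each row to the other side now collides on the same id.
clash_on_b = not b.apply_replicated(id_a, "post written on a")
clash_on_a = not a.apply_replicated(id_b, "post written on b")
```

Both masters hand out id 1 independently, so each replicated row clashes with a row that already exists on the other side. With a single master this cannot happen, because only one counter exists.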

so in the case of one-way replication, the bottle-neck is the speed with which the master database server can handle the inserts and the update queries... so, splitting up your whole website (forum, news, reviews, and i don't know what else you gonna have) over more than one mysql setup (be it single servers, or replicated ones) sounds better to me...

even then, when your master server is down, you could still have a site, but read-only then...

redundancy is quite easy to set up with some extra lines in the connect scripts... but you need to know that php's mysql connect function seems to behave strangely with dead servers that should exist. sometimes those scripts seem to hang indefinitely. if that problem arises, you can solve it with a check with fsockopen on the port of the mysql server
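The port check can be sketched like this. It is Python standing in for PHP's fsockopen, and the helper names `server_is_reachable` and `first_reachable` are invented for the example. The idea is only to try a plain TCP connection with a short timeout before handing the host to the real database connect call.

```python
import socket


def server_is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(candidates, timeout: float = 1.0):
    """Return the first (host, port) pair that accepts a connection, or None."""
    for host, port in candidates:
        if server_is_reachable(host, port, timeout):
            return (host, port)
    return None
```

Note that this only proves something is listening on the port. It says nothing about whether the database behind it is healthy or up to date.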

also make sure that the slaves connect to the master under a user that only has read-rights on the replication tables, so the slaves never accidentally make changes...

nice thing to know is that mysql 4 seems to become quite cool for replication... i don't know if you already want to take the risk to run mysql 4, but mysql 4 supports failsafe replication (whereby a slave automatically takes over the task of the master when the master is down)

the biggest problem seems to be that for phpbb the selects and update/insert queries are together in scripts... i mean, if you have scripts that only either do selects, or updates/inserts, then connecting to the right server is dead easy...
now, you'll have to do quite some work i guess on the database class... checking what queries happen, and sending them to the right server... that's why multiple database-setups for your website sound better than one master and multiple slaves for your whole website. for a forum package (be it phpbb or any other, because i know no php forum packages that are built with replication in mind) it becomes more difficult, since for instance when you select some thread, you also have updates for instance for the number of views of a thread... but maybe it works with checking the query with regexes, and then sending it via the connection it needs to go through (master or slaves)... but then again... load-balancing over the slaves will be something that's mostly up to you, since no real support exists for that with php... maybe you could do this with hand-set rules where you choose which percentage of the select queries goes to which server... but otherwise you could check out a solution with LVS or so...
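The "hand-set rules" idea could look roughly like the following Python sketch. The function name `pick_slave` and the slave names in the usage note are hypothetical; the weights are whatever percentages you decide on by hand.

```python
import random


def pick_slave(weights, rng=random):
    """Pick one slave name at random, proportionally to its hand-set weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Called as `pick_slave({"slave1": 50, "slave2": 30, "slave3": 20})`, it sends roughly half of the select queries to slave1 over many calls. A slave with weight 0 is never chosen, which gives a simple way to take a machine out of rotation.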

but you will likely have much more reads than updates/inserts, so yes, maybe replication on relatively simple servers can be a good choice for you, instead of going with one huge badass database server... but once again, if i were you, i would try to split some things up, and then check at that point for replication... that could also mean that for instance when your master database server for your forum database-server-replication-setup is down (and thus, at best your forum becomes read-only), the rest of your site could still be functioning as normal, since it runs on other database servers...

anyway, i say it again... sounds like a nice job to work on sites that get so much traffic... send us the address when it's up, will you??
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

xshredx and TheFinn, thanks. Very good and insightful stuff in those postings...

re: splitting functionality: yes, that's exactly what I'm trying to do. I have several dozen hosted sites on a handful of servers, all of them using different software. I'm trying to "move" the forums of those sites from those hosting servers to dedicated forum servers. The reason: it's virtually impossible to scale any site efficiently without touching the code the site is running on and I don't have the resources to optimize each site individually.

For most sites, the forum is the piece of the pie that creates by far most of the load. Once the biggest forums have been moved to a forum cluster, I will not have any load problems on the hosting servers anymore (and, if I do it right, all load problems of the forum cluster can be solved by throwing another frontend machine into the cluster).
xshredx wrote: mysql supports best one-way replication... so you can make a cluster of database servers. all updates go to one master server, which gives all the updates to the slave servers.


Yup, and if those slave servers are running on the same machines as the frontends, most of the database connections can be handled locally (via sockets), which saves a lot of resources compared to networked database connections.
xshredx wrote: so in the case of one-way replication, the bottle-neck is the speed with which the master database server can handle the inserts and the update queries...


Correct. As I said, I'm already running a couple of applications with this setup (e.g. singles.freenet.de), and from that experience, I estimate that one backend will be able to power at least half a dozen frontends (apache + MySQL) before hitting the backend bottleneck (and I'm trying to be conservative here).
xshredx wrote: even then, when your master server is down, you could still have a site, but read-only then...


Theoretically, yes. In the real world, the application would have to handle the missing master backend gracefully.

OTOH, redundancy of the backend is not an issue, as downtimes of up to 2hrs are okay and within 2hrs, I can easily have a new machine installed or modify an existing frontend machine to be the new backend.
xshredx wrote: load-balancing over the slaves will be something that's most up to you, since no real support exists for that with php... maybe you could do this with hand-set rules where you choose which percentage of the select queries goes to which server...


Load balancing for the slaves happens automagically as a result of the load balancing for HTTP connections (and it doesn't matter whether that's done with a DNS RR or with some load balancer hardware). Since the HTTP requests are shared between all frontends and each frontend database only handles the connections from the locally running apache processes, the database (read) load distributes pretty evenly.

In case you wonder whether it's smart to mix apache and MySQL on one server when aiming for high performance: I did not believe it before actually testing that setup, but apache and MySQL seem to develop synergetic effects when running on the same machine. Okay, that's a euphemism for this oversimplification: apache/PHP demands network and CPU, while MySQL is hungry for memory and I/O.
xshredx wrote: anyway, i say it again... sounds like a nice job to work on sites that get so much traffic... send us the address when it's up, will you??


I will.

I would be glad to hear your opinion on specific points:
  • What version to work on? I think I should go for phpBB2, since I'm not in a hurry and my impression is that there has been quite some work on the database code between phpBB and phpBB2.
  • Where should I put my modifications? It seems my setup is so exotic that there are few chances of my changes and additions going to the main source tree. So I should probably limit all my changes to a new database driver, even if that means "dirty" or suboptimal code - otherwise, it would be too expensive to maintain the changes over new versions.
  • TheFinn, you mentioned that database drivers for other SQL databases make use of functionality that is beyond MySQL. Is there any experience with those drivers in high-load situations? Are there performance reasons to evaluate other databases? (I mean, we're not talking about money or nuclear reactors here, so there's no real need for transactions beyond avoiding corruption of the database, right?)
Again, thanks for your input, all of it is appreciated (except for that "big commercial database backend" idea, which kind of defeats my point of showing that Open Source rules *g*).

I guess I'll try to get a proof-of-concept installation running within the next couple of weeks. It's pretty easy to set up with the existing code, since I can start by using the master backend for all database operations and then move reads to localhosts as I see fit.

One last remark: xshredx, you mentioned /. before... I guess my real goal is to build a forum that cannot be slashdotted. *g*

Cheers,
Marc
xshredx
Registered User
Posts: 51
Joined: Fri Nov 09, 2001 3:05 pm

Post by xshredx »

aurel42 wrote:
xshredx wrote: load-balancing over the slaves will be something that's most up to you, since no real support exists for that with php... maybe you could do this with hand-set rules where you choose which percentage of the select queries goes to which server...


Load balancing for the slaves happens automagically as a result of the load balancing for HTTP connections (and it doesn't matter whether that's done with a DNS RR or with some load balancer hardware). Since the HTTP requests are shared between all frontends and each frontend database only handles the connections from the locally running apache processes, the database (read) load distributes pretty evenly.

yeah, i did not realise that you were really going for apache frontends and mysql slaves on the same machines... sounds like a good setup, and yes, selects should be super fast this way...

aurel42 wrote: I would be glad to hear your opinion on specific points:


[*] What version to work on? I think I should go for phpBB2, since I'm not in a hurry and my impression is that there has been quite some work on the database code between phpBB and phpBB2.

definitely phpbb2... you're not in a hurry, but still: phpbb2 is getting close to final... but most of all: phpbb 1.4 code is a mess, compared to phpbb2 (i think i can safely say so, even developers will agree with me here i think)... it would be much more difficult to hack it into previous phpbb versions, and phpbb 2 is built much better, allows for more customisation too, and the codebase is much cleaner...
aurel42 wrote: [*] Where should I put my modifications? It seems my setup is so exotic that there are few chances of my changes and additions going to the main source tree. So I should probably limit all my changes to a new database driver, even if that means "dirty" or suboptimal code - otherwise, it would be too expensive to maintain the changes over new versions.

i think you could do it with only modifications to the database driver... just make 2 connections (one local for selects, and one to master for updates/inserts) and check for each query which side you have to send it to...
the codebase is clean enough to allow for this kind of thing... just make your 2 connections in the constructor of the sql_db class (where now only one is made), and in the sql_query method of that class, you could use some regexes to check which connection it needs to be sent to... if select is in the query you use one connection, if update/insert are in the query, you use the other db_connect_id...
aurel42 wrote: [*] TheFinn, you mentioned that database drivers for other SQL databases make use of functionality that is beyond MySQL. Is there any experience with those drivers in high-load situations? Are there performance reasons to evaluate other databases? (I mean, we're not talking about money or nuclear reactors here, so there's no real need for transactions beyond avoiding corruption of the database, right?)

well, since you want to go with open source solutions: you have three possible database servers i think: mysql, postgresql and sapdb (these are the three best known open source database servers i think)... there is no sapdb support in phpbb2 (maybe you could write your own database driver if you wanted, but sapdb does not support replication to my knowledge), then there is postgresql (same here: replication is not yet very advanced, you cannot find much info about it, and it still seems to be quite buggy... yes postgresql has a good name in high-load situations, and provides more functionality than mysql... but wasn't your whole point about this replication setup that you could load-balance the stuff?? so postgresql's better ability to perform under high load seems not so super important anymore???)
so, we end up with mysql... replication works quite fine under mysql... and mysql 4 seems to become very nice for such setups... you can use the transaction supporting mysql 4 class for phpbb2, that should avoid corruption of the database...

basically it comes down to: write your own database driver for phpbb2, and stuff should work... add the multiple database (master - slave) support to the driver and check the transactions code, and you have a working test case... then you would still have to go through the code to check some sql queries i think, but no biggie...
aurel42 wrote: Again, thanks for your input, all of it is appreciated (except for that "big commercial database backend" idea, which kind of defeats my point of showing that Open Source rules *g*).

I guess I'll try to get a proof-of-concept installation running within the next couple of weeks. It's pretty easy to set up with the existing code, since I can start by using the master backend for all database operations and then move reads to localhosts as I see fit.

One last remark: xshredx, you mentioned /. before... I guess my real goal is to build a forum that cannot be slashdotted. *g*

Cheers,
Marc

hehe, i wish i had paid more attention in my german classes... i checked your site, all looks good and nice, but i don't understand a thing of what goes on over there... damn, as a belgian dude, i should have known a bit german, so i could help getting your server slashdotted :)
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

xshredx wrote: i think you could do it with only modifications to the database driver... just make 2 connections (one local for selects, and one to master for updates/inserts) and check for each query which side you have to send it to...


That's one of the tricky points - I need to avoid opening a database connection to the backend unless it is needed or I will run into a "simultaneous connections" bottleneck (which can manifest itself in a couple of ways, like hitting "max_connections" or heavy swapping). For the same reason, I cannot use persistent connections to the database backend, because lots of open but idling connections are more expensive with mysql than a lot of short connections, even counting the overhead for opening a new connection.
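A back-of-the-envelope comparison shows why persistent connections are the problem here. All figures below are hypothetical and only chosen to show orders of magnitude; the two helper functions are made up for this sketch.

```python
def idle_persistent_connections(frontends: int, workers_per_frontend: int) -> int:
    """Worst case: every web server worker holds one persistent master connection."""
    return frontends * workers_per_frontend


def concurrent_short_connections(writes_per_second: float, seconds_per_write: float) -> float:
    """Average number of simultaneously open short-lived connections (Little's law)."""
    return writes_per_second * seconds_per_write


# Hypothetical figures, chosen only to show the orders of magnitude.
persistent = idle_persistent_connections(frontends=12, workers_per_frontend=150)
short_lived = concurrent_short_connections(writes_per_second=40, seconds_per_write=0.05)
```

Under these assumptions the master would hold 1800 mostly idle persistent connections, against an average of about 2 open connections when each write opens and closes its own. Whatever the real numbers are, the first figure grows with every frontend added, while the second depends only on the write rate.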

So my approach would be:
  • not touching the constructor, and configuring it to open a socket to the localhost database;
  • identifying the methods that might do writes and adding code there to handle the whole operation using a networked connection to the master db (no, I'm not sure how/where to add that code gracefully, but I guess I'll find out);
  • adding more code to methods that do reads *and* writes to identify the specific operation and decide at run-time what db connection to use;
  • finding spots in the application that might be problematic in this setup (e.g. places where reads occur immediately (ie. in the same request) after writes, possibly before the writes have been propagated to the localhost db) and eliminating those redundant reads.
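The approach above can be sketched as follows, in Python rather than PHP and with invented names throughout (`ReplicatedDB`, `connect_local`, `connect_master`, and an `execute(sql)` method on the connection objects are all assumptions, not phpBB code). One deliberate difference from the last step: instead of eliminating reads that follow a write, this sketch simply sends them to the master for the rest of the request.

```python
import re

# Statements treated as reads; everything else is sent to the master.
_READ_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


class ReplicatedDB:
    """Routes reads to a local replica and opens the master connection lazily."""

    def __init__(self, connect_local, connect_master):
        # Both arguments are zero-argument callables returning an object
        # with an execute(sql) method (hypothetical interface).
        self._local = connect_local()      # cheap socket connection, always opened
        self._connect_master = connect_master
        self._master = None                # opened on first write only
        self._wrote = False                # True once this request has written

    def _get_master(self):
        if self._master is None:
            self._master = self._connect_master()
        return self._master

    def query(self, sql):
        is_read = bool(_READ_RE.match(sql))
        if is_read and not self._wrote:
            return self._local.execute(sql)
        # Writes, and any read after a write in the same request, go to the
        # master so the request never sees stale replica data.
        self._wrote = self._wrote or not is_read
        return self._get_master().execute(sql)
```

A request that only reads never opens a networked connection at all, which is the point of the exercise. The price of the read-after-write rule is a few extra reads on the master; whether that is acceptable depends on how many requests actually write.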
xshredx wrote: there is postgresql (same here: replication is not yet very advanced, you cannot find much info about it, and it still seems to be quite buggy... yes postgresql has a good name in high-load situations


From what I've heard from a development team in-house, load tests on PostgreSQL could not verify that statement (and I'm trying to be polite here). But then, they were testing for a specific application...
xshredx wrote: hehe, i wish i had paid more attention in my german classes... i checked your site, all looks good and nice, but i don't understand a thing of what goes on over there...


Just to clarify: singles.freenet.de is one of the systems already using my proposed setup with a modified application. It is *not* the system that I want to replace with phpBB2.

Thanks,
Marc
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Proof of concept installation is working

Post by aurel42 »

That was much easier than I thought, at least the first step. I copied mysql.php to mysql-replicated.php, added that to db.php, added all handling for selecting the database to use and opening the master database connection to sql_query... and it seems to work.

I'll start a very small beta test with my hostees now. I'll let you know how I'm doing.

Cheers,
Marc
tilttek
Registered User
Posts: 57
Joined: Thu Nov 01, 2001 5:19 pm

Post by tilttek »

aurel42 wrote: That was much easier than I thought, at least the first step. I copied mysql.php to mysql-replicated.php, added that to db.php, added all handling for selecting the database to use and opening the master database connection to sql_query... and it seems to work.

I'll start a very small beta test with my hostees now. I'll let you know how I'm doing.


When you're finished, maybe you will "share" this with us?

For now phpBB2 works well with many web servers to one database server. And yes, this usually is a good solution, because "usually" the database server can support the traffic generated by the web servers. Generally 3 to 1 servers.

But maybe your solution can add to scalability. But is it the best solution?
aurel42
Registered User
Posts: 83
Joined: Thu Jan 03, 2002 7:48 pm
Location: Germany

Post by aurel42 »

tilttek wrote: But maybe your solution can add to scalability. But is it the best solution?


I guess it is if 1-3 servers are not enough (or if you prefer two frontend machines instead of three for the same load, heh). And with VBull, I've seen a database choke under the load of one webserver...

My current setup indicates that phpBB2 could scale up to 10 or more frontends, since most pages are delivered with 0 or 1 database operations on the backend.

Anyway, of course I will share the modifications once I'm sure they work (and probably after I've found a nice way to reduce the changes needed to the rest of the application... I found it was a little more work than I described).

Thanks for all your help (esp. to TheFinn who gave me something to modify, ie. mysql.php). I'll use this thread to keep you posted when I "go live" with a small (three, four machines) setup.

Cheers,
Marc