xshredx and TheFinn, thanks. Very good and insightful stuff in those postings...
re: splitting functionality: yes, that's exactly what I'm trying to do. I have several dozen hosted sites on a handful of servers, all of them running different software. I'm trying to "move" the forums of those sites from the hosting servers to dedicated forum servers. The reason: it's virtually impossible to scale any site efficiently without touching the code it runs on, and I don't have the resources to optimize each site individually.
For most sites, the forum is the piece of the pie that creates by far the most load. Once the biggest forums have been moved to a forum cluster, I won't have any load problems on the hosting servers anymore (and, if I do it right, any load problem on the forum cluster can be solved by throwing another frontend machine into the cluster).
xshredx wrote:
MySQL best supports one-way replication... so you can build a cluster of database servers. All updates go to one master server, which passes them on to the slave servers.
Yup, and if those slave servers are running on the same machines as the frontends, most of the database connections can be handled locally (via sockets), which saves a lot of resources compared to networked database connections.
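To make the routing concrete, here is a minimal Python sketch (not phpBB code) of how a database layer could split traffic in this setup: statements that only read go to the slave on the same machine, everything else goes to the master. The host name, the socket path and the keyword-based classification are assumptions for illustration only.

```python
import re

# Statements that only read data; everything else is treated as a write.
# Assumption: a simple leading-keyword check is good enough for a forum.
READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)

# Hypothetical connection targets. On a frontend, the local slave would be
# reached through MySQL's Unix socket instead of a TCP connection.
MASTER = "tcp://db-master:3306"
LOCAL_SLAVE = "unix:///var/run/mysqld/mysqld.sock"


def route(sql: str) -> str:
    """Return the server a statement should be sent to."""
    if READ_ONLY.match(sql):
        return LOCAL_SLAVE
    return MASTER
```

A keyword check this simple would misroute something like `SELECT ... FOR UPDATE`, so a real driver would need a stricter rule.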
xshredx wrote:
so in the case of one-way replication, the bottleneck is the speed with which the master database server can handle the inserts and update queries...
Correct. As I said, I'm already running a couple of applications with this setup (e.g. singles.freenet.de), and from that experience, I estimate that one backend will be able to power at least half a dozen frontends (apache + MySQL) before hitting the backend bottleneck (and I'm trying to be conservative here).
xshredx wrote:
Even then, if your master server were down, you could still have a site, but a read-only one...
Theoretically, yes. In the real world, the application would have to handle the missing master backend gracefully.
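One way the application could handle a missing master gracefully, sketched in Python under a few assumptions: a health flag (the hypothetical `master_up`) says whether the master is reachable, reads always go to the local slave, and write attempts raise an exception that the page handler turns into a friendly notice instead of an error page. `ForumDB`, `ReadOnlyMode` and `post_reply` are made-up names.

```python
class ReadOnlyMode(Exception):
    """Raised when a write is attempted while the master is unreachable."""


class ForumDB:
    """Hypothetical database wrapper for one frontend machine."""

    def __init__(self, master_up: bool = True) -> None:
        # Assumption: a health check elsewhere keeps this flag current.
        self.master_up = master_up

    def read(self, sql: str) -> str:
        # Reads are served by the local slave, so they keep working
        # even while the master backend is down.
        return f"local slave: {sql}"

    def write(self, sql: str, params: tuple = ()) -> str:
        # Writes can only go to the master; refuse them while it is missing.
        if not self.master_up:
            raise ReadOnlyMode("forum is temporarily read-only")
        return f"master: {sql} {params}"


def post_reply(db: ForumDB, body: str) -> str:
    """Hypothetical page handler: show a notice instead of an error page."""
    try:
        db.write("INSERT INTO posts (body) VALUES (%s)", (body,))
        return "reply saved"
    except ReadOnlyMode:
        return "Posting is disabled for maintenance; reading still works."
```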
OTOH, redundancy of the backend is not an issue: downtimes of up to 2 hours are okay, and within 2 hours I can easily install a new machine or convert an existing frontend machine into the new backend.
xshredx wrote:
load-balancing over the slaves will be mostly up to you, since there's no real support for that in PHP... maybe you could do this with hand-set rules where you choose which percentage of the select queries goes to which server...
Load balancing for the slaves happens automagically as a result of the load balancing for HTTP connections (and it doesn't matter whether that's done with a DNS RR or with some load balancer hardware). Since the HTTP requests are shared between all frontends and each frontend database only handles the connections from the locally running apache processes, the database (read) load distributes pretty evenly.
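A toy Python simulation of why the read load spreads out on its own: HTTP requests are handed to the frontends in round-robin order (roughly what a DNS RR does), and every SELECT a request triggers is answered by that frontend's local slave. The frontend names and the five reads per request are hypothetical numbers.

```python
from collections import Counter
from itertools import cycle

# Hypothetical frontends; each runs apache/PHP plus a local MySQL slave.
FRONTENDS = ["web1", "web2", "web3", "web4"]


def distribute(num_requests: int, reads_per_request: int = 5) -> Counter:
    """Assign requests round-robin and count the reads each local slave serves."""
    local_reads = Counter()
    targets = cycle(FRONTENDS)
    for _ in range(num_requests):
        frontend = next(targets)
        # Every read triggered by this request stays on the same machine.
        local_reads[frontend] += reads_per_request
    return local_reads
```

`distribute(1000)` gives each of the four frontends 1250 local reads. Real DNS RR is less exact because resolvers cache answers, but the per-machine read load still follows the HTTP load.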
In case you're wondering whether it's smart to mix apache and MySQL on one server when aiming for high performance: I didn't believe it before actually testing that setup, but apache and MySQL seem to develop synergetic effects when running on the same machine. Okay, that's a euphemism for this oversimplification: apache/PHP demands network and CPU, while MySQL is hungry for memory and I/O.
anyway, i say it again... sounds like a nice job to work on sites that get so much traffic... send us the address when it's up, will you??
I will.
I would be glad to hear your opinion on specific points:
- What version to work on? I think I should go for phpBB2, since I'm not in a hurry and my impression is that there has been quite some work on the database code between phpBB and phpBB2.
- Where should I put my modifications? It seems my setup is so exotic that there are few chances of my changes and additions going to the main source tree. So I should probably limit all my changes to a new database driver, even if that means "dirty" or suboptimal code - otherwise, it would be too expensive to maintain the changes over new versions.
- TheFinn, you mentioned that database drivers for other SQL databases make use of functionality that is beyond MySQL. Is there any experience with those drivers in high-load situations? Are there performance reasons to evaluate other databases? (I mean, we're not talking about money or nuclear reactors here, so there's no real need for transactions beyond avoiding corruption of the database, right?)
Again, thanks for your input, all of it is appreciated (except for that "big commercial database backend" idea, which kind of defeats my point of showing that Open Source rules *g*).
I guess I'll try to get a proof-of-concept installation running within the next couple of weeks. It's pretty easy to set up with the existing code, since I can start by using the master backend for all database operations and then move reads to localhost as I see fit.
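The incremental approach could look something like this Python sketch. It assumes reads are moved table by table (my assumption; it could just as well be per page or per query): an initially empty allowlist (the hypothetical `LOCAL_READ_TABLES`) means everything still hits the master backend, and adding a table name sends its SELECTs to the local slave.

```python
import re

MASTER = "master"        # hypothetical name of the shared backend
LOCALHOST = "localhost"  # the MySQL slave running on this frontend

# Tables whose SELECTs have been moved to the local slave so far.
# Starting empty means every query still goes to the master backend.
LOCAL_READ_TABLES = set()

# Picks out the first table name after FROM in a SELECT statement.
FIRST_TABLE = re.compile(r"^\s*SELECT\b.*?\bFROM\s+(\w+)", re.IGNORECASE | re.DOTALL)


def pick_server(sql: str) -> str:
    """Send a SELECT to localhost only if its table has been migrated."""
    match = FIRST_TABLE.match(sql)
    if match and match.group(1).lower() in LOCAL_READ_TABLES:
        return LOCALHOST
    return MASTER
```

Moving a table over is then a one-line change such as `LOCAL_READ_TABLES.add("posts")`, and it can be undone just as easily. The sketch only looks at the first table after `FROM`, which is crude but enough for a proof of concept.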
One last remark: xshredx, you mentioned /. before... I guess my real goal is to build a forum that cannot be slashdotted. *g*
Cheers,
Marc