Dog Cow wrote: And the reason they don't exist is because they're too hard to get, define, and standardize. Everyone's server setup is different. I might be running MySQL and Apache on one box. You might have a cluster of separate machines all on a private network.
i don't know that this argument is productive, but let me at least list what i think.
of course you do not compare one bbs running on one server with one piece of software against another bbs running on a different server with different software.
but this does not mean that benchmarks are "subjective"; it just means they need to be well-defined.
one could raise similar objections to the ones you mentioned to argue that different database engines, or different CPUs etc., can't be compared either.
benchmarks can be created. you just need to define *exactly* what you are going to measure, and then repeat the same measurements for the different systems you want to benchmark: you define the server hardware and software, and you use standardized content, i.e. users, forums, posts etc.
using converters, you port *the same* content to each of the different systems.
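to make this concrete, here is a rough python sketch of how the standardized content could be generated - the counts, field names and JSON output format are just illustrative assumptions, the real thing would be whatever your converters can actually import:

```python
# Sketch: generate a fixed, reproducible synthetic dataset (users, forums, posts)
# that converters would then import into each BBS under test. All counts and the
# output format are illustrative assumptions, not taken from any real benchmark.
import json
import random

random.seed(42)  # fixed seed so every run produces the identical dataset

N_USERS, N_FORUMS, N_POSTS = 1000, 20, 50000

dataset = {
    "users": [{"name": f"bot_user_{i}", "email": f"bot{i}@example.test"}
              for i in range(N_USERS)],
    "forums": [{"title": f"Test Forum {i}"} for i in range(N_FORUMS)],
    "posts": [
        {
            "author": random.randrange(N_USERS),
            "forum": random.randrange(N_FORUMS),
            "body": "lorem ipsum " * random.randint(5, 200),  # vary post length
        }
        for _ in range(N_POSTS)
    ],
}

with open("benchmark_content.json", "w") as f:
    json.dump(dataset, f)
```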
you also generate artificial load by creating "bot users". ask any spammer - it may not be easy, but it's doable.
one by one you install the different BBS systems on the same test setup, and you run a standardized load, i.e. a well-defined number of scripts doing a well-defined set of things - reading, writing, searching etc. - at exactly the same rate.
while running this load you measure how long some standard operations take - opening a forum, browsing, paging, posting, editing, deleting, searching etc.
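the timing part is also not magic. here is a sketch in python - the URL paths are placeholders (they happen to look phpBB-ish, but each BBS under test would need its own mapping from operation name to request), and authenticated operations like posting or editing are omitted:

```python
# Sketch: time a fixed set of "standard operations" against a BBS under test.
# The base URL and paths below are placeholders, not a real benchmark definition.
import statistics
import time

import requests

BASE = "http://bbs-under-test.local"  # assumed test host

OPERATIONS = {
    "open_forum": lambda s: s.get(f"{BASE}/viewforum.php?f=1"),
    "read_topic": lambda s: s.get(f"{BASE}/viewtopic.php?t=1"),
    "search":     lambda s: s.get(f"{BASE}/search.php", params={"q": "lorem"}),
    # posting/editing/deleting would need an authenticated session; omitted here
}

REPEATS = 50  # illustrative sample size per operation


def run_once():
    session = requests.Session()
    results = {}
    for name, op in OPERATIONS.items():
        timings = []
        for _ in range(REPEATS):
            t0 = time.perf_counter()
            op(session)
            timings.append(time.perf_counter() - t0)
        results[name] = {
            "median_ms": statistics.median(timings) * 1000,
            "p95_ms": sorted(timings)[int(0.95 * REPEATS)] * 1000,
        }
    return results


if __name__ == "__main__":
    for name, stats in run_once().items():
        print(f"{name:12s} median={stats['median_ms']:.1f}ms p95={stats['p95_ms']:.1f}ms")
```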
you also record the load on the host (CPU, memory, DB load, I/O load etc.).
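the host-side recording could be as simple as sampling a system monitor for the duration of the load run - here using psutil, which is just one possible choice:

```python
# Sketch: sample host-side load (CPU, memory, disk I/O) while the load run is
# in progress. psutil is an assumption here; any system monitoring tool would do.
import csv
import time

import psutil

SAMPLE_SECONDS = 1
DURATION = 300  # length of the load run in seconds, illustrative

with open("host_metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "cpu_percent", "mem_percent",
                     "disk_read_bytes", "disk_write_bytes"])
    start = time.time()
    while time.time() - start < DURATION:
        cpu = psutil.cpu_percent(interval=SAMPLE_SECONDS)  # blocks for one interval
        mem = psutil.virtual_memory().percent
        io = psutil.disk_io_counters()
        writer.writerow([round(time.time() - start, 1), cpu, mem,
                         io.read_bytes, io.write_bytes])
```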
you take pains to do exactly the same thing - same content, hardware, software (except the software being compared, of course) and load - for each of the BBS systems you want to benchmark.
as i just demonstrated, it is very easy to *define*. however, as i mentioned, actually *executing* the benchmark requires hard work, with little reward.
however, one presumes (or at least hopes) that once such a benchmark framework existed, it would be much easier to repeat the measurements for every new version, and thus detect performance regressions or, to be optimistic, performance improvements from one version to the next.