So this is a short tutorial on "How I Learned To Stop Worrying and Love the Googlebot" for anyone else who is suffering the same travails I was until a few fortnights ago.
Table of Contents
- Step #1: Google Webmaster Tools
- Step #2: Robots.txt
- Step #3: Good phpBB Mods For Search Bots
- Step #4: Tweaking the MySQL Database (ONLY DO IF YOU KNOW WHAT YOU ARE DOING)
- Other Tips
Step #1: Google Webmaster Tools
As far as I can tell, Google is the only search engine that has an extremely helpful area for webmasters called Google Webmaster Tools. If you have not done so already, register for this service by going to their Webmaster Central at: http://www.google.com/webmasters/.
Step #2: Robots.txt
Create a robots.txt file and upload it into your site's root directory if you have not done so already. A good tutorial on how to do so exists here: http://www.askapache.com/seo/seo-with-robotstxt.html.
For the googlebot, this is what I have:
User-agent: Googlebot
Disallow: /posting.php
Disallow: /admin
Disallow: /privmsg.php
Disallow: /search.php
Disallow: /login.php
Disallow: /memberlist.php
Disallow: /images
Disallow: /includes
Disallow: /profile.php
Depending on which mods you have, you may want to add others. For instance, I have FlashChat by tufat installed on my forum, so I also have "Disallow: /flashchat". Also, if phpBB2 is not installed in your site's root directory, prefix each path with the forum directory, e.g. "/{phpBB2 directory here}/admin", replacing the bracketed part with your actual directory name.
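If you would rather generate the rules than type them by hand, here is a minimal Python sketch of the prefixing idea. The function name build_rules and the "forum" directory are made up for illustration; they are not part of phpBB or of my setup.

```python
def build_rules(paths, forum_dir="", user_agent="Googlebot"):
    """Return robots.txt text that disallows each path for one user agent.

    forum_dir is the (hypothetical) directory phpBB2 is installed in;
    leave it empty when the forum lives in the site's root directory.
    """
    cleaned = forum_dir.strip("/")
    prefix = f"/{cleaned}" if cleaned else ""
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {prefix}{path}" for path in paths)
    return "\n".join(lines)


if __name__ == "__main__":
    # "/flashchat" is the extra rule mentioned above; "forum" is an assumed directory name.
    print(build_rules(["/posting.php", "/admin", "/flashchat"], forum_dir="forum"))
```

Paste the printed text into your robots.txt, then add whatever other Disallow lines your mods need.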
With the above in your robots.txt, Google will stop bothering with areas of the site you don't really want indexed and concentrate on the actual content (i.e., the threads) itself. Once everything is done, you can check Google's Webmaster Tools for an analysis of your robots.txt file. You should test it against specific forum URLs to make sure what you want blocked is blocked, and what you want allowed is allowed.
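You can also run a quick local check before uploading. The sketch below uses Python's standard urllib.robotparser module on the rules listed earlier. The example.com address and the sample URLs are assumptions, and Python's parser is not guaranteed to match Google's own behavior in every edge case, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot rules from this step, kept inline so the check is self-contained.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /posting.php
Disallow: /admin
Disallow: /privmsg.php
Disallow: /search.php
Disallow: /login.php
Disallow: /memberlist.php
Disallow: /images
Disallow: /includes
Disallow: /profile.php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"  # hypothetical forum address
    for path in ("/viewtopic.php?t=42", "/posting.php?mode=reply&t=42", "/admin/index.php"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", base + path) else "blocked"
        print(path, verdict)
```

Thread URLs should come back as allowed, and posting and admin URLs as blocked.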
As a last resort, you may want to consider blocking Google entirely. Just get rid of all the Disallow lines above and put a single "Disallow: /" under "User-agent: Googlebot".
Step #3: Good phpBB Mods for Search Bots
In most cases, the robots.txt file will solve your problems. If it doesn't, some of the mods I found on phpbb.com will help, as will tweaking the database, which is explained in Step #4. These mods will also let Google index your site more effectively for better search results.
Cyberalien's Guest Sessions Mod: Removes the "sid:gh247th2hthh..." part for guests. In my experience, people are wrong when they say Google ignores the sid part, making this mod very important to have.
viewtopic.php?f=16&t=185839&hilit=simple+rewrite&start=0
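To see why the session id matters, note that the same thread with two different sid values looks like two different pages to a crawler. The Python sketch below only shows the effect of dropping that parameter from a URL. It is not the mod's code, and the function name strip_sid and the example address are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_sid(url):
    """Return url with any 'sid' query parameter removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "sid"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == "__main__":
    # Two guest visits to the same thread, each with a made-up session id.
    first = "http://www.example.com/viewtopic.php?t=42&sid=aaa111"
    second = "http://www.example.com/viewtopic.php?t=42&sid=bbb222"
    print(strip_sid(first) == strip_sid(second))
```

Without the sid, both visits collapse to one URL, which is what you want a bot to see.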
phpBB-SEO URL Rewriting Mod: Search engines like .html more than they like "viewtopic.php?p=42." A lot more. For my forum, I'm using the Simple Rewriting Method, mainly because I think "topic42.html" looks better than something like "topic42-The-Answer-To-Life-The-Universe-And-Everything.html."
http://www.phpbb-seo.com/boards/phpbb-mod-rewrite-vf33/
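For a feel of what the simple rewriting scheme does, here is a rough Python sketch of the mapping in both directions. It is an illustration only: the real mod works through phpBB and the web server's rewrite rules, and the use of the "t" (topic id) parameter and the function names are my assumptions.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches paths such as "/topic42.html" or "/forum/topic42.html".
TOPIC_RE = re.compile(r"^(?P<base>.*/)?topic(?P<id>\d+)\.html$")


def to_static(url):
    """Turn '.../viewtopic.php?t=42' into '.../topic42.html'; leave other URLs alone."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if parts.path.endswith("viewtopic.php") and "t" in params:
        topic_id = params["t"][0]
        if topic_id.isdigit():
            base = parts.path[: -len("viewtopic.php")]
            return f"{base}topic{topic_id}.html"
    return url


def to_dynamic(path):
    """Map '.../topic42.html' back to the script URL the forum actually serves."""
    match = TOPIC_RE.match(path)
    if not match:
        return path
    return f"{match.group('base') or ''}viewtopic.php?t={match.group('id')}"


if __name__ == "__main__":
    print(to_static("/viewtopic.php?t=42"))
    print(to_dynamic("/topic42.html"))
```

The first function is what the forum does when it prints links; the second is what the server does when a request for the .html address comes in.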
Search Bot Indexing Mod: Although not approved by phpBB yet, I've found it to be extremely helpful, as it clearly shows which bots are indexing the site and how many pages they've visited. I always love it when I see some crazy number for the googlebot, but only two page requests from MSN. You will have to make some changes when managing the bots from the admin panel. For instance, "Ask Jeeves" is now just "Ask," and you may want to add "Cuill" as well (IP: 64.1.215.162), which is apparently an up-and-coming search engine that is going bonkers as of the time of this post.
viewtopic.php?f=16&t=473524&hilit=googlebot&start=0
phpBB-SEO Sitemaps Mod: After everything above is done, submit a sitemap to Google (and other lesser search engines). This mod will set up an excellent sitemap for you, although be sure to turn on styling for Google (otherwise you'll get an error from Google like I did). You may also want to change the "Default Priority", as the mod's default is 1.0, which I think is too high if the googlebot is giving you problems (I changed it to 0.5). After setting everything up, go to Google Webmaster Tools and submit the sitemap.
viewtopic.php?f=16&t=371752&hilit=sitemaps&start=0
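If you are curious what a sitemap entry with a lowered priority looks like, the Python sketch below builds a tiny one using the sitemaps.org 0.9 format. It is not the mod's output, and the function name and example URL are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, priority=0.5):
    """Return sitemap XML listing each URL with the same priority (0.0 to 1.0)."""
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical thread address; 0.5 is the priority I switched to.
    print(build_sitemap(["http://www.example.com/topic42.html"], priority=0.5))
```

Priority is only a hint about how your own pages rank against each other, so lowering it will not hide anything from Google.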
Step #4: Tweaking the MySQL Database
Be sure to check out the "Other Tips" section before doing this step.
WARNING: Do not attempt this step until you have tried EVERYTHING above! If the googlebot is still giving you problems after all that and you do not know what you are doing, consider upgrading to a higher hosting account before touching the MySQL database.
SECOND WARNING: Before doing anything, be sure to contact your hosting provider with your problem first! They may be able to fix it for you.
After everything above, the googlebot was still giving me some problems, although it wasn't crashing my site anymore. So I started tweaking the MySQL system variables to get more performance out of it. To do this, you need to edit your my.cnf file, which can be found in your server's /etc directory. You can only edit it if you have SSH access; otherwise, contact your hosting provider.
So first, here's a good site I found for editing the my.cnf file: http://www.linuxweblog.com/node/231
NOTE: Before editing your my.cnf file, make sure you make a backup of it!
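A plain "cp" is all a backup really takes, but if you prefer a script, here is a small Python sketch that makes a timestamped copy next to the original. The function name and the ".bak" naming are my own choices, and you will need enough permissions to read the file and write to its directory.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, suffix=None):
    """Copy path to '<name>.<suffix>.bak' in the same directory and return the new path.

    suffix defaults to the current date and time, so repeated backups do not collide.
    """
    source = Path(path)
    stamp = suffix or time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also preserves timestamps and permission bits
    return target


if __name__ == "__main__":
    # /etc/my.cnf is the usual location; adjust it if your host keeps the file elsewhere.
    print(backup_file("/etc/my.cnf"))
```

If an edit goes wrong, copy the backup over my.cnf and restart MySQL.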
The above link is for a 2GHz machine with 1GB of memory, so you may need to adjust the variables accordingly. Also, be sure to use the styling already used in your server's my.cnf file. For instance, instead of saying "query_cache_limit=1M", you may need to say "set-variable = query_cache_limit=1M". The critical ones I found here are "query_cache_type", "query_cache_limit", "query_cache_size", "key_buffer", "sort_buffer", "write_buffer", and "read_buffer". You may also want to add "innodb_buffer_pool_size" to that list.
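The styling difference is purely mechanical, as this small Python sketch shows. The helper name is made up, and the 1M value is simply the example from above, not a recommendation for your server.

```python
def to_set_variable_style(setting):
    """Rewrite 'name=value' as 'set-variable = name=value'."""
    name, separator, value = setting.partition("=")
    if not separator or not name.strip() or not value.strip():
        raise ValueError(f"expected 'name=value', got {setting!r}")
    return f"set-variable = {name.strip()}={value.strip()}"


if __name__ == "__main__":
    print(to_set_variable_style("query_cache_limit=1M"))
```

Look at the lines already in your my.cnf first and only convert if that file uses the "set-variable" form.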
If you're using a dedicated server, the above is easy to configure. But if you're like me and on a VPS, then it's a little more difficult. What I ended up doing was first chmodding my.cnf to 644 (chmod 644 my.cnf), then copying it to a folder I had FTP access to (cp my.cnf /var/www/vhosts/domain.com/public_html). I edited it there, copied it back, then chmodded it back to 550.
After you're done, you will need to restart your mysql service (service mysqld restart).
AGAIN, DO NOT DO STEP #4 IF YOU ARE AT ALL UNCOMFORTABLE WITH SQL.
Other Tips:
Try clearing out your sessions table every now and then. I do this fairly frequently to reduce the memory load, but truncating the entire table can become a headache for users, as it forces them to log in again. So instead run "DELETE FROM phpbb_sessions WHERE session_user_id = -1;". That will delete only anonymous (guest) sessions.
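To convince yourself that the DELETE leaves logged-in users alone, you can try it on a throwaway table first. The Python sketch below uses an in-memory SQLite database as a stand-in for MySQL, with a cut-down two-column version of the sessions table. The schema and sample rows are assumptions for the demo, not phpBB's real table layout.

```python
import sqlite3


def purge_guest_sessions(conn):
    """Delete anonymous sessions (session_user_id = -1) and return how many rows went."""
    cursor = conn.execute("DELETE FROM phpbb_sessions WHERE session_user_id = -1")
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing touches your forum
    conn.execute(
        "CREATE TABLE phpbb_sessions (session_id TEXT PRIMARY KEY, session_user_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO phpbb_sessions VALUES (?, ?)",
        [("guest-a", -1), ("member-b", 2), ("guest-c", -1)],
    )
    print("deleted:", purge_guest_sessions(conn))
    print("left:", conn.execute("SELECT session_id FROM phpbb_sessions").fetchall())
```

Only the member's row should survive, which is why nobody gets logged out.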
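Optimize your database! Running OPTIMIZE TABLE on the forum's tables (phpMyAdmin has an "Optimize table" option for this) reclaims the space left behind by deleted rows.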
Reduce the number of SQL queries! As a rule of thumb, it's good to have 15 per page at most. Keep it at that level, and not only will Google love you, but your site will run faster as well. To see how many queries you're running, download Smartor's page generation mod.
Last but not least, always remember that Google is your friend! Since Google started heavily indexing my site, the number of new users per day on my forum went from 5 to 20. But keep in mind that all good things do have their downside. In my case, I've found that a zealous Google is far better than an indifferent Google.