
[INFO] How to get Google to index phpBB!

Posted: Fri Aug 02, 2002 10:01 pm
by Showscout
Hi Guys!

.. http://board.protecus.de - and I've experimented a lot with search engines... so: why doesn't Google spider my forum? I've heard this question about phpBB several times!

The answer is easy:
*censor*

Other suggestions?

Cya S. :D

By the way

Posted: Fri Aug 02, 2002 10:10 pm
by Showscout
GOOGLE wrote: It will in certain instances. What criteria is used to determine if a dynamic page is indexable is debatable. Most have found that clean, high ranking (High page rank) sites can get dynamic content indexed.
Problems:
Sites that use session tracking urls to give each visitor a dynamic url. These sites can generate an infinite amount of pages for a spider to visit. These types of pages are usually blocked from being indexed by Google.

If a site's dynamic urls change between spidering. If possible, treat your dynamic urls as though they were static and never changing.
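To make Google's point about session URLs concrete: a minimal PHP sketch, assuming the session ID travels in a query parameter named `sid` (as in phpBB links like `viewtopic.php?t=...&sid=...`) and that parameters are separated by a plain `&` rather than the HTML-escaped `&amp;`. The helper name `strip_sid()` is hypothetical. It shows how two visitors get two different URLs for the same topic, and how dropping just that one parameter collapses them into a single stable URL.

```php
<?php
// Hypothetical helper: remove the 'sid' query parameter from a URL so that
// every visitor (and every crawl) sees the same address for the same page.
// Assumes plain '&' separators and no '#fragment' part.
function strip_sid($url)
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to strip
    }
    $base = substr($url, 0, $pos);
    parse_str(substr($url, $pos + 1), $params);
    unset($params['sid']);

    // Reattach the remaining parameters, if any are left.
    return $params ? $base . '?' . http_build_query($params) : $base;
}

// Two visitors, two session IDs, one topic.
$urls = array(
    'viewtopic.php?t=32328&sid=0a1b2c',
    'viewtopic.php?t=32328&sid=9f8e7d',
);
echo count(array_unique(array_map('strip_sid', $urls))), "\n"; // prints 1
```

Without the session ID, a spider sees one URL per topic instead of one per visit, which is exactly the "infinite amount of pages" problem quoted above.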


btw: Why does phpBB use SIDs for guests (like Google)?

Posted: Mon Aug 05, 2002 11:25 am
by R. U. Serious
From what I have seen so far with my sites, this would support your theory. I believe you are probably very right on this!

It is easy to adjust the append_sid() function to check the user agent (Google fortunately sends one).

If no one does this before me, I'll write and post it when I get home (should be half a day :? ).


Good point, Showscout! Thanks!

edit: I just saw you edited your original message, and just read what Google says. Well, that makes it pretty clear. *G*
Look here if you can't wait: http://www.php.net/manual/en/reserved.v ... les.server , search for $HTTP_USER_AGENT
The function I would modify is append_sid(), which is, I believe, in includes/functions.php

Posted: Mon Aug 05, 2002 9:50 pm
by R. U. Serious
Here you go.


############################################################## 
## MOD Title: enhance-google-indexing
## MOD Author: Showscout & R. U. Serious
## MOD Description: If the user_agent includes the string 'Googlebot', then no session_ids are appended to links, which will (hopefully) allow Google to index more than just your index page. 
## MOD Version: 0.9.1 
## 
## Installation Level: easy
## Installation Time: 2 Minutes 
## Files To Edit: includes/sessions.php 
## Included Files: n/a
############################################################## 
## For Security Purposes, Please Check: http://www.phpbb.com/mods/downloads/ for the 
## latest version of this MOD. Downloading this MOD from other sites could cause malicious code 
## to enter into your phpBB Forum. As such, phpBB will not offer support for MOD's not offered 
## in our MOD-Database, located at: http://www.phpbb.com/mods/downloads/ 
############################################################## 
## Author Notes: There may be issues with register globals on newer 
##       PHP versions. If you know for sure, and also how to fix it, post in
##       this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=32328
##
##       Obviously, if someone thinks it's funny to surf around with a
##       user_agent containing Googlebot and at the same time does not
##       allow cookies, he will lose his session/login on every pageview.
##       Should he complain to you, tell him to eat your shorts.
##
##       If you want to add further crawlers, look at the appropriate line and 
##       feel free to add part of the user_agent, which should be _unique_
##       to that crawler, so a user is never confused with a bot.
## 
############################################################## 
## Version History: 0.9.0 initial release, only googlebot
##                         0.9.1 added inktomi (MSN-search/crawler-bot)
############################################################## 
## Before Adding This MOD To Your Forum, You Should Back Up All Files Related To This MOD 
############################################################## 

#-----[ OPEN  ]------------------------------------------ 
includes/sessions.php

#-----[ FIND ]------------------------------------------ 
	global $SID;

	if ( !empty($SID) && !eregi('sid=', $url) )

#-----[ REPLACE WITH ]------------------------------------------ 
	global $SID, $HTTP_SERVER_VARS;

	if ( !empty($SID) && !eregi('sid=', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))

# 
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------ 
# 
# EoM 
You can test this by changing 'Googlebot' to 'Mozilla' and disallowing cookies for your board (this will actually work for Mozilla, IE, Opera and a lot more browsers). Don't forget to set it back after testing, though, or you will get in trouble... *G*
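For anyone who wants to add more crawlers as the Author Notes suggest, here is a minimal sketch of the same check factored into a helper with a list of user-agent substrings, so each new bot is one array entry. The names `$crawler_signatures`, `is_crawler()` and `append_sid_unless_crawler()` are hypothetical, the example user-agent strings are only illustrations, and this is not a drop-in replacement for the MOD's FIND/REPLACE step: it takes the bare session ID, appends with a plain `&` (phpBB itself writes `&amp;` into HTML), and would read `$_SERVER` (PHP 4.1+) instead of `$HTTP_SERVER_VARS`.

```php
<?php
// Hypothetical list of user-agent substrings identifying crawlers. Each entry
// should be unique to its bot so a human visitor is never mistaken for one.
// Matching is case-sensitive, like the MOD's strstr().
$crawler_signatures = array('Googlebot', 'slurp@inktomi.com;');

// Returns true if the user agent contains any of the crawler signatures.
function is_crawler($user_agent, $signatures)
{
    foreach ($signatures as $signature) {
        if (strpos($user_agent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical stand-in for append_sid(): append the session ID only for
// non-crawlers, and only if the URL does not already carry one.
function append_sid_unless_crawler($url, $sid, $user_agent, $signatures)
{
    if ($sid === '' || stripos($url, 'sid=') !== false || is_crawler($user_agent, $signatures)) {
        return $url;
    }
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'sid=' . $sid;
}

// In a real request the user agent would come from $_SERVER['HTTP_USER_AGENT'],
// which may be unset, so default it to ''.
echo append_sid_unless_crawler('viewtopic.php?t=1', 'abc123',
    'Mozilla/4.0 (compatible; MSIE 6.0)', $crawler_signatures), "\n";
// viewtopic.php?t=1&sid=abc123
echo append_sid_unless_crawler('viewtopic.php?t=1', 'abc123',
    'Googlebot/2.1 (+http://www.googlebot.com/bot.html)', $crawler_signatures), "\n";
// viewtopic.php?t=1
```

The same testing trick applies here: temporarily putting 'Mozilla' into the signature list makes ordinary browsers look like crawlers.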

Posted: Mon Aug 05, 2002 9:58 pm
by TC
nice work!

Posted: Mon Aug 05, 2002 10:24 pm
by netclectic
nice one, thanx!

Posted: Wed Aug 07, 2002 5:46 pm
by lars_msh
This looks really neat, and I appreciate the work people are doing on this... but before I dive in I do have one worry - how safe is it? Will Google chuck sites out of their index because of this?

http://www.google.com/webmasters/2.html

Last paragraph:
...setting up pages/links with the sole purpose of fooling search engines may result in permanent removal from our index.


Are we fooling Google, or are we helping? I think it's a genuine case of the latter, but I'm just wondering if Google will send a request with a fake Mozilla agent, another request with a Google user agent, compare the two, see different links and trash the listing! ;-)

Or am I just too paranoid? If nobody's e-mailed Google yet maybe I will... not sure how quickly they'll reply though.

Posted: Wed Aug 07, 2002 7:22 pm
by R. U. Serious
Hi lars,

What Google is referring to is something different: they rank sites by (besides a lot of other stuff like PigeonRank :lol:) the number of links pointing to them. So if many pages link to you, they know you are popular, and you get a higher rank than other pages which appear to have the same content. Now, the problem they try to prevent is people setting up dozens of fake pages whose only real purpose is to link back to their original site, so that the site appears to be popular although it actually is not!

Besides, even if they were to compare the two pages (fetched with different user agents), they would get the same content, with links pointing to the same places; only the session ID is missing. So there really isn't anything to worry about. Showscout pointed out why Google does not like session IDs, and what we are doing is neither harming nor fooling anyone.

Posted: Wed Aug 07, 2002 9:07 pm
by lars_msh
OK thanks, I'll go along with that... and save the paranoia for another day! :D

Posted: Wed Aug 07, 2002 9:17 pm
by cdkrg
I'd been looking for a solution like that on this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=31269

Thanks.

Posted: Wed Aug 07, 2002 10:25 pm
by R. U. Serious
cdk wrote: I'd been looking for a solution like that on this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=31269

Thanks.


Well, erm, sorry to say that, but actually you were not :lol: ;)

The only thing this "mod" does is leave out the session_id, because that is the only part of the URL Google dislikes. What you are asking for is more complex and has been covered (for the most part, I believe) in other topics. You/the article suggested rewriting all parameters so the URL looks like a normal static one, but this mod leaves all parameters untouched except for the session_id, which it simply omits.
So, although this works fine for Google, it doesn't help with all other search engines. Keep searching, though; there are a couple of threads on your topic, as it keeps popping up every other week.
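To illustrate the difference described above: the MOD only drops `sid` and leaves `viewtopic.php?t=...` as it is, whereas the approach from the article rewrites the whole query into a static-looking address. Below is a minimal sketch of such a mapping, assuming a hypothetical `topicNNN.html` scheme and hypothetical function names; it only covers topic links, assumes plain `&` separators, and a real setup would also need a server-side rewrite rule that maps `topicNNN.html` back to `viewtopic.php?t=NNN`, which is not shown.

```php
<?php
// Hypothetical static-looking scheme: viewtopic.php?t=123[&sid=...] -> topic123.html.
// Returns null for URLs this sketch does not know how to rewrite.
function to_static_url($url)
{
    if (preg_match('/^viewtopic\.php\?t=(\d+)(?:&sid=\w+)?$/', $url, $m)) {
        return 'topic' . $m[1] . '.html';
    }
    return null;
}

// The reverse direction, which the server's rewrite rule would have to perform.
function from_static_url($path)
{
    if (preg_match('/^topic(\d+)\.html$/', $path, $m)) {
        return 'viewtopic.php?t=' . $m[1];
    }
    return null;
}

echo to_static_url('viewtopic.php?t=32328&sid=0a1b2c'), "\n"; // topic32328.html
echo from_static_url('topic32328.html'), "\n";                // viewtopic.php?t=32328
```

Every other phpBB page type (forums, pagination, posting links) would need its own pattern, which is part of why this route is more work than just leaving out the session ID.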

Posted: Wed Aug 07, 2002 10:44 pm
by cdkrg
Hmm, it might be possible to make this mod simply strip the session ID for all visitors, and then set up a duplicate forum that is linked to invisibly. That way only spiders would find the copy of the forum that strips session IDs indiscriminately.

What do you think?

Posted: Thu Aug 08, 2002 7:57 am
by R. U. Serious
1. How are you going to "hide" the links? -> You really can't, only make them less obvious.
2a. Do you want to duplicate each link? -> confusion & page size
2b. Only duplicate one link on the index? -> every spidered page will be one level deeper and possibly have a worse rank. Still confusion if normal users end up in that part...
3. People coming from search engines will automatically land on the SID-less forum. What do you do with them?
4. How do you want to pass along, between page views, which forum somebody is looking at? -> Add yet another GET variable to the URL.

Although I am not saying it can't be done, that is not my choice/way of doing it.
If I get spidered, I'll just add the user agent to my sessions.php and be done with it. A lot easier, and hardly any confusion for users. :)

Posted: Thu Aug 08, 2002 12:34 pm
by BartVB
Thanks! Have been playing with Google for the last 2 days but this can explain the problems that I saw :D Implemented this hack, going to wait and see what is going to happen next :D

(and wondering what effect this will have on my traffic with 750k posts)

Posted: Thu Aug 08, 2002 4:28 pm
by Showscout
Great that we finally fixed this Google prob!

thx to R. U. Serious