[INFO] How to get phpBB indexed by Google!

Showscout
Registered User
Posts: 31
Joined: Thu Jul 18, 2002 4:29 pm

Post by Showscout » Fri Aug 02, 2002 10:01 pm

Hi Guys!

.. http://board.protecus.de - and tried a lot with search engines... so: Why doesn't Google spider my forum? I have heard this question about phpBB a few times!

The Answer is easy:
*censor*

Other suggestions?

Cya S. :D
Last edited by Showscout on Thu Feb 03, 2005 10:04 pm, edited 17 times in total.
CU Showscout
Security Forum

Showscout
Registered User
Posts: 31
Joined: Thu Jul 18, 2002 4:29 pm

by the way

Post by Showscout » Fri Aug 02, 2002 10:10 pm

GOOGLE wrote: It will in certain instances. What criteria is used to determine if a dynamic page is indexable is debatable. Most have found that clean, high ranking (High page rank) sites can get dynamic content indexed.
Problems:
Sites that use session-tracking URLs to give each visitor a dynamic URL. These sites can generate an infinite number of pages for a spider to visit. These types of pages are usually blocked from being indexed by Google.

Sites whose dynamic URLs change between spiderings. If possible, treat your dynamic URLs as though they were static and never changing.


btw: Why does phpBB use a SID for guests (like Google)?
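
To make the quoted problem concrete, here is a minimal sketch in Python (not phpBB code; the function name with_sid and the example page are made up for illustration). It assumes every cookie-less guest visit gets a fresh random session id appended as a sid parameter, and shows that the very same topic then turns up under a different URL on every visit:

```python
import secrets

def with_sid(url: str) -> str:
    """Append a fresh random session id, as a session-tracking site might do for each new guest."""
    sid = secrets.token_hex(16)  # 32 hex characters, a hypothetical stand-in for a session id
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}sid={sid}"

page = "viewtopic.php?t=32328"  # hypothetical example page

# Five visits by a cookie-less crawler to the very same page
seen = {with_sid(page) for _ in range(5)}
print(len(seen))  # how many distinct URLs the crawler collected for one page
```

Since every visit produces a new URL, a spider has no way of telling that it has already seen the page.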
Last edited by Showscout on Thu Aug 08, 2002 4:07 pm, edited 1 time in total.
CU Showscout
Security Forum

R. U. Serious
Registered User
Posts: 830
Joined: Mon Feb 11, 2002 2:07 pm

Post by R. U. Serious » Mon Aug 05, 2002 11:25 am

From what I have seen so far with my sites, this would support your theory. I believe you are probably very right on this!

It is easy to adjust the append_sid() function to check the user agent (fortunately, Google uses a recognizable one).

If no one does this before me, I'll write and post it when I get home (should be half a day :? ).


Good point, Showscout! Thanks!

edit: I just saw you edited your original message, and just read what Google says. Well, that makes it pretty clear. *G*
Look here if you can't wait: http://www.php.net/manual/en/reserved.v ... les.server , search for $HTTP_USER_AGENT
The function I would modify is append_sid, which is, I believe, in includes/functions.php

R. U. Serious
Registered User
Posts: 830
Joined: Mon Feb 11, 2002 2:07 pm

Post by R. U. Serious » Mon Aug 05, 2002 9:50 pm

Here you go.

Code:

############################################################## 
## MOD Title: enhance-google-indexing
## MOD Author: Showscout & R. U. Serious
## MOD Description: If the user_agent includes the string 'Googlebot' (or the Inktomi crawler string), then no session_ids are appended to links, which will (hopefully) allow Google to index more than just your index page.
## MOD Version: 0.9.1 
## 
## Installation Level: easy
## Installation Time: 2 Minutes 
## Files To Edit: includes/sessions.php 
## Included Files: n/a
############################################################## 
## For Security Purposes, Please Check: http://www.phpbb.com/mods/downloads/ for the 
## latest version of this MOD. Downloading this MOD from other sites could cause malicious code 
## to enter into your phpBB Forum. As such, phpBB will not offer support for MOD's not offered 
## in our MOD-Database, located at: http://www.phpbb.com/mods/downloads/ 
############################################################## 
## Author Notes: There may be issues with register globals on newer
##       PHP versions. If you know for sure, and also know how to fix it, post in
##       this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=32328
##
##       Obviously, if someone thinks it's funny to surf around with a
##       user_agent containing Googlebot and at the same time does not
##       allow cookies, he will lose his session/login on every pageview.
##       Should he complain to you, tell him to eat your shorts.
##
##       If you want to add further crawlers, look at the appropriate line and
##       feel free to add a part of the user_agent that is _unique_
##       to that crawler, so a user is never confused with a bot.
## 
############################################################## 
## Version History: 0.9.0 initial release, only googlebot
##                         0.9.1 added inktomi (MSN-search/crawler-bot)
############################################################## 
## Before Adding This MOD To Your Forum, You Should Back Up All Files Related To This MOD 
############################################################## 

#-----[ OPEN  ]------------------------------------------ 
includes/sessions.php

#-----[ FIND ]------------------------------------------ 
	global $SID;

	if ( !empty($SID) && !eregi('sid=', $url) )

#-----[ REPLACE WITH ]------------------------------------------ 
	global $SID, $HTTP_SERVER_VARS;

	if ( !empty($SID) && !eregi('sid=', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))

# 
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------ 
# 
# EoM 
You can test this by changing 'Googlebot' to 'Mozilla' and disallowing cookies for your board (this will actually match Mozilla, IE, Opera and a lot more browsers, since they all send 'Mozilla' in their user agent). Don't forget to set it back after testing though, or you will get in trouble... *G*
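
If you want to look at the logic of the changed condition in isolation, here is a rough Python sketch (an illustration only, not phpBB's actual append_sid(); the function signature, the BOT_MARKERS tuple and the assumption that the session id arrives as a ready-made "sid=..." string are mine). Like strstr() in the MOD, the user-agent check is case-sensitive, while the 'sid=' check ignores case like eregi():

```python
# Substrings taken from the MOD above; matching is case-sensitive, like PHP's strstr()
BOT_MARKERS = ("Googlebot", "slurp@inktomi.com;")

def append_sid(url: str, sid: str, user_agent: str) -> str:
    """Return url with the session id appended, unless the visitor looks like a known crawler."""
    is_bot = any(marker in user_agent for marker in BOT_MARKERS)
    if sid and "sid=" not in url.lower() and not is_bot:
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}{sid}"
    return url

# A normal browser keeps its session id in the link
print(append_sid("viewtopic.php?t=32328", "sid=abc123", "Mozilla/4.0 (compatible; MSIE 6.0)"))
# -> viewtopic.php?t=32328&sid=abc123

# A crawler announcing itself as Googlebot gets the clean link
print(append_sid("viewtopic.php?t=32328", "sid=abc123", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
# -> viewtopic.php?t=32328
```

Because the check is case-sensitive, a user agent containing only 'googlebot' in lowercase would still get the session id.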
Last edited by R. U. Serious on Wed Aug 07, 2002 7:39 pm, edited 1 time in total.

TC
Former Team Member
Posts: 3633
Joined: Tue Sep 25, 2001 7:23 pm
Location: Kµlt °ƒ Ø, working on my time machine

Post by TC » Mon Aug 05, 2002 9:58 pm

nice work!
.:: 28:Ø6:42:12 ::.

netclectic
Former Team Member
Posts: 4439
Joined: Wed Mar 13, 2002 3:08 pm
Location: Omnipresent

Post by netclectic » Mon Aug 05, 2002 10:24 pm

nice one, thanx!

lars_msh
Registered User
Posts: 36
Joined: Thu May 23, 2002 8:17 pm

Post by lars_msh » Wed Aug 07, 2002 5:46 pm

This looks really neat, and I appreciate the work people are doing on this... but before I dive in I do have one worry - how safe is it? Will Google chuck sites out of their index because of this?

http://www.google.com/webmasters/2.html

Last paragraph:
...setting up pages/links with the sole purpose of fooling search engines may result in permanent removal from our index.


Are we fooling Google, or are we helping? I think it's a genuine case of the latter, but I'm just wondering if Google will send a request with a fake Mozilla agent, another request with a Google user agent, compare the two, see different links and trash the listing! ;-)

Or am I just too paranoid? If nobody's e-mailed Google yet maybe I will... not sure how quickly they'll reply though.

R. U. Serious
Registered User
Posts: 830
Joined: Mon Feb 11, 2002 2:07 pm

Post by R. U. Serious » Wed Aug 07, 2002 7:22 pm

Hi lars,

What Google is referring to is something different: they rank sites by (besides a lot of other stuff like pigeon rank :lol:) their number of referrals. So if many pages link to you, they know you are popular and you get a higher rank than other pages which appear to have the same content. Now, the problem they try to prevent is that people set up dozens of fake pages whose only real purpose is to link back to the original site, so that that site will appear to be popular although it actually is not!

Besides, even if they were to compare the two pages (with different agents), they would get the same content, and links pointing to the same place. So there really isn't anything to worry about. Showscout pointed out why Google does not like session_ids, and what we are doing is neither harming nor fooling.
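
A quick way to convince yourself: take the links a page offers to a normal browser and to Googlebot, drop only the sid parameter, and compare. A small Python sketch of that check (the two link lists are made-up examples and strip_sid is a hypothetical helper, not part of phpBB):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_sid(url: str) -> str:
    """Remove only the sid query parameter; every other parameter stays as it was."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "sid"]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Made-up example: links on one page as served to a browser and to Googlebot
browser_links = ["index.php?sid=abc123", "viewtopic.php?t=32328&sid=abc123"]
bot_links = ["index.php", "viewtopic.php?t=32328"]

same_targets = [strip_sid(link) for link in browser_links] == bot_links
print(same_targets)  # -> True
```

Apart from the session id, both visitors are sent to exactly the same places.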

lars_msh
Registered User
Posts: 36
Joined: Thu May 23, 2002 8:17 pm

Post by lars_msh » Wed Aug 07, 2002 9:07 pm

OK thanks, I'll go along with that... and save the paranoia for another day! :D

cdkrg
Registered User
Posts: 706
Joined: Fri Jul 12, 2002 12:35 pm

Post by cdkrg » Wed Aug 07, 2002 9:17 pm

I'd been looking for a solution like that on this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=31269

Thanks.

R. U. Serious
Registered User
Posts: 830
Joined: Mon Feb 11, 2002 2:07 pm

Post by R. U. Serious » Wed Aug 07, 2002 10:25 pm

cdk wrote: I'd been looking for a solution like that on this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=31269

Thanks.


Well, erm, sorry to say that, but actually you were not :lol: ;)

The only thing this "mod" does is leave out the session_id, because that is the only part of the URL Google dislikes. What you are asking for is more complex and has partly (for the most part, I believe) been covered in other topics. You/the article suggested rewriting all parameters to make the URL look like a normal one, but this mod leaves all parameters untouched, except for the session_id, which it simply omits.
So, although this actually works fine for Google, it doesn't for all other search engines. Keep searching though, there are a couple of threads on your topic, as it keeps popping up every other week.

cdkrg
Registered User
Posts: 706
Joined: Fri Jul 12, 2002 12:35 pm

Post by cdkrg » Wed Aug 07, 2002 10:44 pm

Hmm, it might be possible to make this mod simply strip the session id for all visitors. You could then set up a duplicate forum that is linked to invisibly, so that only spiders would go to the forum that indiscriminately strips the session ids.

What do you think?

R. U. Serious
Registered User
Posts: 830
Joined: Mon Feb 11, 2002 2:07 pm

Post by R. U. Serious » Thu Aug 08, 2002 7:57 am

1. How are you going to "hide" the links? -> You really can't, only make them less obvious.
2a. Do you want to duplicate each link? -> confusion & pagesize
2b. Only duplicate one link on the index? -> every spidered page will be one level deeper and possibly have a worse rank. Still confusion if normal users entered that part...
3. People coming from searchengines will automatically land on the sid-less forum. What do you do with them?
4. How do you want to pass the information between pageviews at "which forum" somebody is looking? -> Add yet another get-variable to the URL.

Although I am not saying it can't be done, that is not my choice/way to do it.
If I get spidered, I'll just add the user agent to my sessions.php and be done with it. A lot easier and hardly any confusion for users. :)

BartVB
Consultant
Posts: 1288
Joined: Thu Aug 02, 2001 1:32 pm
Location: The Netherlands

Post by BartVB » Thu Aug 08, 2002 12:34 pm

Thanks! I have been playing with Google for the last 2 days, and this could explain the problems that I saw :D Implemented this hack, now going to wait and see what happens next :D

(and wondering what effect this will have on my traffic with 750k posts)
I Hate oversized sigs and Love Penguins :D

Showscout
Registered User
Posts: 31
Joined: Thu Jul 18, 2002 4:29 pm

Post by Showscout » Thu Aug 08, 2002 4:28 pm

Great that we finally fixed this Google problem!

thx to R. U. Serious
CU Showscout
Security Forum
