[ABD] Google sitemap mod

On February 1, 2009 this forum will be set to read only as part of retiring of phpBB2.
beggers
Registered User
Posts: 1257
Joined: Fri Nov 23, 2001 8:19 pm
Location: Las Vegas

Post by beggers »

A couple of questions:

Would you submit a sitemap if you had 30,000-50,000 posts?

Under normal circumstances, Google seems to be ignoring my forum posts. If I submit a sitemap, will this essentially force Google to spider more pages?

Thanks.
Uchiha Nick
Registered User
Posts: 424
Joined: Wed Jul 14, 2004 12:13 pm

Post by Uchiha Nick »

Can someone give me a clear explanation of how this thing works (for example, how to submit the sitemap)?
smithy_dll
Former Team Member
Posts: 7632
Joined: Tue Jan 08, 2002 6:27 am
Location: Australia
Name: Lachlan Smith

Post by smithy_dll »

There's an error in your MOD Template, near the parse error:


Validating: \sitemap_mod\sitemap_mod.txt
mod validator wrote: MOD Template usage

File to OPEN does not exist in phpBB standard installation package, starting line: 100
#
#-----[ OPEN ]------------------------------------------
#

"SITEMAP_SORT_DESC" => $sitemap_sort_desc,
"SITEMAP_SORT_ASC" => $sitemap_sort_asc,
"TOPIC_LIMIT" => $new['sitemap_topic_limit'],
"ANNOUNCE_PRIORITY" => $new['sitemap_announce_priority'],
"STICKY_PRIORITY" => $new['sitemap_sticky_priority'],
"DEFAULT_PRIORITY" => $new['sitemap_default_priority'],
language/lang_english/lang_admin.php




MOD HTML usage

No problems were detected in this MODs use HTML elements in accordance with the phpBB2 coding standards.


MOD DBAL usage

No problems were detected in this MODs use of databases (if used) in accordance with the phpBB2 coding standards.


Overall

The MOD failed the MOD pre-validation process. Please review and fix your errors before submitting to the MOD DB.


Validating: \sitemap_mod\sitemap.php
php validator wrote: No forbidden functions found in this php file


Validating: \sitemap_mod\sitemap_mod.txt
php validator wrote: No forbidden functions found in this php file
Systems Engineering
jhaskins
Registered User
Posts: 32
Joined: Wed Apr 17, 2002 10:20 pm

Post by jhaskins »

The configuration is in a section called "Sitemap Settings" on the General Admin -> Configuration page.

If you want to use the auto submit, you would put the code in a separate .php file & run it. I recommend manually submitting it at https://www.google.com/webmasters/sitemaps/stats, since that allows you to track its status. Once you've submitted your sitemap, there's no need to resubmit. Google will redownload it automatically from time to time.

Regarding the number of posts: there is a theoretical limit of 49,999 posts due to the 50,000 URL limit on individual sitemaps (this mod also includes an entry for the index page). However, there's also the matter of the 10 MB uncompressed size limit, which could force a lower limit. Then there's the matter of whether your server can handle it.

Although I designed it to keep resource usage as low as possible, I've got no idea how it would handle something that large. If someone with a large board & a dedicated server wants to test it out, then feel free to do so & let us know what happens (if anything bad happens, I wouldn't want anyone to get in trouble for trying it on a shared server).

In the next version, I'll add a way to limit the # of posts included.

Regarding whether or not Google will index all your posts:
From the Sitemap FAQ wrote: 3. Will Google crawl and index all of the URLs in my Sitemap?

We don't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it.


smithy_dll - I downloaded the file off my site & double-checked it. The only thing I could find wrong with it was a missing "#" above

Code: Select all

#-----[ OPEN ]------------------------------------------
#
language/lang_english/lang_admin.php
All 3 OPEN statements specify valid files ("admin/admin_board.php", "language/lang_english/lang_admin.php", and "templates/subSilver/admin/board_config_body.tpl").

Am I missing something here? :?
mizt
Registered User
Posts: 10
Joined: Sat Jun 11, 2005 7:17 pm

Post by mizt »

Umm, great mod for us SEO people. But one thing: the people likely to use this mod have mod_rewrite installed for shorter, friendlier search-engine URLs. Could you somehow allow this to work with these types of URLs?
mizt
Registered User
Posts: 10
Joined: Sat Jun 11, 2005 7:17 pm

Post by mizt »

Another thing: could you make it 2 different files? I don't think Google is going to want to wait while my site queries 25,000 posts.
jhaskins
Registered User
Posts: 32
Joined: Wed Apr 17, 2002 10:20 pm

Post by jhaskins »

For the next version, I'll be looking into ways to limit the # of topics, split it up into multiple sitemaps, and create appropriate index files. In the meantime, as long as the file is < 10 MB uncompressed, Google will download it.
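
As a rough illustration of the "multiple sitemaps plus an index file" idea, here is a minimal sketch. It is not the MOD's code: the function names, the sitemapN.xml file naming, and the example.com URLs are all made up for the example, the namespace is the one from the sitemaps.org protocol, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical sketch, not part of the MOD: split a list of topic URLs into
// sitemap files that each respect the 50,000 URL limit, then build an index.
const SITEMAP_MAX_URLS = 50000;

// Returns an array of chunks, each holding at most $maxUrls URLs.
function split_into_sitemaps(array $urls, int $maxUrls = SITEMAP_MAX_URLS): array
{
    return array_chunk($urls, $maxUrls);
}

// Builds a sitemap index that points at sitemap1.xml .. sitemapN.xml
// (the file naming is an assumption made for this example).
function build_sitemap_index(string $baseUrl, int $fileCount): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    for ($i = 1; $i <= $fileCount; $i++) {
        // Escape the URL so characters such as & stay valid inside XML.
        $loc = htmlspecialchars($baseUrl . 'sitemap' . $i . '.xml', ENT_QUOTES | ENT_XML1);
        $xml .= "  <sitemap><loc>$loc</loc></sitemap>\n";
    }
    $xml .= '</sitemapindex>' . "\n";
    return $xml;
}

// Usage: 120,000 placeholder topic URLs end up in three files.
$urls = [];
for ($id = 1; $id <= 120000; $id++) {
    $urls[] = 'http://www.example.com/viewtopic.php?t=' . $id;
}
$files = split_into_sitemaps($urls);
echo count($files), " sitemap files\n"; // prints "3 sitemap files"
echo build_sitemap_index('http://www.example.com/', count($files));
```

This only covers the URL count. The uncompressed size limit would still need a separate check on each generated file.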

Regarding mod_rewrite - this is designed to work with a stock phpBB, but if you provide a sample topic URL, I'll be more than happy to help adapt it to your needs. In some future version, I may see what can be done about making the mod work with nonstandard URLs out of the box.
ambo
Registered User
Posts: 14
Joined: Wed Jun 08, 2005 10:56 am
Location: South Africa

Post by ambo »

I just installed this MOD (v0.4.0) on my forum and, as a point of interest, I took a look at the XML it was producing.

It appears that the <lastmod> is being created incorrectly. My XML file has date-times that are NOW and not the last time that topics were modified... is this a known problem, and how can I fix it?

I took a quick look over the PHP code and there's nothing glaringly obvious...
mizt
Registered User
Posts: 10
Joined: Sat Jun 11, 2005 7:17 pm

Post by mizt »

My .htaccess file is as follows.

Code: Select all

Options +FollowSymlinks

RewriteEngine On
#this may cause isues with subdirs and so I have not enabled it.
#RewriteBase /

RewriteRule [.]*-vf([0-9]*) viewforum.php?%{QUERY_STRING}&f=$1
RewriteRule [.]*-vp([0-9]*) viewtopic.php?%{QUERY_STRING}&p=$1
RewriteRule [.]*-vt([0-9]*) viewtopic.php?%{QUERY_STRING}&t=$1 
My urls look like
http://carcommons.com/general-car-maintenance-vf61.html for category
and
http://carcommons.com/more-biker-happy- ... t8684.html for post
jhaskins
Registered User
Posts: 32
Joined: Wed Apr 17, 2002 10:20 pm

Post by jhaskins »

ambo - that's something that I messed up when I changed the way dates were done in 0.3.0 (missing the second parameter for gmdate()). I noticed & fixed it when I was cleaning up the code for 0.5.0. To fix it, either upgrade to 0.5.0 or replace the

Code: Select all

'TOPIC_TIME' =>
line with

Code: Select all

'TOPIC_TIME' => gmdate('Y-m-d\TH:i:s'.'+00:00', $topic['post_time']),
(make sure you replace the whole line).
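
To see why the missing second parameter produces "now" for every topic, here is a small standalone sketch. The sitemap_lastmod() helper and the sample timestamp are made up for illustration; only gmdate() itself is the real PHP function.

```php
<?php
// Hypothetical helper, not the MOD's code: format a Unix timestamp the way
// the fixed TOPIC_TIME line does (UTC with an explicit +00:00 offset).
function sitemap_lastmod($timestamp)
{
    return gmdate('Y-m-d\TH:i:s', $timestamp) . '+00:00';
}

// Made-up stand-in for a row from the topics query.
$topic = array('post_time' => 1000000000);

// Without the second argument, gmdate() formats the current time,
// so every topic gets the moment the sitemap was generated.
$buggy = gmdate('Y-m-d\TH:i:s') . '+00:00';

// With the timestamp passed in, the topic's own time is used.
$fixed = sitemap_lastmod($topic['post_time']);

echo "buggy: $buggy\n";
echo "fixed: $fixed\n"; // fixed: 2001-09-09T01:46:40+00:00
```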

mizt - I'll play around with it some & see if I can have something ready later today or tomorrow. In theory, it shouldn't be too difficult.
mizt
Registered User
Posts: 10
Joined: Sat Jun 11, 2005 7:17 pm

Post by mizt »

That'd be great! :)
jhaskins
Registered User
Posts: 32
Joined: Wed Apr 17, 2002 10:20 pm

Post by jhaskins »

Ok, this wasn't too hard. Copy http://www.streetrod3.com/sitemap-seo.phps into a .php file, upload it & give it a shot.
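
The linked file isn't reproduced here. As a guess at the kind of change involved, the sketch below builds topic URLs in the "-vt<ID>.html" style from mizt's post. The seo_topic_url() function, its slug rules, and the example.com URL are assumptions for illustration, not the contents of sitemap-seo.phps.

```php
<?php
// Hypothetical helper: turn a topic title and ID into a rewritten URL such as
// "some-topic-title-vt123.html". The rewrite rules quoted earlier only look at
// the "-vt<ID>" part, so the slug text itself is purely cosmetic.
function seo_topic_url($server_url, $title, $topic_id)
{
    $slug = strtolower($title);
    // Collapse every run of non-alphanumeric characters into a single hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    $slug = trim($slug, '-');
    return $server_url . $slug . '-vt' . (int) $topic_id . '.html';
}

echo seo_topic_url('http://www.example.com/', 'General Car Maintenance', 61), "\n";
// prints http://www.example.com/general-car-maintenance-vt61.html
```

In the MOD, a call like this would take the place of the value on the 'TOPIC_URL' line, assuming the topic title is available in the same query row.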
dcz
Registered User
Posts: 787
Joined: Sun Feb 13, 2005 5:37 am

hello there

Post by dcz »

I am very interested in this mod, but like some of the people writing here, I use URL rewriting, from this post

URLs in my forum look like this

http://www.marsatak.org/marsforum/forum13.php for the forums

and like this

http://www.marsatak.org/marsforum/ftopic37.php for the posts (with variations for topics containing many pages; I removed the "asc" from that type of URL created by the original mod I just mentioned).

All other URLs are disallowed in robots.txt in order to avoid too much duplicate content.

So if anyone around is willing to give me a hand with this one, that would be very nice of them ;) as I'd love to have this beautiful sitemap of yours.

++

dcz

phpBB SEO || phpBB3 SEO Premod || SEO phpBB3
GYM Sitemaps & RSS for phpBB3: GYM Sitemaps & RSS
jhaskins
Registered User
Posts: 32
Joined: Wed Apr 17, 2002 10:20 pm

Post by jhaskins »

Try replacing

Code: Select all

'TOPIC_URL' => $server_url."viewtopic.$phpEx?t=" . $topic['topic_id'],
with

Code: Select all

'TOPIC_URL' => $server_url."ftopic" . $topic['topic_id'] . ".php",
smithy_dll
Former Team Member
Posts: 7632
Joined: Tue Jan 08, 2002 6:27 am
Location: Australia
Name: Lachlan Smith

Post by smithy_dll »

jhaskins wrote: smithy_dll - I downloaded the file off my site & double-checked it. The only thing I could find wrong with it was a missing "#" above

Code: Select all

#-----[ OPEN ]------------------------------------------
#
language/lang_english/lang_admin.php
All 3 OPEN statements specify valid files ("admin/admin_board.php", "language/lang_english/lang_admin.php", and "templates/subSilver/admin/board_config_body.tpl").

Am I missing something here? :?


It gives you an approximate line. The missing # can confuse some parsers. ;)
Systems Engineering