[RC4] GYM Sitemaps & RSS (aka mx Google Sitemaps)

A place for MOD Authors to post and receive feedback on MODs still in development. No MODs within this forum should be used within a live environment! No new topics are allowed in this forum.
On February 1, 2009 this forum will be set to read only as part of retiring of phpBB2.
dcz
Registered User
Posts: 787
Joined: Sun Feb 13, 2005 5:37 am
Contact:

Post by dcz »

MOD Title: Google Yahoo MSN Sitemaps and RSS (aka GYM sitemaps, previously known as mx sitemaps)
Author : dcz / http://www.phpBB-SEO.com/
MOD Description: This module creates United Sitemaps (aka Google Sitemaps), RSS 2.0 feeds and a Yahoo urllist.txt for phpBB and the mxBB Portal (if installed).
It uses a gzip cache that supports every output type, plus XSL-Transform styling for Sitemaps and RSS feeds.

MOD Version: 1.2.0RC4
Installation Level : Easy.
Installation Time : 2 min.
Compatible EasyMOD : n/a.
phpBB : 2.0.22
mxBB : 2.7.x & 2.8.x

MOD Download: http://www.phpbb-seo.com/downloads/gym_ ... 2-0RC4.zip
Alternate MOD Download: Google Yahoo MSN sitemaps and RSS V1.2.0RC4

Demo Board: Author's Notes : From the Gym sitemaps release thread
  • Modular Google Sitemaps (United Sitemaps), RSS 2.0 Feeds and Yahoo urllist.txt solution for phpBB and mxBB Portal.

    This module will generate several types of outputs :
    • United Sitemaps (Google Sitemaps) :
      This mod will create a SitemapIndex, listing all the created Sitemaps.
      They will be usable by the Google, Yahoo and MSN search engines.
      Sitemaps are 100% valid against the Sitemaps 0.9 protocol.
      They can be styled using XSL-Transform style-sheets (configurable in the ACP).

      WARNING
      • Please make sure the Sitemaps are working before you submit the sitemap.php file (or sitemaps.xml with mod Rewrite) to Google Sitemaps and Yahoo.
        MSN submission is not handled the same way yet, but you can still notify the MSN Search engine using the sitemaps.org submit procedure.
      Sitemaps created :
      • One SitemapIndex listing all sitemaps
      • One general forum sitemap, listing the public forum URLs
      • One sitemap per public forum, listing its topics
      When mxBB is installed :
      • Adds an entry in the SitemapIndex
      • One sitemap listing all public mx pages
      Note :
      • The Knowledge Base (KB) mod is not yet supported by the module. Everything is ready to plug it in with ACP options as soon as a stable KB version is released.
    • RSS 2.0 Feeds:
      You can also register your main feeds with Yahoo, such as rss.php (or rss.xml).
      Forum RSS feeds can notify Yahoo automatically using the Yahoo! Notifications API.
      You will have to apply for a Yahoo AppID to use it.
      Please note that Yahoo does not allow "&" in submitted RSS feed URLs, so you won't be able to submit feeds with "&" in their URL.
      This only concerns a few feeds, and only if you do not use URL rewriting.

      phpBB authorisations:
      • The module is able to build personalized rss feeds according to the user's authorisations.
        If set to yes, users will be able to browse private forum feeds if they have enough permission to do so.
      RSS feeds created :
      • One feed listing the latest items from all available sources (plug-ins included)
      • One feed listing the latest messages from all forums
      • One feed per forum
      • One special feed listing all available feeds
      Each feed is available in three configurable versions :
      • A long one
      • A default one
      • A short one
      All three options can be combined with an extra one that adds the message content to the feeds.
      Advanced content filtering :
      • It is possible to filter message content in feeds :
        • Links :
          You can choose whether links used in posts are active.
          If deactivated, links are output as part of the content but are not clickable.
        • BBcodes :
          You can deactivate BBcode parsing or filter BBcodes (tags and/or content).
          The format is simple :
          Enter a comma-separated list of BBcode tags, either to exclude them (the tags are removed, the content is kept) or, with the additional ":" option, to delete them (the full BBcode block is removed and replaced by {bbcode} in the output).
          Example :
          img:1,b:0,quote,code:1
          In this example, the img BBcode and its image link are replaced with {img}; bold is not processed but the bold text is kept; quotes are not parsed but their content is kept; code BBcodes and their content are replaced with {code} in the output.
        • Smileys :
          You can choose whether smileys are parsed in the content.
        • Digest :
          You can choose between three methods to limit the message content output in feeds : by sentences, by words or by characters.
          None of them will break a word; select the one you feel most comfortable with.
      Multiple Char-set handling :
      • The module uses UTF-8 as its output char-set.
        The phpBB3 conversion method is integrated in the module to allow conversion from numerous char-sets.
        You will therefore have to set the char-set used by your forum in the ACP in case the 'auto' parameter fails (which unfortunately can happen in several cases).
        You can find your forum's char-set in the source code of any of its pages :
        <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    • Yahoo! urllist.txt :
      The module creates a Yahoo! urllist.txt listing the last active threads from all the forums, one URL per line.
    Common features for all the module :
    • Cache :
      A complete cache system, configurable from the ACP.
      All the maps (sitemaps, RSS and urllist.txt) are saved in full in a folder.
      When a cached file is up to date and available, the module just sends it as is to the browser without further processing, making the output very fast, comparable to direct access to a physical file.
      Each type of cache, corresponding to each type of listing, can be configured separately. For each one you can decide how the cache is updated : auto (each time the cache expires) or manual (files deleted from the ACP).

      Benchmark : (11834 URLs)
      • Without gzip compression :
        First page load : the cache is being built
        <!-- URL list generated in 5.41892 s - 25 sql - 11834 URLs listed -->
        <!-- Output started from cache after 5.42756 s - sql -->
        <!-- Output from cache ended up after 6.93087 s - sql -->

        This means that the module builds a list of 11834 URLs in 5.41892 s, and that the cache file (2 119 631 bytes) is saved in 0.00864 s, the file being sent to the browser right after saving.
        The output ended 6.93087 s after it was requested.

        Second load :
        <!-- URL list generated in 5.41892 s - 25 sql - 11834 URLs listed -->
        <!-- Output started from cache after 0.00256 s - sql -->
        <!-- Output from cache ended up after 1.57475 s - sql -->

        The first line is itself cached, as a reminder of how long it took to build such a long list before sending it.
        Compare that to the 0.00256 s needed here to start the output :D
        The file transfer is then relatively long, but the file is 2 MB, and given the large number of URLs it is pretty fast.
      • With gzip :
        Unfortunately, there are no stats available for this output; the function used to read and send a gzipped file seems to make that impossible.
        But once activated, our 2 MB becomes 48 KB !!
        And as the file is sent to the browser as is, it's only a matter of sending a 48 KB file to list 11834 URLs :D
        So it is easy to see that it is a lot faster still, with far fewer server resources spent.
        The work is comparable to sending a 48 KB image, for a list of 11834 URLs.
    • SQL Cycles and perf settings :
      All major queries are split into several cycles, configurable in the ACP, for all types of listings.
      The default limit settings should be good for most cases.
      The principle is simple : to list 10 000 URLs we cannot query for 10 000 items at a time, so the module performs several smaller queries.
      The idea is to limit the number of items per query as well as the number of queries themselves.
      You should not go over 30 queries per sitemap, and if your server is not able to build a 10 000 URL list in a reasonable time, there is no need to go that high.
      We are dealing here with the last 10 000 topics to have received a reply in a given forum, not in all forums, so topics beyond that would be quite old.
      In general, there is no need to go above 5000 URLs per sitemap.

      Please also note that it would be totally useless to list thousands of URLs in RSS feeds, even if the module can do it.
      RSS feeds are not meant to be used as sitemaps.
    • URL Rewriting :
      You can switch the mod rewrite type in the ACP for all listings; phpBB SEO mod rewrites 0.2.x are auto-detected.
      Settings for other mod rewrite standards (Webmedic's rewrite mod, able2know rewrite mod, GoogleBB Links) can now be made easily, simply by editing a file.
    Small limitation :
    • If gzip is activated in phpBB, it will also be active in the module. It is however possible to use gzip compression in the module when it is turned off in phpBB.
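
To illustrate the cache behaviour described in the feature list, here is a minimal Python sketch. It is not the module's code (the module is PHP and caches to files in a folder). The dict standing in for the cache folder, the `get_listing` name, the `build_listing` callback and the `ttl_seconds` parameter are all assumptions made for the example.

```python
import gzip


def get_listing(cache, key, build_listing, now, ttl_seconds):
    """Return gzipped listing bytes, rebuilding only when the cache has expired.

    `cache` is a plain dict standing in for the cache folder (an assumption);
    `build_listing` is a hypothetical callback that runs the expensive queries.
    """
    entry = cache.get(key)
    if entry is None or now - entry["saved_at"] >= ttl_seconds:
        # Slow path: build the full listing once, compress it, store it.
        raw = build_listing().encode("utf-8")
        entry = {"saved_at": now, "body": gzip.compress(raw)}
        cache[key] = entry
    # Fast path: the stored bytes are returned as is, with no further processing.
    return entry["body"]
```

The fast path never touches the database, which is the difference between the 5.4 s first load and the 0.00256 s second load quoted in the benchmark. Repetitive XML compresses very well, which is why the 2 MB listing shrinks so much.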

Languages : PlugIns :
  • [PlugIn] GYM Sitemaps XML :
    • Easily incorporates existing sitemaps (which must be valid Google sitemaps) into your GYM Sitemaps Google sitemap listings.
  • [PlugIn] GYM Sitemaps TXT :
    • Easily incorporates URL text lists (one URL per line) into your GYM Sitemaps Google sitemaps.
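
As a rough idea of what the TXT plug-in has to do, a one-URL-per-line list can be turned into a Sitemaps 0.9 document along these lines. This is a Python sketch under assumptions, not the plug-in's actual code; the function name `urllist_to_sitemap` is made up for the example.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def urllist_to_sitemap(text):
    """Turn a one-URL-per-line text list into a Sitemaps 0.9 <urlset> document."""
    # Blank lines are skipped; surrounding whitespace is trimmed.
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url in urls:
        # "&" in query strings must be written as "&amp;" inside XML.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

The escaping step matters for vanilla phpBB URLs, which usually carry "&" in their query strings.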
Last edited by dcz on Tue May 15, 2007 12:18 am, edited 11 times in total.
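
For readers wondering how a BBcode filter string such as img:1,b:0,quote,code:1 could be interpreted, here is a small Python sketch. It only mirrors the behaviour described in the post (":1" drops the whole block and leaves {tag}, anything else strips the tags and keeps the content). The function names are hypothetical, and the regex is a simplification that ignores nested tags and the uid phpBB stores inside its BBcode tags.

```python
import re


def parse_filter_spec(spec):
    """Map each tag to True (drop the whole block) or False (strip tags only)."""
    rules = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, flag = item.partition(":")
        rules[tag.strip().lower()] = flag.strip() == "1"
    return rules


def apply_filter(text, rules):
    """Apply the parsed rules to a post body (simplified: no nested tags)."""
    for tag, drop_block in rules.items():
        name = re.escape(tag)
        block = re.compile(
            rf"\[{name}(?:=[^\]]*)?\](.*?)\[/{name}\]",
            re.IGNORECASE | re.DOTALL,
        )

        def replace(match, tag=tag, drop_block=drop_block):
            # Dropped blocks become a {tag} placeholder; others keep their content.
            return "{" + tag + "}" if drop_block else match.group(1)

        text = block.sub(replace, text)
    return text
```

With the example string from the post, `[b]Hi[/b] [img]...[/img]` becomes `Hi {img}`.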

phpBB SEO || phpBB3 SEO Premod || SEO phpBB3
GYM Sitemaps & RSS for phpBB3: GYM Sitemaps & RSS

Post by dcz »

Needless to say, using this will help Google and other search engines spider pages that are far away from the root level a lot faster and more easily.

It will also help bots schedule their spidering better, since the sitemap index shows the lastmod for all sitemaps.

I do recommend though that you do at least something about SIDs together with this, since they disturb search bots.

Cyber Alien's Guest Session mod is perfect for that.

This mod will help indexing, even without mod rewrite, as long as you have taken care of the SIDs.

++

rockboyteek
Registered User
Posts: 591
Joined: Tue Mar 29, 2005 2:50 pm
Contact:

Post by rockboyteek »

http://www.adminfuel.com/ - Forum for Forum Administrators
mdvaldosta
Registered User
Posts: 353
Joined: Sat Mar 26, 2005 12:26 am
Contact:

Post by mdvaldosta »

So this only works with your mod_rewrite? Or any?

Post by dcz »

rockboyteek wrote: is this similar to these?

http://www.phpbb.com/phpBB/viewtopic.ph ... ght=google

http://www.phpbb.com/phpBB/viewtopic.ph ... ght=google

and will it index the site easier?


Well, it was actually first inspired by the first one, but it has become totally different since then.

I don't know the second one; it is unrelated. I'd suggest using Cyber Alien's Guest Session mod to get rid of SIDs as the minimum requirement to start getting spidered.

And yes, this mod will help indexing quite a lot. Without it, bots first need to go to the index (one visit), then to a forum (one more visit), to eventually find a topic's URL (one more visit), and I am not even talking about pagination here.

With the mod, one visit is enough to see which sitemaps changed, to see all forum changes, or to see all topics of a forum (and not just a few per page).

So with just a few visits, Google can find all of your site's URLs (up to 50 000 per sitemap), e.g. you can list up to 50 000 topics (including paginated links) per forum in one shot. Nothing in regular indexing is comparable.

And afterwards, once Google has spidered your sitemap well, one brief look at the sitemap index tells it which sitemaps have changed since its last visit, so it won't lose time coming back to unchanged content. It visits the SitemapIndex first, then the changed sitemaps, then all the changed links (they all have their own lastmod in the end), which makes each visit a lot more efficient.
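
The crawl logic just described can be sketched from the bot's side. This is an illustrative Python example, not how any search engine is actually implemented. The function name `changed_sitemaps` is made up, and timezone handling is left out (lastmod values are assumed to be plain dates or naive timestamps).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_sitemaps(index_xml, last_visit):
    """Return the <loc> of every sitemap whose <lastmod> is newer than last_visit."""
    root = ET.fromstring(index_xml)
    changed = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        # A sitemap without lastmod is treated as changed, to be safe.
        if lastmod is None or datetime.fromisoformat(lastmod) > last_visit:
            changed.append(loc)
    return changed
```

A single request for the index is enough to decide which per-forum sitemaps are worth fetching again.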

Really, this is kind of a must, and even if it started with Google, all major bots are using it too; they just need a link pointing directly to it.
I recently heard Yahoo was allowing sitemap registration too. I have not tried it so far, but there is no reason why the Google sitemap standard, since it's w3c, would not suit Yahoo.

++


Post by dcz »

mdvaldosta wrote: So this only works with your mod_rewrite? Or any?


Nope, it works for vanilla phpBB forums too.

Providing contrib files for rewritten forums is just something added for convenience here, just like the mx and kb add-ons.

Note that in all cases, only the code your site can actually use will be used.
Each add-on module takes care of itself, so even if, let's say, you upload the mx and kb files to a vanilla board, they won't end up causing any problems (but for coherence I recommend uploading only the files actually used, as explained in the install txt).

Then, talking about mod rewrite on the sitemap URLs themselves, it also works on vanilla boards, and to be honest it is more of a cosmetic feature, since Google accepts both URL standards for sitemaps. I just find it cooler to get the xml extension on those ;)

++
Last edited by dcz on Sun Jun 04, 2006 11:59 pm, edited 2 times in total.


Post by mdvaldosta »

I wasn't clear, I was referring to using webmedic's or a2k's mod_rewrite with your sitemap mod. I use webmedic's, but since I cannot find a download location for his sitemap mod I'm looking for an alternative.

Post by dcz »

mdvaldosta wrote: I wasn't clear, I was referring to using webmedic's or a2k's mod_rewrite with your sitemap mod. I use webmedic's, but since I cannot find a download location for his sitemap mod I'm looking for an alternative.


Then you came to the right place ;)
Actually, my mod rewrite uses webmedic's URL standard, for convenience and because I started coding mod rewrite with webmedic's mod.

The thing is, after 8 months without news from him, I started updating his code and actually plan to continue developing it, at first fixing and improving, but with the intention of eventually not using ob_start anymore, since it causes many problems with styling and server load.

So you just need to use the files in the contrib folder instead of the standard ones, and you'll be set.

Since I have already updated webmedic's mod, you'll however need to put the entire code of make_url_friendly() back into the sitemap.php file (I moved this part to includes/function.php in order to avoid repeating its code in every add-on, as was the case in 2.3.0).
You can also proceed as I did, i.e. move make_url_friendly() from page_header.php to includes/function.php in order to maintain only one occurrence of this function, but you'll then have to remove it from every other file that may use it.

If you are interested, I can provide a download link to my updated version of the webmedic toolkit (2.3.1), which fixes one security hole and adds nav link handling.

I also plan to go to 2.4.0 quite soon, as I am rebuilding all the rewrite rules for faster and more efficient handling, along with several add-ons to the script. Stay tuned ;)

++

I'll add any other url standard in contrib upon request.

++


Post by rockboyteek »

erm, right, so what do I have to do?.. just install the mod and done.. or do I need to submit the sitemap to Google?..

also, you know, for the SID thing... I have used the able2know search optimization thing.. is that good enough?.. as it seems the one from cyberalien has got problems...

Post by dcz »

rockboyteek wrote: erm, right, so what do I have to do?.. just install the mod and done.. or do I need to submit the sitemap to Google?..

also, you know, for the SID thing... I have used the able2know search optimization thing.. is that good enough?.. as it seems the one from cyberalien has got problems...


Well, I don't know much about the able2know mod rewrite, but showing me an example would be enough for me to provide an additional contrib file for this case.

Now, when I went to your site : http://www.etcworld.co.uk/etcforum/index.php I could not see any mod rewrite implemented throughout the forum.

So if you are using some mod rewrite that only gets activated for bots, please send me details about those rewritten URLs (forum, topic and paginated topic URLs).

Talking about SIDs, any method is good as long as it works. I have not had any problems so far with the Guest Session mod, so I still use it.

And if your URLs are vanilla throughout your board, you just need to install the script and follow the link provided in the ACP to either register with Google or submit your sitemap anonymously.

++


Post by rockboyteek »

dcz wrote: Now, when I went to your site : http://www.etcworld.co.uk/etcforum/index.php I could not see any mod rewrite implemented throughout the forum.

So if you are using some mod rewrite that only gets activated for bots, please send me details about those rewritten URLs (forum, topic and paginated topic URLs).


confused by what you mean by this :-S ... also when I go to sitemap.php it says No Sitemaps... what's wrong.. please explain in easy terms what you mean by mod rewrites....

Post by dcz »

rockboyteek wrote:
dcz wrote:

Now, when I went to your site : http://www.etcworld.co.uk/etcforum/index.php I could not see any mod rewrite implemented throughout the forum.

So if you are using some mod rewrite that only gets activated for bots, please send me details about those rewritten URLs (forum, topic and paginated topic URLs).


confused by what you mean by this :-S ... also when I go to sitemap.php it says No Sitemaps... what's wrong.. please explain in easy terms what you mean by mod rewrites....


Well, then, you should be using vanilla URLs (i.e. URLs left unchanged on your board).
Look at mine : http://www.marsatak.org/marsforum/pc-co ... -vt12.html

On yours it would be something like : -http://www.marsatak.org/marsforum/viewtopic.php?t=12

So this is mod rewrite. Several options can be followed going this way : static URLs or title-injected ones, as I do.
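
To make the difference between the two URL styles concrete, here is a small Python sketch. It is only an illustration: the thread shows nothing more than the "-vt12.html" suffix, so the slug rules, the `slugify` and `topic_url` names and the example base URL are all assumptions, not the actual phpBB SEO rewrite rules.

```python
import re


def slugify(title):
    """Lower-case the title and collapse anything that is not a-z or 0-9 into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def topic_url(base, topic_id, title=None):
    """Vanilla URL when no title is given, title-injected URL otherwise."""
    if title is None:
        # Vanilla phpBB style, as in viewtopic.php?t=12
        return f"{base}viewtopic.php?t={topic_id}"
    # Title-injected style; the -vt<id>.html suffix follows the example above.
    return f"{base}{slugify(title)}-vt{topic_id}.html"
```

The topic id stays in the rewritten URL, so a server-side rewrite rule can map it back to the vanilla script.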

So I assume from what you say that you are not using any mod rewrite on your board, so the mod should just work without any modification if properly installed.
You can still play with mod rewrite for the sitemaps themselves if you wish, which is a totally different thing.

The error you get suggests that no usable scripts were found by sitemap.php.

So there should be some path issues here.

As a first try, sitemap.php should be uploaded to the phpBB folder (you can move it one level up afterwards if you wish, but try it like this first), and the specified files (kb or not, mx or not) to the mx_ggsitemaps/ folder, still inside the phpBB folder.
Then, the SQL and admin files need to be properly installed too.

Tell me more about your directory structure and the uploaded files, but it should work with no effort once all files are uploaded to the right place.

++


Post by rockboyteek »

got it working now.. my bad :D... is it meant to come up with a load of code?... like lastmod and stuff?... also I have many mods on the forum.. and still do not understand what you mean with the pages...

[SPAM]

Post by dcz »

rockboyteek wrote: got it working now.. my bad :D... is it meant to come up with a load of code?... like lastmod and stuff?... also I have many mods on the forum.. and still do not understand what you mean with the pages...

http://www.etcworld.co.uk/etcforum/sitemap.php


Perfect.

So the link you mention is your sitemap index, the one you have to submit or register.

As you can see, you have one link per public forum (?fid=xx) and one for all forums (?forum).

So if you hit : http://etcworld.co.uk/etcforum/sitemap.php?forum
you'll see a list of all public forums on your site, with the lastmod date based on the last reply made in each. This way, bots only have to actually visit forums with new content.
I also added a link to the board itself there, with a lastmod based on the last post in all forums.

And if you hit one of these : http://etcworld.co.uk/etcforum/sitemap.php?fid=2
you'll get the list of all topics in the forum whose id is 2. You can see here that pagination is taken care of, with the lastmod based on the time of the last reply to each thread.

Also notice that results are sorted by last activity, as in the default forum view, so that the freshest content is always at the top of the list ;)
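
The per-forum listing just described (freshest topic first, one entry per topic page, lastmod taken from the last reply) can be sketched like this in Python. It is not the module's code. The dict keys, the `forum_sitemap_entries` name, the 15 posts per page and the `&start=` pagination parameter are assumptions based on a default phpBB 2 board.

```python
from datetime import datetime, timezone


def forum_sitemap_entries(topics, base, posts_per_page=15):
    """Return (url, lastmod) pairs, freshest topic first, one pair per topic page."""
    entries = []
    for topic in sorted(topics, key=lambda t: t["last_post_time"], reverse=True):
        # lastmod comes from the time of the last reply (a Unix timestamp here).
        lastmod = datetime.fromtimestamp(
            topic["last_post_time"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
        total_posts = topic["replies"] + 1  # first post plus replies
        for start in range(0, total_posts, posts_per_page):
            url = f"{base}viewtopic.php?t={topic['id']}"
            if start:
                # Paginated pages carry an offset (assumed phpBB 2 style).
                url += f"&start={start}"
            entries.append((url, lastmod))
    return entries
```

A topic with 20 replies therefore yields two entries, the second one ending in `&start=15`.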

If you activate mod rewrite for sitemaps, you'll have URLs like mine for those :
SitemapIndex : http://www.marsatak.org/sitemaps.xml
Sitemap Forums : http://www.marsatak.org/sitemap-forum.xml
Forum Sitemaps: http://www.marsatak.org/forum-sitemap-13.xml
Added content: http://www.marsatak.org/marsatak-sitemap.xml

But Google is OK with both ways, as I already said.

Since your board is in a folder and your site hosts other pages at the root level, you can use my small tutorial to add content to the system. You'll need to follow the instructions (two, actually) in order to move the sitemap.php file above the forum folder.

Once you have decided where to put your sitemap.php file, you can submit it. Registering with Google is cool because of the accurate stats it gives, but anonymous submission is OK too.

You can also see it working smoothly for your large forums (id=1 is a good example of how fast the code is ;) )

++


Post by rockboyteek »

nice mod i must say.. and yes submitted to google.. but not good ranks... need more publicity and members