phpBB3 SEO Sitemap

Last modified date/time should be optional - phpBB3 SEO Sitemap

by KYPREO » Mon Nov 18, 2019 11:26 am

Thank you for this extension.

I have managed to get it working in 3.2.x.

I think there are 2 things that would improve this extension.

First, the extension should not rely on the app.php URL rewrite function. 90% of the comments in the discussion section relate to problems finding sitemap.xml and probably stem from not having a working URL rewrite setup. It seems this behaviour can be changed by modifying a single line in .\acp\sitemap_module.php and then the template file in .\styles\sitemap.xsl

Secondly, under the sitemap protocol, last modified time/date is actually an optional parameter, see: https://www.sitemaps.org/protocol.html

Google has published a few comments which indicate that it may or may not decide to crawl or index pages based on the last modified date indicated in the sitemap. This would apply to many boards that have tens of thousands of topics going back 20 years. Many of the old topics may contain useful relevant information but specifying a last modified date of 2004, for example, could penalise the value of the page in the Google Index.

One possibility is to specify all URLs with a near-current last modified time/date. However, this could result in a penalty for misleading crawlers that follow the sitemap, especially if the search engine can cross-reference against actual post times. Nevertheless, I have seen that many topics on other boards that come up in Google searches seem to have a recent last modified date, even when the content is quite old.

A better option might be to omit last modified date altogether and leave it up to the search engine to figure out whether the content is good or not. I figure it is better not to give the search engine that information.

It is possible to make a few simple modifications to the code for this extension so that the XML it creates does not include a last modified time for each URL.

I edited the following files to comment out the relevant date lines:
.\styles\sitemap.xsl (lines 49, 72-74, 88, 103-105)
.\appender.php (lines 106-11 and 165)

This then produced a sitemap without any last modified dates.
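
To illustrate what the result looks like, here is a minimal sketch in Python (not the extension's own PHP code) that builds a sitemap in which lastmod is simply left out when no date is supplied. The function name build_sitemap and the example.com URLs are made up for illustration. Only the namespace URI and the urlset/url/loc/lastmod element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from (loc, lastmod_or_None) pairs.

    <loc> is required by the protocol; <lastmod> is optional, so it is
    only written when a value is actually supplied.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical topic URLs: the first has no date, the second keeps one.
print(build_sitemap([
    ("https://example.com/viewtopic.php?t=1", None),
    ("https://example.com/viewtopic.php?t=2", "2019-11-18"),
]))
```

Passing None for every entry gives the same kind of date-free sitemap as the edits described above.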
phpBB user since 2002
www.AusRotary.com
KYPREO
Registered User
Posts: 392
Joined: Fri Feb 02, 2018 9:56 am

Re: Last modified date/time should be optional

by KYPREO » Wed Nov 20, 2019 12:23 am

I wanted to add to the above post that Google has successfully processed and validated my sitemap without the last modified date info. I am interested to see what difference, if any, this makes to indexing. At the very least it will help me identify which pages are actually included in the Google index, as Google Search Console lists this in the sitemap section. So thanks again for this extension, I now have a valid working sitemap!

I must say that XML is a very slow-loading and memory-intensive format. Although the sitemap was created quickly on my test localhost, generation timed out quite a few times on the live webserver, even though that is a much more powerful system than my localhost laptop. Eventually I was successful, but the sitemap took about 5 minutes to construct. Caching the sitemap is therefore a must. I have tried increasing the memory allocated to the script to see if that helps when the cache time runs out and a new map is generated. The pre-cached sitemaps also take a long time to load in a browser, even though they are only 4MB each.
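
As a rough sketch of the caching idea (in Python, and not how the extension itself implements its cache), the slow generation step can be wrapped so that it only runs once the cached file is older than a chosen maximum age. The names cached_sitemap, cache_path, max_age_seconds and generate are all hypothetical.

```python
import os
import time


def cached_sitemap(cache_path, max_age_seconds, generate, now=None):
    """Return the cached sitemap text, rebuilding it once it is too old.

    `generate` is any zero-argument callable that returns the sitemap as a
    string (the slow step). `now` can be passed in to simulate a later time.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cache file yet

    if age is not None and age < max_age_seconds:
        with open(cache_path, encoding="utf-8") as fh:
            return fh.read()

    body = generate()
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write(body)
    # Swap the new file in with one rename so readers never see a
    # half-written sitemap.
    os.replace(tmp_path, cache_path)
    return body
```

With something like this, only the first request after the cache expires pays the full generation cost. Rebuilding from a scheduled job instead of a web request would avoid request time-outs altogether, although that is a design assumption and not something this extension is known to offer.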

Knowing that Google and Bing/Yahoo accept sitemaps in .txt format, I wonder whether someone approaching a sitemap extension from scratch wouldn't now just dump all index and topic URLs in a .txt file which is generated dynamically in the same manner as this extension. I know XML allows further information such as change frequency and priority, but Google has publicly stated it ignores those signals. As stated above, the date part could only harm indexing and without it, Google will still assign a last modified date of its own by looking at the post date/time. A text file should be much quicker to generate on a larger board and far quicker to read. All the sitemap really needs to work properly is a list of URLs, each on a separate line, set out in a .txt file, with no more than 50,000 URLs per file.
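
A minimal sketch of that text-file approach, in Python for brevity: the URL list is split into chunks of at most 50,000, and each chunk becomes the body of one .txt sitemap with one URL per line. The function name text_sitemap_bodies and the example.com URLs are invented for illustration. Fetching the real topic URLs from the board and writing the files to disk are left out.

```python
MAX_URLS_PER_FILE = 50_000  # per-file limit mentioned above


def text_sitemap_bodies(urls, max_urls=MAX_URLS_PER_FILE):
    """Split a list of URLs into text-sitemap bodies, one URL per line.

    Returns a list of strings; each string is the complete content of
    one .txt sitemap file holding at most `max_urls` URLs.
    """
    bodies = []
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        bodies.append("\n".join(chunk) + "\n")
    return bodies


# Hypothetical topic URLs; a tiny limit is used so the split is visible.
urls = [f"https://example.com/viewtopic.php?t={i}" for i in range(1, 6)]
for number, body in enumerate(text_sitemap_bodies(urls, max_urls=2), start=1):
    print(f"--- sitemap-{number}.txt ---")
    print(body, end="")
```

Because there is no XML to build or transform, the work per URL is just string concatenation, which is why this should scale better on a large board.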

Re: Last modified date/time should be optional

by globetrotting » Sat Jan 25, 2020 12:25 am

I totally agree with you and the reasoning for omitting the date column.
All the more so because Google often displays only the topic start date in the search results, which completely ignores the fact that topics can run for decades.
So I followed your example, found the lines you listed for sitemap.xsl and appender.php to be absolutely correct, and had my sitemap done without dates within minutes.

Thank you very much for your efforts! 8-)
Das Sein ändert das Bewußtsein
globetrotting
Registered User
Posts: 198
Joined: Thu Jan 15, 2004 8:14 pm
Location: globetrotting