[RC] HTTP Guest Cache

A place for MOD Authors to post and receive feedback on MODs still in development. No MODs within this forum should be used within a live environment!
SuperFedya
Registered User
Posts: 248
Joined: Sun Jul 14, 2002 9:14 pm

Re: [RC] HTTP Guest Cache

Post by SuperFedya » Fri Jun 07, 2013 2:21 am

"cache" in the URL will hurt SEO.
There will be two URLs for each page, which is bad for SEO.

Any possible solutions?

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Fri Jun 07, 2013 10:25 am

SuperFedya wrote: "cache" in the URL will hurt SEO.
There will be two URLs for each page, which is bad for SEO.
It shouldn't be; for the purposes of this mod all bots (search spiders) are treated as guests, which means that nearly all pages they view will be redirected to add cache=1 (if it isn't already there), so for the purposes of indexing it should be consistent. I also believe that search engines should internally combine URLs that are redirected.

In any event, I haven't noticed any impact on my own site, which still holds the same search rankings that it did before I added the mod. The only real issue is that for things like Google Analytics you will need to tell it to ignore the cache parameter; otherwise you get two separate results for otherwise identical pages.


The problem is that if I were to use the same URL for the two pages and only distinguish cacheability via response headers, then a user's browser would have no way of knowing whether to pull the page from cache or not. For example, if you're viewing a topic as a guest and then sign in, you might still end up viewing the guest version of that topic, and that problem would propagate to intermediate caches as well, so refreshing the page may not actually help. The only alternative I can think of is to instead force URLs to always include the SID for logged-in users, to distinguish their page views from a guest's; more search engines may ignore SIDs by default, but then it leaves guests unable to override cacheability, which the cache parameter allows.

I dunno; I don't think there's any ideal solution, as forcing inclusion of SIDs has its own drawbacks, namely that logged in users may accidentally include their SID when posting links to something on the forum, which is a potential security issue.

So yeah, personally I'd be happiest with the cache parameter as the solution, and the admin panel allows you to change the name of the parameter if you think something else will be more compatible; you could change it to something that more browsers will ignore by default, though I'm not sure what that might be.
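
As a rough illustration of the redirect rule described in this post (not the mod's actual code, which is PHP inside phpBB), here is a minimal Python sketch. The function name guest_redirect_target and the example.com URLs are made up for illustration; only the cache=1 parameter and the "guests get redirected, logged-in users are left alone" behaviour come from the discussion above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def guest_redirect_target(url, is_guest, param="cache"):
    """Return the URL a guest should be redirected to, or None if no redirect is needed.

    Hypothetical helper: guests whose URL lacks the parameter get it appended;
    logged-in users and URLs that already carry it are left untouched.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    already_marked = any(name == param for name, _ in query)
    if not is_guest or already_marked:
        return None  # nothing to do, so repeated requests cannot keep redirecting
    query.append((param, "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# A guest hitting a plain topic URL is sent to the cacheable variant once.
first = guest_redirect_target("http://example.com/viewtopic.php?t=5", is_guest=True)
# Requesting the redirected URL again yields no further redirect.
second = guest_redirect_target(first, is_guest=True)
```

The point of checking for the parameter before appending is that the rule is applied at most once per URL, so a guest (or a spider treated as one) should see a single redirect and then a normal page.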

SuperFedya
Registered User
Posts: 248
Joined: Sun Jul 14, 2002 9:14 pm

Re: [RC] HTTP Guest Cache

Post by SuperFedya » Mon Jun 10, 2013 1:48 pm

Two different URLs for the same topic will hurt SEO anyway.

Is there any possibility of enabling this mod only for guests and not for bots?

Ozo
Registered User
Posts: 330
Joined: Mon Dec 13, 2010 7:57 pm

Re: [RC] HTTP Guest Cache

Post by Ozo » Mon Jun 10, 2013 5:39 pm

SuperFedya wrote: Two different URLs for the same topic will hurt SEO anyway.

Is there any possibility of enabling this mod only for guests and not for bots?
Automatically sanitises URLs and adds a parameter to better support caching of only guest pages (logged in users are unaffected).
The search engine bots listed in your ACP are not guests when they crawl your site. They are part of the memberlist of your forum.

Then, if you are worried about links to your site with the 'cache' parameter being passed around, use robots.txt to tell bots not to crawl those kinds of URLs. Also, the best search engines out there already have webmaster tools to take care of this.

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Mon Jun 10, 2013 7:53 pm

Ozo wrote: The search engine bots listed in your ACP are not guests when they crawl your site.
For the purposes of this mod they are treated as if they were guests; they'll sign in long enough to appear as active on the site, but will then pull pages from cache if they're held somewhere between the spider and your site.
SuperFedya wrote: Two different URLs for the same topic will hurt SEO anyway.
As I say I've seen no evidence of this on my own site, which is a pretty busy one visited frequently by search spiders. Once the spider reaches the site, all URLs will have cache=1 on the end, so while actually indexing your site it makes no difference to the spider. Also, Google's spider (I haven't checked others) correctly ignored the parameter on my site in the same way that it does for sid and other session parameters, though if you want to be sure you can tell it explicitly to ignore it.

Otherwise the only links of concern are entry points to your site (links from other sites); however, these will immediately redirect to include the cache=1 parameter, which spiders will detect, treating the two URLs as the same. In fact, I'm pretty sure it's that redirection that causes Google to ignore it.

It wouldn't be too hard to add an option to enable/disable the mod for bots, but it'll have to wait for either another RC release or the release version of the mod, depending when I have time. However, I should point out that search bots are some of the worst guest users; on my site, which gets around 90,000 pageviews a day, Google Bot accounts for a further 30,000 page views (on top of the 90k); disabling caching for Google and other bots would have a pretty big impact on my site, so I'm not sure I would advise using such a feature even after I've added it.
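
To make concrete what "ignoring the parameter" means for analytics or indexing, here is a small Python sketch that folds the cached and uncached variants of a URL into one entry. It is only an illustration of the idea: the names canonical_url, merge_pageviews and IGNORED_PARAMS are hypothetical, the example.com URLs are invented, and real analytics tools and search engines have their own settings for this.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that do not change what the page is about (assumed names).
IGNORED_PARAMS = frozenset({"cache", "sid"})


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop the ignored query parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def merge_pageviews(hits):
    """Count page views per canonical URL instead of per raw URL."""
    return Counter(canonical_url(url) for url in hits)


views = merge_pageviews([
    "http://example.com/viewtopic.php?t=5",
    "http://example.com/viewtopic.php?t=5&cache=1",
    "http://example.com/index.php?cache=1",
])
```

With the parameter ignored, the two viewtopic.php hits count as one page with two views rather than two pages with one view each.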

Sarr
Registered User
Posts: 23
Joined: Sun Aug 02, 2009 9:10 pm
Location: Poland

Re: [RC] HTTP Guest Cache

Post by Sarr » Wed Jul 03, 2013 8:40 am

Hello. Big problem here. Google and Bing aren't indexing my forum (a very active one: http://forum.neverwinter.com.pl) at all.
The portal (http://www.neverwinter.com.pl) is indexed _very_ well. For the forum, only 1 of the 501 links in the sitemap is indexed.

I've been trying hard to work out what the problem is, and I have now checked my site with Bing Webmaster Tools.
It's been a few days and it still isn't indexed! Even the sitemap is stuck at "Pending":

Code:

http://forum.neverwinter.com.pl/sitemap.php 	2013-06-30 	275 	2013-06-30 		Pending
Now, when I try to fetch forums as Bing Bot (option in Diagnostics & Tools), it shows me this:

Code:

URL: http://forum.neverwinter.com.pl/
Status: Redirection limit reached.

HTTP/1.1 302 Moved Temporarily
Cache-Control: private, no-cache, max-age=0
Connection: Keep-Alive
Date: Wed, 03 Jul 2013 08:36:06 GMT
Keep-Alive: timeout=10, max=100000
Pragma: public
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Expires: 0
ETag: 1372855973
Location: http://forum.neverwinter.com.pl/index.php?cache=1
Server: Apache
Set-Cookie: nwo_forum_u=; expires=Tue, 03-Jul-2012 08:36:07 GMT; path=/; domain=.neverwinter.com.pl; HttpOnly
Set-Cookie: nwo_forum_k=; expires=Tue, 03-Jul-2012 08:36:07 GMT; path=/; domain=.neverwinter.com.pl; HttpOnly
Set-Cookie: nwo_forum_sid=; expires=Tue, 03-Jul-2012 08:36:07 GMT; path=/; domain=.neverwinter.com.pl; HttpOnly
What's going on?
It's most drastic with Google: it's not indexing at all, but shows no errors.
Polish Neverwinter Online portal: http://www.neverwinter.com.pl
Direct link to the forum: forum.neverwinter.com.pl
Polish D&D Online Joomla portal with integrated PhPbb3: http://www.ddopl.com
Direct link to the forum: forum.ddopl.com

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Wed Jul 03, 2013 9:25 am

Sarr wrote: What's going on?
It's most drastic with Google: it's not indexing at all, but shows no errors.
Hmm, I'm not sure; do you have any other mods installed? I notice that your response headers include an ETag, but phpBB doesn't add that header to pages, and neither does my mod. It seems like bot users are somehow getting stuck in a redirect loop and giving up, but I just tried the URL http://forum.neverwinter.com.pl/index.php with no issues (just a single redirect to add ?cache=1), so I don't see what issues search bots should be having.
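
For anyone trying to reproduce a "Redirection limit reached" report like the one above, the check boils down to following Location headers by hand and seeing whether the chain ever settles. Below is a minimal Python sketch of that idea. It does no network access: fetch is an injected callable, and trace_redirects, make_fetch and the example.com URLs are hypothetical names used only for illustration.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(start_url, fetch, limit=10):
    """Follow redirects manually and report how the chain ends.

    fetch(url) must return (status_code, location_or_None).
    Returns (chain, verdict) where verdict is "ok", "loop" or "limit".
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(limit):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return chain, "ok"          # a final, non-redirect response
        chain.append(location)
        if location in seen:
            return chain, "loop"        # redirected back to a URL already visited
        seen.add(location)
        url = location
    return chain, "limit"               # gave up, as a crawler eventually would


def make_fetch(table):
    """Build a fake fetch() from a dict; unknown URLs answer 200 with no redirect."""
    return lambda url: table.get(url, (200, None))


PLAIN = "http://example.com/index.php"
CACHED = "http://example.com/index.php?cache=1"

# Expected behaviour: one redirect to the cached URL, then a normal page.
healthy = trace_redirects(PLAIN, make_fetch({PLAIN: (302, CACHED)}))
# Faulty behaviour: the cached URL keeps redirecting to itself.
broken = trace_redirects(PLAIN, make_fetch({PLAIN: (302, CACHED), CACHED: (302, CACHED)}))
```

In a real check, fetch would issue a request with redirect-following switched off and with the same User-Agent and cookie handling as the bot, since a browser that keeps cookies may see a single redirect while a client that does not may see something different.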

Sarr
Registered User
Posts: 23
Joined: Sun Aug 02, 2009 9:10 pm
Location: Poland

Re: [RC] HTTP Guest Cache

Post by Sarr » Wed Jul 03, 2013 9:45 am

Haravikk wrote:
Sarr wrote: What's going on?
It's most drastic with Google: it's not indexing at all, but shows no errors.
Hmm, I'm not sure; do you have any other mods installed? I notice that your response headers include an ETag, but phpBB doesn't add that header to pages, and neither does my mod. It seems like bot users are somehow getting stuck in a redirect loop and giving up, but I just tried the URL http://forum.neverwinter.com.pl/index.php with no issues (just a single redirect to add ?cache=1), so I don't see what issues search bots should be having.
Strange. I also have the Guest & Bot Cache Mod, which creates a file cache, but I don't think that would be a problem.
Maybe it's some issue with having the servers behind the Cloudflare CDN too?...

Another strange thing.
Looking at the headers via this:
http://urivalet.com/?http://forum.never ... pl/#Report
Or via:
http://web-sniffer.net/?url=http%3A%2F% ... s&http=1.1
I see that robots get "noindex, follow" o.O
<link rel="canonical" href="http://forum.neverwinter.com.pl/?cache=1" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Polskie Forum Neverwinter!</title>
<meta http-equiv="content-style-type" content="text/css" />
<meta http-equiv="content-language" content="pl-PL" />
<meta http-equiv="imagetoolbar" content="no" />
<meta name="content-language" content="pl" />
<meta name="title" content="Polskie Forum Neverwinter!" />
<meta name="description" content="Polskie Forum Neverwinter! : forum.neverwinter.com.pl" />
<meta name="keywords" content="neverwinter, forum, polskie" />
<meta name="category" content="general" />
<meta name="robots" content="noindex,follow" />
<meta name="distribution" content="global" />
<meta name="resource-type" content="document" />
<meta name="copyright" content="forum.neverwinter.com.pl" />

But if you log in and check (or even when a normal user views the page source), it's "index, follow".
What could have happened?

OK, I know what happened. This mod adds <meta name="robots" content="noindex,follow" /> to my pages!
Why? When I turn it off, the ?cache=1 parameter is gone, and it's "index,follow" again.
When I turn HTTP Guest Cache on, it's "noindex, follow" on the _whole site_. :?:

w5hro
Registered User
Posts: 100
Joined: Sun Jan 16, 2011 4:40 pm

Re: [RC] HTTP Guest Cache

Post by w5hro » Fri Jul 12, 2013 10:13 pm

First of all I want to say thank you for working on this MOD along with the other one as this will probably be my very last post on this phpBB board.

Anything like this that will help reduce the CPU usage especially on shared servers is greatly needed. I installed your other MOD first then this one, but I had to remove this one due to an issue with my mobile style. I think Mobile styles may have been overlooked here.

An example is with the Artodia styles. When a user loads the board URL using a smart phone it automatically calls up the mobile style.

The first part of the code below is added within includes/session.php

Code:

// Mod: phpBB Mobile start
include_once($phpbb_root_path . 'includes/mobile.' . $phpEx);
phpbb_mobile::setup('art_mobile'); // Change first parameter to correct directory name of mobile style
// Mod: phpBB Mobile end

// Call phpbb_user_session_handler() in case external application wants to "bend" some variables or replace classes...
// After calling it we continue script execution...
phpbb_user_session_handler();
What happened is when I installed this HTTP Guest Cache MOD the mobile style would come up on smart phones, but in a continuous loop and would do something similar to the below. Each time it would add to the URL :mrgreen:

Loop 1: http://www.mydomain.com/mobile.php?cache1
Loop 2: http://www.mydomain.com/mobile.php?cach ... php?cache1
Loop 3: http://www.mydomain.com/mobile.php?cach ... php?cache1
Loop 4 ………… etc.

Not only that, but when I would log out of my board via my desktop PC while the Prosilver style was up it would switch to the mobile style and also start up in the continuous loop.

Anyway, I just wanted to give you this feedback, because many of us have mobile styles installed so they should probably be included in both of your MODs.

Hope this information helps, and thanks again…

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Sat Jul 13, 2013 12:21 pm

Hmm, there must be some kind of oddity with how the mobile style is generating its URL; my mod uses the $_EXTRA_URL global variable to trigger append_sid() to add the cache parameter automatically. I picked this method because it should be compatible with everything… that is, unless the mobile style modifies the URL directly, which is probably what's happening.

I'm not sure how I would implement a workaround; do you have a link to the mod that implements the mobile style?

For those interested in an update: I still haven't had a lot of time to work on one, but I do have a largish, interesting feature I'm hoping to add, which is support for the CloudFlare API. This may allow me to eliminate the delays that caching causes, i.e. when a page is updated it can be removed from CloudFlare's cache or at least made to expire more quickly. At least that's the idea :)
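
The link-generation idea from the start of this post can be modelled in a few lines. The sketch below is Python, not phpBB code: EXTRA_URL and append_params are hypothetical stand-ins for the $_EXTRA_URL global and the append_sid() step mentioned above, and the relative URLs are invented. It only shows why links built through one central helper pick up the parameter while a URL assembled by hand does not.

```python
# Stand-in for a global list of extra "name=value" pairs (assumed shape).
EXTRA_URL = ("cache=1",)


def append_params(url, extra=EXTRA_URL):
    """Append each extra pair to the query string unless that name is already present."""
    base, _, fragment = url.partition("#")
    path, _, query = base.partition("?")
    pairs = [pair for pair in query.split("&") if pair]
    names = {pair.split("=", 1)[0] for pair in pairs}
    for item in extra:
        name = item.split("=", 1)[0]
        if name not in names:      # skip duplicates so the URL never grows on reuse
            pairs.append(item)
            names.add(name)
    result = path + ("?" + "&".join(pairs) if pairs else "")
    return result + ("#" + fragment if fragment else "")


# A link produced through the central helper carries the parameter...
via_helper = append_params("viewtopic.php?t=5")
# ...and passing it through the helper again changes nothing.
via_helper_twice = append_params(via_helper)
# A link concatenated by hand bypasses the helper and lacks it.
hand_built = "viewtopic.php" + "?t=5"
```

A guest following the hand-built link would need a redirect to reach the cacheable URL, which is where code that rewrites URLs on its own could plausibly interact badly with the mod; that is an assumption about the failure mode, not a diagnosis.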

w5hro
Registered User
Posts: 100
Joined: Sun Jan 16, 2011 4:40 pm

Re: [RC] HTTP Guest Cache

Post by w5hro » Sat Jul 13, 2013 2:32 pm

Haravikk wrote: I'm not sure how I would implement a workaround; do you have a link to the mod that implements the mobile style?
The Artodia style MODs are here...
http://www.artodia.com/phpbb-styles/mobile/

There are also URL based Mobile Detection MODs here on phpBB...
https://www.phpbb.com/customise/db/mod/ ... l_based%29

Both MODs add code to session.php and there will be an issue when it is there. If you can't come up with a solution, then maybe at a minimum figure out a way to disable or bypass the HTTP Guest Cache feature when the mobile switch is detected, or something similar. These mobile detection switches are getting really popular now.

Thanks again…

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Sat Jul 13, 2013 3:32 pm

w5hro wrote: Both MODs add code to session.php and there will be an issue when it is there.
Are these both definitely incompatible? I just had a look at the second link but it seems to just check whether you're accessing the site from mobile.yourdomain.com and changes the style, which shouldn't cause any issues with my mod that I can see, as it means all mobile pages are separated under their own sub-domain.

But then I also didn't see anything that should affect the Artodia mod(s) either; the only mobile.php I see inside it should be under includes/mobile.php, but your loop example points to mydomain.com/mobile.php. Is there a reason for this (i.e. do you actually have a mobile.php script in your phpBB root, and if so, what does it do)?

w5hro
Registered User
Posts: 100
Joined: Sun Jan 16, 2011 4:40 pm

Re: [RC] HTTP Guest Cache

Post by w5hro » Sat Jul 13, 2013 3:50 pm

Haravikk wrote:
w5hro wrote: Both MODs add code to session.php and there will be an issue when it is there.
But then I also didn't see anything that should affect the Artodia mod(s) either; the only mobile.php I see inside it should be under includes/mobile.php, but your loop example points to mydomain.com/mobile.php. Is there a reason for this (i.e. do you actually have a mobile.php script in your phpBB root, and if so, what does it do)?
No, that's why I said it looked similar to those URLs when it started looping. I should have copied and pasted from the address bar in my browser when it was happening, but I didn't think of it at the time. I remember, though, that it kept adding mobile.php to the URL with each loop. I've already uninstalled this MOD of yours, but kept the other one. When everything is working properly, users never even see mobile.php in their address bar. It's set up so that they still see index.php in their URL when the mobile style comes up. That's why it was so strange.

The mobile.php file is within the includes directory.

Haravikk
Registered User
Posts: 261
Joined: Sat Nov 02, 2002 4:42 pm

Re: [RC] HTTP Guest Cache

Post by Haravikk » Sat Jul 13, 2013 4:41 pm

Okay, well thanks for letting me know anyway, I'll have to find some time to get both mods installed to see if I can reproduce this and figure out what's happening; it's definitely a weird one as the mobile style mod's swapping of template shouldn't strictly affect my mod.

You're right that I may have to look at compatibility though; bug aside, the Artodia mod's URLs don't look like they would differ between mobile and desktop visitors, which means there would be no way for caches to know the difference, meaning a mobile user accessing your board could cause a mobile version of the page to be generated and cached, then served up to desktop users… I may be able to add a mobile compatibility JavaScript snippet or something to get around that, to keep it from being mod-specific (and so you can disable it if you don't need it).

I'll add a warning to the first post until I get a chance to fix this.
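
The cache mix-up described in this post is easy to demonstrate with a toy cache. The Python sketch below is purely illustrative: PageCache, device_class and render are hypothetical names, the User-Agent check is a deliberately crude heuristic, and the second mode shows one generic way around the problem (making the device class part of the cache key), not the approach the mod will necessarily take.

```python
def device_class(user_agent):
    """Very rough mobile detection, for illustration only."""
    ua = user_agent.lower()
    return "mobile" if any(token in ua for token in ("iphone", "android", "mobile")) else "desktop"


class PageCache:
    """Toy shared cache keyed by URL, optionally also by device class."""

    def __init__(self, vary_on_device):
        self.vary_on_device = vary_on_device
        self._store = {}

    def get(self, url, user_agent, render):
        key = (url, device_class(user_agent)) if self.vary_on_device else (url,)
        if key not in self._store:
            self._store[key] = render(url, user_agent)   # cache miss: generate the page
        return self._store[key]


def render(url, user_agent):
    """Stand-in for the board picking a style based on the visitor's device."""
    return f"{device_class(user_agent)} page for {url}"


IPHONE = "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X)"
DESKTOP = "Mozilla/5.0 (Windows NT 6.1; WOW64)"
URL = "/viewtopic.php?t=5&cache=1"

# Same URL for both devices: whoever arrives first decides what everyone gets.
url_only = PageCache(vary_on_device=False)
url_only.get(URL, IPHONE, render)
served_to_desktop = url_only.get(URL, DESKTOP, render)

# Device class in the key: each kind of visitor gets its own cached copy.
per_device = PageCache(vary_on_device=True)
per_device.get(URL, IPHONE, render)
served_to_desktop_fixed = per_device.get(URL, DESKTOP, render)
```

An intermediate cache cannot apply such a key by itself unless the response or the URL tells it to, which is why the difference between mobile and desktop has to be visible outside the board's own code.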

w5hro
Registered User
Posts: 100
Joined: Sun Jan 16, 2011 4:40 pm

Re: [RC] HTTP Guest Cache

Post by w5hro » Sat Jul 13, 2013 5:36 pm

I wish I could have left the MOD installed because I did notice a slight decrease in server CPU usage, but I did make the cache time about 6 hours :mrgreen: Then about three days later I received a PM from a user who said my board had a problem because it wouldn't load on their iPhone. It kept looping over and over again. I don't currently have that many users browsing my board from smart phones because the average age of my users is over 50. That won't last long though.

Anyway, when I started looking and then logged out, I noticed it would switch to the mobile version on a desktop PC. I went ahead and removed the MOD since it's still in RC mode, but it was definitely helping, so you are headed in the right direction.

Thanks again…
