Solve this SEO Issue Once and For All

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Wed Jul 11, 2018 3:44 pm

thecoalman wrote:
Wed Jul 11, 2018 3:19 pm
It needs to be:

Code:

Disallow: /ppp/viewforum.php?f=1$
You should have a Google Webmaster account; there you can always see what is being blocked and test these things.
OK, now updated. BTW. What exactly is that line doing?

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Wed Jul 11, 2018 3:52 pm

Ger wrote:
Wed Jul 11, 2018 3:05 pm
Robots.txt is not intended to prevent leaking of information since no bot needs to respect what's in it.

Also, those lines do nothing about duplicates. IMO, you should just remove those lines from robots.txt. It does more harm than good.
Some of them will have to stay, sorry :D They also deter search engine leechers, who try to garner info from your site via a search engine to get around IP or UA blocks. If you can prevent the major SEs from accessing those pages, you prevent spammers and harvesters from reaching them that way, even if it only stops a small amount.

thecoalman
Community Team Member
Posts: 2796
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.
Contact:

Re: Solve this SEO Issue Once and For All

Post by thecoalman » Wed Jul 11, 2018 4:04 pm

Frank Rizzo wrote:
Wed Jul 11, 2018 3:44 pm
OK, now updated. BTW. What exactly is that line doing?
It's going to block access specifically to /ppp/viewforum.php?f=1. The $ on the end tells the bot that this is the end of the URL. There is no other way I'm aware of to match that specific string with robots.txt.

The second line matches both viewtopic and viewforum URLs that contain the string .php?f=1& followed by other parameters.
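
As a rough illustration of how a crawler that supports the * and $ extensions might evaluate these two rules, here is a minimal Python sketch. The function name `rule_matches` and the sample URLs are made up for this example, and real crawlers may differ in details such as percent-encoding and rule precedence.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumes the wildcard extension used by the major crawlers:
    '*' matches any run of characters and a trailing '$' anchors
    the end of the URL. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None


# The '$' rule only hits the bare forum URL.
print(rule_matches("/ppp/viewforum.php?f=1$", "/ppp/viewforum.php?f=1"))        # True
print(rule_matches("/ppp/viewforum.php?f=1$", "/ppp/viewforum.php?f=11"))       # False
# The wildcard rule needs 'f=1&', so other parameters must follow.
print(rule_matches("/ppp/*.php?f=1&*", "/ppp/viewtopic.php?f=1&t=42"))          # True
print(rule_matches("/ppp/*.php?f=1&*", "/ppp/viewforum.php?f=11&start=25"))     # False
```

Between them, the two patterns cover forum 1 with and without extra parameters while leaving forums 11, 100 and so on alone.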

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Wed Jul 11, 2018 4:23 pm

Of course. I don't actually have a need for that entry (I don't even have a forum 1 :D)

Ger
Recognised Extension Developer
Posts: 1667
Joined: Wed Jan 02, 2008 7:35 pm
Location: 192.168.1.100
Contact:

Re: Solve this SEO Issue Once and For All

Post by Ger » Wed Jul 11, 2018 4:27 pm

Frank Rizzo wrote:
Wed Jul 11, 2018 3:52 pm

Some of them will have to stay, sorry :D They also deter search engine leechers, who try to garner info from your site via a search engine to get around IP or UA blocks. If you can prevent the major SEs from accessing those pages, you prevent spammers and harvesters from reaching them that way, even if it only stops a small amount.
That is simply not true.

This is true:
thecoalman wrote:
Wed Jul 11, 2018 3:29 pm
Those lines would prevent a bot from making requests to forums and the topics in them you do not want indexed. This prevents unnecessary requests to your server and also frees the bot resources to index or reindex other content on your site you do want indexed.
But only in its updated form. In its previous form it blocked a whole lot of pages from being indexed, including topics and posts in any forum whose id starts with 1, so also 11, 100, etc.
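
To see this kind of over-blocking in action, here is a small sketch using Python's standard `urllib.robotparser`, which, as far as I know, does plain prefix matching and does not implement the * and $ extensions. The rule below is a hypothetical stand-in for an unanchored f=1 entry, not the exact line from the earlier robots.txt, and `blocked_paths` is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical unanchored rule: nothing after f=1 to stop the prefix match.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ppp/viewforum.php?f=1
"""


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the subset of paths that the rules disallow for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


CANDIDATES = [
    "/ppp/viewforum.php?f=1",
    "/ppp/viewforum.php?f=11",
    "/ppp/viewforum.php?f=100",
    "/ppp/viewforum.php?f=2",
]

# Forums 11 and 100 are caught as well, because the rule is only a prefix.
print(blocked_paths(ROBOTS_TXT, CANDIDATES))
```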
My extensions:
Simple CMS, Feed post bot, Avatar Resize, Modbreak, Magic OGP, Live topic update, Modern Quote, Quoted Where (GDPR) and Autoresponder.
Newest: FAQ manager for 3.2

Like my work? Buy me a coffee to keep it coming. :ugeek:
-Available for custom work-

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Wed Jul 11, 2018 5:23 pm

It is actually true, sorry.

As I said earlier, it is a technique to prevent search engine leechers. This is where humans or bots use SERPs to retrieve pages without hitting the target directly.

Blocking in robots.txt will not stop rogue bots themselves, so you have to do that by other means. However, blocking the main SEs from accessing those resources does prevent secondary leaks via SERPs. It is a thing.

Ger
Recognised Extension Developer
Posts: 1667
Joined: Wed Jan 02, 2008 7:35 pm
Location: 192.168.1.100
Contact:

Re: Solve this SEO Issue Once and For All

Post by Ger » Wed Jul 11, 2018 7:52 pm

You are of course free to do whatever you like. In its current form I also don't see it doing any harm to your board.

Mick
Support Team Member
Posts: 19986
Joined: Fri Aug 29, 2008 9:49 am
Location: Cardiff

Re: Solve this SEO Issue Once and For All

Post by Mick » Wed Jul 11, 2018 8:36 pm

Frank Rizzo wrote:
Wed Jul 11, 2018 3:00 pm
sorry topic, was not closed due to abuse
Nobody said it was.
"The more connected we get the more alone we become" - Kyle Broflovski

There are no ‘threads’ in phpBB, they are topics.

AmigoJack
Registered User
Posts: 5324
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Solve this SEO Issue Once and For All

Post by AmigoJack » Thu Jul 12, 2018 8:33 am

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
I posted a thread
Topic. Not even phpBB2 used the wrong term.

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
I have run a forum
A board.

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
putting quotes around a phrase
A phrase only becomes a phrase through the quotes. A "phrase without quotes" is just a bunch of keywords (implicitly combined with the AND operator). Are you sure you understand search engines?

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
site:www.flatstats.co.uk/ppp/ shows there to be 2,240 results
Welcome to the internet, it works thru links. Moreover, you should use the "verbatim" setting, which will show even more results: https://www.google.fi/search?q=site:www ... /&tbs=li:1

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
one of the 11 sitemaps does have 3,168 warnings
phpBB has no sitemaps, you must be referring to something which is unknown to us.

Frank Rizzo wrote:
Wed Jul 11, 2018 10:44 am
Bots Info
Bing is listed twice - inspect both and check the data against a default installation.

Frank Rizzo wrote:
Wed Jul 11, 2018 11:19 am
Configure My Site -> Ignore URL Parameters
f
t
p
If you don't understand URIs then don't fiddle with them. Try accessing this topic without parameters t and p. But I'm aware that today's internet browsers even tend to hide URI parameters, so that might misteach people on how internet addresses work. Though you have 11 years of experience.
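
To make the point concrete, here is a minimal sketch of what ignoring f, t and p does to viewtopic URLs. The helper name `drop_params` and the topic and forum ids are invented for illustration, and this only models the idea of ignoring parameters, not what Bing actually does internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_params(url, ignored):
    """Return url with the query parameters named in ignored removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))


IGNORED = {"f", "t", "p"}

# Two different (hypothetical) topics in different forums.
topic_a = "https://www.example.co.uk/ppp/viewtopic.php?f=16&t=101"
topic_b = "https://www.example.co.uk/ppp/viewtopic.php?f=3&t=202"

# Both collapse to the same bare viewtopic.php address.
print(drop_params(topic_a, IGNORED))
print(drop_params(topic_b, IGNORED))
```

Once those three parameters are ignored, every topic looks like the same page, which is the opposite of what you want a crawler to conclude.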

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/cache/
Disallow: /ppp/files/
Disallow: /ppp/includes/
Disallow: /ppp/store/
This is pointless, as nobody has access to those paths anyway - each of those should have a .htaccess already.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/config/
Disallow: /ppp/develop/
Disallow: /ppp/index.html
These are unknown to phpBB.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/docs/
That was never meant to be published online - just delete the folder.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/install/
This is pointless: if that path existed, phpBB would refuse to serve an accessible board. Remove both: this entry and the folder, if it is still there.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/common.php
Disallow: /ppp/config.php
This is pointless: you're just telling everyone that these files exist, while in reality nobody would have ever known, since they aren't linked anywhere.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/search.php
Disallow: /ppp/style.php
A highly questionable approach: search queries like "show all posts of a user" won't be indexed anymore, and all styles in your cached search engine results might be missing as well.

Frank Rizzo wrote:
Wed Jul 11, 2018 2:53 pm

Code:

Disallow: /ppp/*.php?f=1&*
Wildcards in the "disallow" directive are interpreted differently from crawler to crawler, so better avoid them.

thecoalman wrote:
Wed Jul 11, 2018 3:19 pm

Code:

Disallow: /ppp/viewforum.php?f=1$
Regular expressions are not supported. And if they were, the ? would have to be escaped. No, everything you put up here is a literal, so this given line will never match.
The worst thing about censorship is ███████████

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Thu Jul 12, 2018 8:54 am

What was the point of 90% of that post Amigo? That was mostly not helpful.

- - - - - - - - - - - - - - - - - -

Ref: Bing parameters. Still no full crawl yet since the values were removed. This is all that has been crawled so far today:

Code:

40.77.167.149 - - [12/Jul/2018:03:49:23 +0100] "GET /ppp/feed HTTP/1.1" 200 3437 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"
157.55.39.43 - - [12/Jul/2018:05:26:08 +0100] "GET /ppp/viewforum.php?st=0&sk=t&sd=d HTTP/1.1" 404 8941 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"
157.55.39.43 - - [12/Jul/2018:06:08:18 +0100] "GET /ppp/feed/forums HTTP/1.1" 200 1808 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"
40.77.167.211 - - [12/Jul/2018:08:41:24 +0100] "GET /ppp//viewforum.php?f=16 HTTP/1.1" 200 12019 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"
That double slash at 08:41 looks like a concern.
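
For anyone who wants to scan a longer log for the same oddities, here is a rough Python sketch. It assumes the log layout shown above (IP, two dashes, timestamp in brackets, quoted request line, status, size); the regular expression and the name `flag_requests` are mine and may need adjusting for other log formats.

```python
import re

# Matches the leading fields of the log lines shown above.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def flag_requests(lines):
    """Yield (time, path, reason) for requests worth a closer look."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        path, status = m["path"], int(m["status"])
        if status >= 400:
            yield m["time"], path, f"status {status}"
        # Only inspect the part before the query string for '//'.
        if "//" in path.split("?", 1)[0]:
            yield m["time"], path, "double slash in path"


SAMPLE = [
    '157.55.39.43 - - [12/Jul/2018:05:26:08 +0100] "GET /ppp/viewforum.php?st=0&sk=t&sd=d HTTP/1.1" 404 8941 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"',
    '157.55.39.43 - - [12/Jul/2018:06:08:18 +0100] "GET /ppp/feed/forums HTTP/1.1" 200 1808 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"',
    '40.77.167.211 - - [12/Jul/2018:08:41:24 +0100] "GET /ppp//viewforum.php?f=16 HTTP/1.1" 200 12019 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0 www.example.co.uk "-" "-"',
]

for when, path, reason in flag_requests(SAMPLE):
    print(when, path, reason)
```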

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Thu Jul 12, 2018 9:05 am

Looking at the Google issue (where it does crawl the site but there is little response in the SERPs): there are some search terms appearing, but they are way down the long tail.

This could indeed be due to the lack of meta data. Each page has a title tag but no specific meta tags for title or description.

I also have a lot of content around the forum, which is duplicated on every page, and that could be causing a problem. I know it is suggested not to feed different content to users and SEs, but I wonder if using a different style for bots would help. I use a customised bootlike profile. I guess I could feed bots prosilver, which is pretty much standard and could have the duplicate content removed.

AmigoJack
Registered User
Posts: 5324
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Solve this SEO Issue Once and For All

Post by AmigoJack » Thu Jul 12, 2018 10:49 am

Frank Rizzo wrote:
Thu Jul 12, 2018 8:54 am
What was the point of 90% of that post Amigo? That was mostly not helpful.
What was the point of replying twice in a row? And if 90% of the issues I replied to are not helpful, then let me put it in a more direct way: you're struggling with the basics of almost everything, and you evidently avoid resolving issues and ambiguities. How about putting more effort into understanding it all? This is a discussion, and these are my views - this is not a support topic.

Frank Rizzo wrote:
Thu Jul 12, 2018 8:54 am
Still no full crawl yet since the values were removed
Search engines adapt to changes rather slowly. Have you even looked at any Bing documentation that might mention this?

Frank Rizzo wrote:
Thu Jul 12, 2018 8:54 am
That double slash at 08:41 looks like a concern.
Could also be Bing itself acting parastupid - just look at it requesting viewforum.php without any f parameter, which (of course) returned status 404.

Frank Rizzo wrote:
Thu Jul 12, 2018 9:05 am
I wonder if using a different style for bots would help
Most likely you'll be penalized for giving bots a different view. I have news for you: Google (and other search engines) do not always identify as themselves - half of your guests are most likely still search engines.

Frank Rizzo
Registered User
Posts: 135
Joined: Sun Jan 05, 2003 11:10 pm
Contact:

Re: Solve this SEO Issue Once and For All

Post by Frank Rizzo » Thu Jul 12, 2018 11:08 am

Hmm. If serving bots a different 'view' to regular users is such an issue, why does phpBB offer it as a configuration option?

thecoalman
Community Team Member
Posts: 2796
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.
Contact:

Re: Solve this SEO Issue Once and For All

Post by thecoalman » Thu Jul 12, 2018 11:42 am

AmigoJack wrote:
Thu Jul 12, 2018 8:33 am
thecoalman wrote:
Wed Jul 11, 2018 3:19 pm

Code:

Disallow: /ppp/viewforum.php?f=1$
Regular expressions are not supported. And if they were, the ? would have to be escaped. No, everything you put up here is a literal, so this given line will never match.
It's not part of the specification, but neither is the wildcard in URLs. The character in question is the $, not the ?. This is supported by Google, Bing and other major bots. The formatting of the directive is correct.

AmigoJack
Registered User
Posts: 5324
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Solve this SEO Issue Once and For All

Post by AmigoJack » Thu Jul 12, 2018 12:15 pm

Frank Rizzo wrote:
Thu Jul 12, 2018 11:08 am
why does phpBB offer it
Times and search engines change, and phpBB is not that good at recognizing all bots either. I guess the main intent was to see bots at all when they crawl a board, instead of ignoring them.

thecoalman wrote:
Thu Jul 12, 2018 11:42 am
This is supported by Google, Bing and other major bots.
Can you link to any source? It seems inconsistent to me that $ is picked out of the regular expression world to be interpreted, but not the other meta characters. I only find: Alright, learned something new.