Indexed, though blocked by robots.txt

Get help with installation and running phpBB 3.2.x here. Please do not post bug reports, feature requests, or extension related questions here.
ajtruckle
Registered User
Posts: 93
Joined: Tue Apr 19, 2005 10:37 am

Post by ajtruckle » Wed May 16, 2018 6:21 pm

I am confused. In the root of my domain I have a robots.txt file which has this:

Disallow: /forum/
Yet Google Search Console has flagged this URL:

http://www.publictalksoftware.co.uk/forum/index.php

with the warning:
Indexed, though blocked by robots.txt
How can I stop the warning? I don't want the forum indexed.
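
One way to see which URLs a rule like this blocks is Python's standard-library urllib.robotparser, a rough local stand-in for a robots.txt tester. The sketch below is a minimal check under one assumption: the post shows only the Disallow line, so the `User-agent: *` header is assumed here (Python's parser ignores a Disallow line that does not follow a User-agent line). It only tells you whether a compliant crawler may fetch a URL, not whether Google will list that URL.

```python
from urllib.robotparser import RobotFileParser

# Assumed full robots.txt: the post only shows "Disallow: /forum/",
# so the "User-agent: *" group header is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.publictalksoftware.co.uk"
    for path in ("/forum/index.php", "/index.php"):
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", site + path))
    # prints:
    # /forum/index.php False
    # /index.php True
```

A "False" here only means a well-behaved crawler will not download the page; as Google's documentation explains further down the thread, the bare URL can still be reported as indexed if other pages link to it.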

stevemaury
Support Team Member
Posts: 49554
Joined: Thu Nov 02, 2006 12:21 am
Location: The U.P.
Name: Steve

Post by stevemaury » Wed May 16, 2018 7:12 pm

You should ask Google. Not all bots respect robots.txt.

Post by ajtruckle » Wed May 16, 2018 8:02 pm

The entry is here:

https://search.google.com/search-consol ... z5DHUoVGzA

Their documentation says:
Indexed, though blocked by robots.txt: The page was indexed, despite being blocked by robots.txt (Google always respects robots.txt, but this doesn't help if someone else links to it). This is marked as a warning because we're not sure if you intended to block the page from search results. If you do want to block this page, robots.txt is not the correct mechanism to avoid being indexed. To avoid being indexed you should either use 'noindex' or prohibit anonymous access to the page using auth. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt, any snippet shown for the page will probably be sub-optimal. If you do not want to block this page, update your robots.txt file to unblock your page.
It said:
To avoid being indexed you should either use 'noindex' or prohibit anonymous access to the page using auth.
According to this page: https://support.google.com/webmasters/a ... 3710?hl=en

It states:
To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:

<meta name="robots" content="noindex">
So it seems to me that I need a way to specify this in my index.php, because I do link to the support forum from my site but I do not want it to be indexed.
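
To confirm that a served page really carries the meta tag quoted above, a minimal sketch using Python's standard-library html.parser can scan the HTML for it. The class and function names (RobotsMetaFinder, has_noindex) and the sample markup are illustrative assumptions; the sketch only looks at <meta name="robots"> tags and splits comma-separated values such as "noindex, nofollow".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content may hold several directives, e.g. "noindex, nofollow".
            for directive in attr_map.get("content", "").split(","):
                self.directives.append(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))  # True
```

Self-closing tags such as `<meta ... />` are also covered, because HTMLParser's default handle_startendtag calls handle_starttag. Keep in mind that Google only sees this tag on pages it is allowed to fetch.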

Post by stevemaury » Wed May 16, 2018 8:06 pm

Give the Bots Group a No access role on that forum.

Post by ajtruckle » Wed May 16, 2018 8:12 pm

I see where "Bots" is listed in the "Users and Groups" pane. But what exactly do I do then? Sorry.

sceptre
Registered User
Posts: 109
Joined: Mon Feb 22, 2010 5:01 pm
Location: Alba

Post by sceptre » Wed May 16, 2018 8:35 pm

Go to Forum Permissions, select the forum(s), and click Submit.
Select Bots, edit permissions, and set No Access.
You can fine-tune the permissions via Advanced Permissions/Action.

Post by ajtruckle » Wed May 16, 2018 8:40 pm

Thanks.

thecoalman
Community Team Member
Posts: 2797
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Post by thecoalman » Wed May 16, 2018 8:59 pm

ajtruckle wrote (Wed May 16, 2018 6:21 pm): "How can I stop the warning? I don't want the forum indexed."

It hasn't been indexed in the normal sense: Google is aware of the URL but has not crawled it. The chances of it appearing in search results are slim to none. The only information Google has about it is the URL itself and perhaps the link text used to link to it, so it has very little information even to give the page a title.

Removing the robots.txt rule and then using <meta name="robots" content="noindex"> will work, but it will increase server load because Google will then load the page. AFAIK there is no out-of-the-box way in phpBB to deny permission to index.php itself, so even if you use permissions on the other parts of the forum, index.php would still be exposed to indexing. Banning Google's IPs and other bots' IPs would be one workaround for this.

Honestly, the best thing to do is just leave the robots.txt rule in place and ignore the warning.

Post by ajtruckle » Wed May 16, 2018 9:05 pm

Oh, well, I have just applied all the bot tweaks as per the previous comments. I have not touched the robots file though. So is this still OK?

Post by thecoalman » Wed May 16, 2018 9:06 pm

sceptre wrote (Wed May 16, 2018 8:35 pm): "Go to Forum Permissions, select the forum(s), submit; select Bots, edit permissions, No Access. You could fine-tune the permissions by going to Advanced Permissions/Action."

IMO the best way to do this is to add or remove the groups that need access.

ACP >> Permissions tab >> Forum Permissions >> select the forum(s) to set permissions for.

Add or remove groups under the Manage groups heading. If a group is not listed under Manage groups and appears in the Add groups box, its permissions default to none.

Post by thecoalman » Wed May 16, 2018 9:09 pm

ajtruckle wrote (Wed May 16, 2018 9:05 pm): "Oh, well, I have just applied all the bot tweaks as per the previous comments. I have not touched the robots file though. So is this still OK?"

Nothing wrong with that, but it's not going to accomplish anything. Your robots.txt rule prevents Google from accessing that page; only once you remove the rule can Google load the page, and only then will it get "permission denied".
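
The reasoning above can be written out as a small decision function. This is a deliberately simplified, hypothetical model built only from Google's wording quoted earlier in the thread (robots.txt stops crawling but not indexing from links; noindex or required login keeps a fetched page out of the index). The function, its parameters and the outcome strings are illustrative, and real crawler behaviour has more cases.

```python
def likely_outcome(blocked_by_robots: bool, linked_from_elsewhere: bool,
                   has_noindex: bool, requires_login: bool) -> str:
    """Hypothetical, simplified model of what happens to one forum URL."""
    if blocked_by_robots:
        # The crawler never fetches the page, so it never sees a noindex tag
        # or a "permission denied" page: those settings change nothing here.
        if linked_from_elsewhere:
            return "URL known from links only: 'Indexed, though blocked by robots.txt'"
        return "not crawled, not indexed"
    # Without the robots.txt rule the page is fetched, so on-page signals count.
    if has_noindex or requires_login:
        return "crawled, kept out of the index"
    return "crawled and indexed"


if __name__ == "__main__":
    # Setup described in this thread: rule in place, forum linked from the
    # main site, Bots group set to No Access.
    print(likely_outcome(blocked_by_robots=True, linked_from_elsewhere=True,
                         has_noindex=False, requires_login=True))
```

For that setup the first branch applies whatever the permissions are, which is why changing the Bots permissions alone does not clear the warning.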

Post by ajtruckle » Wed May 16, 2018 9:16 pm

Oh, I get it. Well, I will just leave it as it is now. Otherwise I will reset them back to whatever the default values were, whatever those are.

Post by ajtruckle » Wed May 23, 2018 11:03 am

I now get more warnings about this from the login plugin in WordPress.

I think I will just remove the rule from the robots file.

Then Google can crawl the page. And I guess it will not crawl anything else in the forum anyway.
