Bad and good bots/robots

Do not post support requests, bug reports or feature requests. Discuss phpBB here. Non-phpBB related discussion goes in General Discussion!
ivailo95
Registered User
Posts: 481
Joined: Tue Sep 05, 2017 8:00 am
Location: Bulgaria
Name: Ivailo

Bad and good bots/robots

Post by ivailo95 » Wed Aug 15, 2018 8:54 am

Hi there,
my hosting company found bad robots on my board, and I wonder how I can check for them.
Can you tell me more about these bad and good bots/robots?
By the way, I searched for information about bad/good robots but only found how to remove them.
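On the "how can I check" part: bots that identify themselves leave their name in the User-Agent field of the raw access log, so counting requests per User-Agent is a quick first look at who is crawling the board. A minimal Python sketch, assuming the host writes Apache's "combined" log format; `count_user_agents` and the sample lines are hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def count_user_agents(lines):
    """Count requests per User-Agent string in combined-format log lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # lines in any other format are skipped
            counts[m.group(1)] += 1
    return counts

# Hypothetical sample lines; on a real host, read them from the raw access log.
SAMPLE_LOG = [
    '203.0.113.5 - - [15/Aug/2018:08:54:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; SemrushBot/2)"',
    '203.0.113.5 - - [15/Aug/2018:08:54:02 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 8310 "-" "Mozilla/5.0 (compatible; SemrushBot/2)"',
    '198.51.100.7 - - [15/Aug/2018:08:55:10 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    # Most active User-Agents first.
    for agent, n in count_user_agents(SAMPLE_LOG).most_common():
        print(n, agent)
```

To run it on a real log you could pass an open file (for example `open("access.log")`, a hypothetical file name) instead of `SAMPLE_LOG`, since iterating over a file yields its lines.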

Code:

Websites are crawled around the clock by bots (robots), mostly those of search engines such as Googlebot, msnbot, YandexBot, bingbot and others. These robots index site content in order to provide more accurate and up-to-date search results. They try to crawl as much of your site as possible, ideally every page. You can choose which directories these bots should not crawl by placing a robots.txt file in the root directory of the site.

Before crawling a site, the robot checks the robots.txt file to learn which directories it may index and which it may not. The syntax of the file is quite simple:

User-agent: *
Disallow: /
The User-agent line names the bot the rules apply to, and Disallow lists the forbidden directories. In the example above, the wildcard (*) means the rules apply to all bots, and Disallow: / blocks access to the root directory, i.e. to the whole site and all of its subdirectories. If you leave out the "/" after Disallow, there is no directory restriction and bots are free to crawl every directory of the site.

To restrict Googlebot's access to the /admin directory, for example, the robots.txt file should look like this:

User-agent: Googlebot
Disallow: /admin/
You can also disable access to a specific file:

User-agent: Googlebot
Disallow: /DirectoryName/FileName.html
If you are not sure of the exact name of the bot you want to restrict, you can find it in the AWStats statistics or in the site's raw access log. Detailed information about the robots.txt file and its usage can be found at the following address:

http://www.robotstxt.org/robotstxt.html

The site also includes a list of many robots, with a brief description of each.

Bad bots
There are other robots whose crawling does nothing to improve the site's ranking on the web; instead, they scan the site looking for ways to abuse it. This includes probing for security holes, posting spam through contact forms, harvesting email addresses that are then sent spam, and much more. We call these robots bad bots. If we want to restrict their access, we can use the .htaccess file.

An effective way to block bad robots is to match the User-Agent string the robot presents. You can block such User-Agents with rewrite rules in .htaccess:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(.*)Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Zeus [NC]
RewriteRule .* - [F]
In this example, the Surfbot, ChinaClaw and Zeus robots will receive a 403 Forbidden response when they try to access the contents of the directory containing the .htaccess file. You can add more robots with more RewriteCond lines, putting OR in the flags ([NC,OR]) at the end of every line except the one for the last User-Agent. (Note, however, that adding too many rules to the .htaccess file may slow down the loading of the site in some cases.)

With this kind of blocking, it is advisable to have static 404 Not Found and 403 Forbidden error pages. If these pages are generated dynamically by your system, every blocked request may cause additional, unnecessary load.

Another way to block a User-Agent is to use SetEnvIfNoCase, again in the .htaccess file. Here is an example:

SetEnvIfNoCase User-Agent "^(.*)Surfbot" bad_bot
SetEnvIfNoCase User-Agent "^(.*)ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "^(.*)Zeus" bad_bot
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
The first part defines the User-Agents that will be treated as bad, and the second part blocks all requests (GET, POST, HEAD) from such robots.
(Translated by Google Translate.)
I want to know more about these good and bad bots/robots, and also what they do on my site. :roll: :?:
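The robots.txt rules in the hosting company's text can be tried out with Python's standard `urllib.robotparser` module, which implements the basic robots.txt matching (product token plus path prefix). A minimal sketch: the rule strings mirror the examples in the text, with /admin/ as the admin directory, and the helper name `rules` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

def rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text and return a parser that can be queried with can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Rule sets mirroring the examples in the text.
BLOCK_ALL = "User-agent: *\nDisallow: /"   # every bot, whole site
ALLOW_ALL = "User-agent: *\nDisallow:"     # empty Disallow means no restriction
NO_ADMIN = "User-agent: Googlebot\nDisallow: /admin/"
NO_FILE = "User-agent: Googlebot\nDisallow: /DirectoryName/FileName.html"

if __name__ == "__main__":
    # can_fetch() takes the bot's name token and a URL path.
    print(rules(BLOCK_ALL).can_fetch("Googlebot", "/viewtopic.php"))    # False
    print(rules(ALLOW_ALL).can_fetch("Googlebot", "/viewtopic.php"))    # True
    print(rules(NO_ADMIN).can_fetch("Googlebot", "/admin/index.php"))   # False
    print(rules(NO_ADMIN).can_fetch("bingbot", "/admin/index.php"))     # True: rule names only Googlebot
    print(rules(NO_FILE).can_fetch("Googlebot", "/DirectoryName/FileName.html"))  # False
```

This only shows what a bot that reads and obeys robots.txt would conclude; nothing in the file stops a bot that ignores it.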

Mick
Support Team Member
Posts: 20237
Joined: Fri Aug 29, 2008 9:49 am
Location: Cardiff

Re: Bad and good bots/robots

Post by Mick » Wed Aug 15, 2018 10:05 am

‘Good bots’ are those like Googlebot that search your board for interesting stuff and index it in their search engines, so your board gets hits when people search the internet. Generally the good ones are listed in the Bots group, which you can maintain. By default they show in grey in the online list.

The bad ones are those that (try to) register on your board and post spam. They are generally automated but we’ve seen a huge increase in human spammers lately.
"The more connected we get the more alone we become" - Kyle Broflovski

There are no ‘threads’ in phpBB, they are topics.

HiFiKabin
Community Team Member
Posts: 3271
Joined: Wed May 14, 2014 9:10 am
Location: Swearing at the PC, UK
Name: James

Re: Bad and good bots/robots

Post by HiFiKabin » Wed Aug 15, 2018 10:15 am

There is a 'block bad bots' list and .htaccess file on my extension site (see my sig).

ivailo95
Registered User
Posts: 481
Joined: Tue Sep 05, 2017 8:00 am
Location: Bulgaria
Name: Ivailo

Re: Bad and good bots/robots

Post by ivailo95 » Wed Aug 15, 2018 3:51 pm

So what type are these bots? :roll: :?:

Code:

BrowserMatchNoCase AhrefsBot bad_bot
BrowserMatchNoCase Semrush bad_bot
BrowserMatchNoCase Majestic bad_bot
BrowserMatchNoCase Dotbot bad_bot
BrowserMatchNoCase BLEXBot bad_bot
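As for what these lines do: each `BrowserMatchNoCase` line tells Apache's mod_setenvif to set the `bad_bot` environment variable when the User-Agent header matches the given regular expression, ignoring case. On their own they do not block anything; a rule such as `Deny from env=bad_bot`, as in the SetEnvIfNoCase example in the first post, does the blocking. The Python sketch below only mimics the matching step so a User-Agent string can be tested against the list; `is_bad_bot` is a hypothetical helper, not part of Apache or phpBB.

```python
import re

# The patterns from the BrowserMatchNoCase lines. Apache treats each one as a
# regular expression searched anywhere in the User-Agent header, ignoring case.
BAD_BOT_PATTERNS = ["AhrefsBot", "Semrush", "Majestic", "Dotbot", "BLEXBot"]
BAD_BOT_REGEXES = [re.compile(p, re.IGNORECASE) for p in BAD_BOT_PATTERNS]

def is_bad_bot(user_agent: str) -> bool:
    """Return True if any pattern matches, i.e. when Apache would set bad_bot."""
    return any(rx.search(user_agent) for rx in BAD_BOT_REGEXES)

if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; SemrushBot/2)",   # matches "Semrush"
        "Mozilla/5.0 (compatible; DotBot/1.1)",     # matches "Dotbot" despite the case
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ):
        print(is_bad_bot(ua), ua)
```

Because the match is a plain substring search, "Semrush" also catches "SemrushBot", while a bot that sends an ordinary browser User-Agent is not caught at all.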

Lumpy Burgertushie
Registered User
Posts: 65185
Joined: Mon May 02, 2005 3:11 am

Re: Bad and good bots/robots

Post by Lumpy Burgertushie » Wed Aug 15, 2018 4:00 pm

Bottom line is that if you have your registration spam countermeasures set up correctly, there is no problem with the so-called "bad bots" accessing your server. They can't register, so they are not causing you any problems.

Set up a good Q&A for registration and forget about it.

robert
I am available for custom work on a donation basis. Please send me a PM with your needs.

Premium phpBB 3.2 Styles by PlanetStyles.net

OK, so what's the speed of dark?

ivailo95
Registered User
Posts: 481
Joined: Tue Sep 05, 2017 8:00 am
Location: Bulgaria
Name: Ivailo

Re: Bad and good bots/robots

Post by ivailo95 » Wed Aug 15, 2018 4:04 pm

Lumpy Burgertushie wrote:
Wed Aug 15, 2018 4:00 pm
Bottom line is that if you have your registration spam countermeasures set up correctly, there is no problem with the so-called "bad bots" accessing your server. They can't register, so they are not causing you any problems.

Set up a good Q&A for registration and forget about it.

robert
I have no problem with spam bots.
I set up the Q&A CAPTCHA.

I'm just asking about these "bad bots" and "good bots".

HiFiKabin
Community Team Member
Posts: 3271
Joined: Wed May 14, 2014 9:10 am
Location: Swearing at the PC, UK
Name: James

Re: Bad and good bots/robots

Post by HiFiKabin » Wed Aug 15, 2018 4:16 pm

Bad bots (those not in the default bot list) tend to ignore any robots.txt file you may have and can over-index your forum, as I have described. In a worst-case scenario they can effectively DDoS your board (as once happened to me).
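To see whether a bot is over-indexing a forum, one rough check is how many requests each User-Agent makes per minute in the raw access log. A minimal Python sketch, again assuming Apache's "combined" log format; `peak_per_minute`, `ExampleBot` and the sample lines are hypothetical, and what counts as "too many" is your call.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed): captures the timestamp and the User-Agent.
LINE = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def peak_per_minute(lines):
    """Return {user_agent: highest number of requests seen within one minute}."""
    per_minute = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines in any other format
        minute = m.group(1)[:17]  # "15/Aug/2018:16:15:01 +0000" -> "15/Aug/2018:16:15"
        per_minute[(m.group(2), minute)] += 1
    peaks = {}
    for (agent, _minute), n in per_minute.items():
        peaks[agent] = max(peaks.get(agent, 0), n)
    return peaks

# Hypothetical log lines for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [15/Aug/2018:16:15:01 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 8310 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [15/Aug/2018:16:15:02 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 8122 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [15/Aug/2018:16:15:03 +0000] "GET /viewtopic.php?t=3 HTTP/1.1" 200 7994 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [15/Aug/2018:16:16:10 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    # Busiest User-Agents first.
    for agent, peak in sorted(peak_per_minute(SAMPLE_LOG).items(), key=lambda kv: -kv[1]):
        print(peak, agent)
```

Grouping by the minute prefix of the timestamp string avoids date parsing; it assumes all lines share one time zone offset, as a single server log normally does.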

Dog Cow
Registered User
Posts: 2491
Joined: Fri Jan 28, 2005 12:14 am

Re: Bad and good bots/robots

Post by Dog Cow » Wed Aug 15, 2018 4:51 pm

Mick wrote:
Wed Aug 15, 2018 10:05 am
‘Good bots’ are those like Googlebot that search your board for interesting stuff and index it in their search engines, so your board gets hits when people search the internet.
Yeah, I'd say the good bots are the ones that directly benefit me or the site, and the bad bots are the ones that bring me no benefit. As I'm the one paying for the bandwidth, not them, I decide whether I'm benefiting from them using it! :D
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton book | Mac GUI | Mac 512K Blog

thecoalman
Community Team Member
Posts: 2801
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Bad and good bots/robots

Post by thecoalman » Thu Aug 16, 2018 12:37 am

ivailo95 wrote:
Wed Aug 15, 2018 3:51 pm
So what type are these bots? :roll: :?:

Code:

BrowserMatchNoCase AhrefsBot bad_bot
BrowserMatchNoCase Semrush bad_bot
BrowserMatchNoCase Majestic bad_bot
BrowserMatchNoCase Dotbot bad_bot
BrowserMatchNoCase BLEXBot bad_bot
"Bad bot" is a subjective term. Bots like these send a user agent that names them, which is why phpBB can identify them. They aggregate content on your site for a variety of reasons, some of which you may not like. The Semrush bot, for example, is used by marketers to place ads on your site; if you are using AdWords, this bot could potentially increase your revenue from ads. If you have no ads on your site, it's useless to you.

This site lists bots, describes what they do and grades them: https://www.distilnetworks.com/bot-directory/

Bots that do identify themselves will typically adhere to the robots.txt file. This is a simple .txt file that you put in the root of your domain, containing directives that tell bots what they are allowed to do. If you want to stop them from accessing your site, this is the place to start, because bots that obey these directives won't load any page you have told them not to.

http://www.robotstxt.org/
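For example, if you decided you don't want the Semrush bot, a robots.txt along these lines would ask it to stay away while leaving every other bot unrestricted. The sketch checks the file with Python's standard `urllib.robotparser`; it assumes the bot's name token is `SemrushBot`, and `may_crawl` is a hypothetical helper. It only affects bots that actually read and obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: SemrushBot may crawl nothing, everyone else may crawl everything.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(bot_token: str, path: str) -> bool:
    """What a bot that obeys robots.txt would conclude about this path."""
    return parser.can_fetch(bot_token, path)

if __name__ == "__main__":
    print(may_crawl("SemrushBot", "/viewtopic.php?t=1"))  # False
    print(may_crawl("Googlebot", "/viewtopic.php?t=1"))   # True
```

The name match is case-insensitive, and the `User-agent: *` group with an empty Disallow is what keeps other bots unrestricted.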

Some bots that identify themselves may not obey those directives, but this is rare since they are identifying themselves. You would have to block them through some other means, such as .htaccess.

You can also block any bot that identifies itself using phpBB, but this comes on top of the other measures.

The really bad bots, like spammers, do not identify themselves as bots; they send a browser user agent to look like a regular user. Those can only be stopped with CAPTCHAs and other measures that trip them up.

ivailo95
Registered User
Posts: 481
Joined: Tue Sep 05, 2017 8:00 am
Location: Bulgaria
Name: Ivailo

Re: Bad and good bots/robots

Post by ivailo95 » Sun Aug 19, 2018 10:06 pm

Um, what type of robot is Semrush?

Mick
Support Team Member
Posts: 20237
Joined: Fri Aug 29, 2008 9:49 am
Location: Cardiff

Re: Bad and good bots/robots

Post by Mick » Mon Aug 20, 2018 8:24 am

Semrush.

ivailo95
Registered User
Posts: 481
Joined: Tue Sep 05, 2017 8:00 am
Location: Bulgaria
Name: Ivailo

Re: Bad and good bots/robots

Post by ivailo95 » Mon Aug 20, 2018 9:51 am

Mick wrote:
Mon Aug 20, 2018 8:24 am
Semrush.
Based on this site, could this be a "good" robot? :roll: :?:

Mick
Support Team Member
Posts: 20237
Joined: Fri Aug 29, 2008 9:49 am
Location: Cardiff

Re: Bad and good bots/robots

Post by Mick » Mon Aug 20, 2018 10:21 am

Seeing as it’s spam I doubt it.

thecoalman
Community Team Member
Posts: 2801
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Bad and good bots/robots

Post by thecoalman » Mon Aug 20, 2018 2:07 pm

ivailo95 wrote:
Mon Aug 20, 2018 9:51 am
Based on this site, could this be a "good" robot? :roll: :?:
Again, this is a subjective term. Distil Networks lists it as a good bot, and their criteria for listing a bot as good boil down to whether it obeys the robots.txt directives. It's a good bot in that it listens to the directions you give it.

Whether it's useful to you or not is something you would need to consider. AFAIK a lot of marketing companies use data from that bot to target AdSense ads. If you are using AdSense or possibly other ad networks, it may be beneficial for your ad revenue.
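Since Distil's test boils down to whether a bot obeys robots.txt, a rough version of it can be run on your own board: compare the requests a bot made in the raw access log with what your robots.txt allows. A minimal Python sketch, assuming Apache's "combined" log format and a bot that names itself in its User-Agent; `robots_violations` and the sample data are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache "combined" log format (assumed): captures the request path and the User-Agent.
LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def robots_violations(robots_txt, log_lines, bot_token):
    """Return the paths bot_token requested even though robots.txt disallows them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue  # other formats or request methods are skipped
        path, agent = m.group(1), m.group(2)
        if bot_token.lower() not in agent.lower():
            continue  # request from some other client
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is expected
        if not parser.can_fetch(bot_token, path):
            violations.append(path)
    return violations

ROBOTS_TXT = "User-agent: SemrushBot\nDisallow: /"
SAMPLE_LOG = [
    '203.0.113.5 - - [20/Aug/2018:14:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 45 "-" "Mozilla/5.0 (compatible; SemrushBot/2)"',
    '203.0.113.5 - - [20/Aug/2018:14:00:05 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 8310 "-" "Mozilla/5.0 (compatible; SemrushBot/2)"',
    '198.51.100.7 - - [20/Aug/2018:14:01:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    print(robots_violations(ROBOTS_TXT, SAMPLE_LOG, "SemrushBot"))  # ['/viewtopic.php?t=1']
```

An empty list is what you would expect from a bot that obeys robots.txt. Crawlers may cache robots.txt for a while, so a request made shortly after you change the file is not necessarily a violation.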


Return to “phpBB Discussion”
