Yesterday evening I started having a problem with our server, which under normal circumstances (even at up to 10K pageviews per hour) works like a charm.
The server was under extreme load, with a load average of about 30. I checked the server's CPU, which was constantly running at up to 100%.
Checking my forum stats, I noticed that yesterday we had broken the "most users ever online" record: 5023 on 21 Aug 2019 15:56.
I continued to look into the stats and found one particular crawler called BLEXBOT was causing the high load, opening more than 800 sessions.
Blocking the crawler in robots.txt didn't suffice: on another site I had already discovered that the crawler doesn't obey the directive. I contacted the company (webmeup) via mail and Facebook but got no reply.
I decided to block a range of IP addresses from the forums by adding this to the IP block list: 159.138.*.*
Server load and CPU usage have dropped significantly since the measure.
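For anyone wanting to check which addresses a wildcard like 159.138.*.* covers, here is a minimal Python sketch. It assumes the wildcard means "first two octets fixed", which is the CIDR range 159.138.0.0/16; the `is_blocked` helper and the test addresses are made up for illustration and are not part of the forum software.

```python
import ipaddress

# 159.138.*.* written as a CIDR network (first two octets fixed).
# Assumption: the forum's wildcard syntax means exactly this range.
BLOCKED_NETWORKS = [ipaddress.ip_network("159.138.0.0/16")]


def is_blocked(ip: str) -> bool:
    """Return True if the IPv4 address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


print(is_blocked("159.138.12.34"))  # True
print(is_blocked("159.139.0.1"))    # False
```

The same /16 notation is what most firewalls expect, so it may be handy when asking a host to block the range.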
That won't actually stop it coming to the server. Ask your host about blocking it at the server level.
Well, for now it did work.
It might have stopped it crawling your site but it has not stopped it accessing your site - it has to access the site to read the robots.txt file. Also bear in mind that some bots do not adhere to robots.txt, or change their behavior.
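To illustrate why robots.txt is only advisory, here is a small sketch using Python's standard `urllib.robotparser`. The two-line robots.txt and the example.com URL are assumptions for the demo. A polite crawler runs a check like this itself before fetching; nothing on the server enforces the answer, so a bot that skips the check is not stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that disallows one crawler entirely.
robots_lines = [
    "User-agent: BLEXBot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(robots_lines)

url = "https://example.com/viewtopic.php"  # placeholder URL

# A well-behaved crawler asks this question and respects the result.
blexbot_allowed = parser.can_fetch("BLEXBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(blexbot_allowed)  # False
print(other_allowed)    # True
```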
David Remember: You only know what you know and - you don't know what you don't know! My CDB Contributions | How to install an extension I will not be accepting translations for any of my extensions in Github - please post any translations in the appropriate topic. No support requests via PM or email as they will be ignored
Block it in your .htaccess file. Rogue bots like this don't obey robots.txt.
Even with a block in .htaccess the bots are still accessing your site. They need to be blocked before then - either by using a firewall or the likes of Cloudflare.
Behavior has indeed changed: the server's CPU is constantly maxed out and the server load is higher than usual. It must be coming from other IPs ...
Check your server access logs and block any suspicious behaviour in your firewall - works for me.
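As a rough way to spot the heaviest clients in an access log, something like the Python sketch below could help. It assumes IPv4 clients and the Apache/nginx "combined" log format (client IP first, user agent as the last quoted field); the `top_clients` helper, the sample lines, and the agent names are all made up for illustration.

```python
import re
from collections import Counter

# Captures the client IP (first field) and the user agent (last quoted field)
# of a "combined" format log line. Assumption: your logs use that format.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def top_clients(lines, n=10):
    """Count requests per (IP prefix, user agent) and return the n busiest."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, agent = match.groups()
        # Group by the first two octets so one network shows up as one entry.
        prefix = ".".join(ip.split(".")[:2]) + ".*.*"
        counts[(prefix, agent)] += 1
    return counts.most_common(n)


# Invented sample lines; in practice, iterate over the real log file instead.
sample = [
    '159.138.1.2 - - [22/Aug/2019:10:00:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 1234 "-" "SampleBot/1.0"',
    '159.138.9.8 - - [22/Aug/2019:10:00:01 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 1234 "-" "SampleBot/1.0"',
    '203.0.113.5 - - [22/Aug/2019:10:00:02 +0000] "GET /index.php HTTP/1.1" 200 999 "-" "SampleBrowser/5.0"',
]

for (prefix, agent), hits in top_clients(sample):
    print(hits, prefix, agent)
```

Prefixes that dominate the output are candidates for a firewall rule; whether a /16 is the right granularity depends on who else shares that range.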