Question on using htaccess vs claudebot and Safari/537.36

Ares101
Registered User
Posts: 5
Joined: Thu Oct 03, 2019 5:04 am

Question on using htaccess vs claudebot and Safari/537.36

Post by Ares101 »

Hey all.

So, like most folks here, I have a phpBB message board that I run. Unlike a lot of folks here, I have no idea about coding and the like. I have my website hosted via doteasy and try to do most site management via the Administration Control Panel, and I only rarely use cPanel except to update the site.

However, recently I've had issues where the site's resources are being used up, and after some inquiries I was told it was likely a bot accessing the site a ridiculous number of times in a short period. So I went to cPanel, went to Metrics, went to Visitors, and sure enough, the most common User Agents were:


Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +[email protected])


and


Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.2054.110 Safari/537.36


ClaudeBot was doing at least two-thirds of the visiting. Now, as I mentioned earlier, I don't know how to code. But from what research I've done, .htaccess can be used to deny the above access to the site in the first place and keep them from causing the trouble they are.

I've located .htaccess in cPanel under public_html, and I've seen there's an edit option, where presumably I'd insert the script necessary to keep the offenders out.

What I need is a dirt simple script to do the above, and a step-by-step guide on how to edit htaccess so I don't screw anything up. I know this is basically "baby's first script editing", but I'm out of my depth and appreciate any help and patience you're all willing to share. And if I'm trying to fix things in the wrong way, please point me in the right direction.

Thank you all very much for your time.
Mick
Support Team Member
Posts: 26636
Joined: Fri Aug 29, 2008 9:49 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Mick »

There’s a whole topic on this very subject here: viewtopic.php?t=2652265

Maybe explaining this to your host as well may help.
  • "The more connected we get the more alone we become" - Kyle Broflovski©
  • "The good news is hell is just the product of a morbid human imagination. The bad news is, whatever humans can imagine, they can usually create." - Harmony Cobel
thecoalman
Community Team Member
Posts: 5917
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by thecoalman »

Bots that identify themselves usually adhere to robots.txt, so you can block them there.

https://developers.google.com/search/do ... f%20Google.
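As a rough illustration of blocking one self-identified bot this way, here is a minimal Python sketch that checks a robots.txt rule with the standard library's `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the `/viewtopic.php` path are assumptions made for the example. It only shows how Python's parser reads the rule; whether a given bot obeys it is up to the bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the bot named in the user agent above.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, bot_name: str, path: str = "/viewtopic.php") -> bool:
    """Return True if a rule-following bot called bot_name may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, path)


if __name__ == "__main__":
    for bot in ("ClaudeBot", "Googlebot", "bingbot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, bot) else "blocked"
        print(f"{bot}: {status}")
```

With a rule like this, only the named bot is turned away and search engines are left alone.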
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison
Ares101
Registered User
Posts: 5
Joined: Thu Oct 03, 2019 5:04 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Ares101 »

thecoalman wrote: Sat Apr 20, 2024 10:51 am Bots that identify themselves usually adhere to robots.txt, so you can block them there.

https://developers.google.com/search/do ... f%20Google.
I've just tried using robots.txt as you mentioned. Using this guide, I created a robots.txt file in public_html, then entered in the following:
User-agent: *
Disallow: /
That seems to have stopped the resource usage, but looking at the Visitors info it looks like they're still trying to search the site.

Is anyone familiar with the code mentioned here? If so, and if it does what it says it does, do I simply copy and paste the listed code to the bottom of the .htaccess file? It also mentions "purge your boards cache", which I'm not sure how to do. Again, this is all pretty new to me.
Mick
Support Team Member
Posts: 26636
Joined: Fri Aug 29, 2008 9:49 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Mick »

There’s a button to purge the cache in the ACP main page.

Support for the bad bots code is at the site you linked to.
thecoalman
Community Team Member
Posts: 5917
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by thecoalman »

Ares101 wrote: Sat Apr 20, 2024 4:13 pm
User-agent: *
Disallow: /
That seems to have stopped the resource usage, but looking at the Visitors info it looks like they're still trying to search the site.
This blocks everything that respects robots.txt: Google, Bing, etc. If that is what you want to do, so be it, but your site will drop out of search results. Otherwise, the only time you would use that is if you were going to whitelist good bots like Google and use the wildcard to block everyone else; the wildcard group goes at the end.

Code:

User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
Bots should use the group that names them, so those allowed should ignore the wildcard rule at the end. That has the advantage of blocking all unnamed bots, but it can also be a disadvantage because some of those unnamed bots can be useful. The other approach is to blacklist: add a disallow rule for each bot you don't want.

To reiterate, robots.txt only works with bots that respect it. Generally speaking, if they identify themselves, they will respect it.
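To see how the whitelist rules above would be read, here is a small Python sketch using the standard library's `urllib.robotparser`. The `allowed_bots` helper and the `/index.php` path are made up for illustration, and the result reflects Python's parser only; a real crawler may interpret the file differently or ignore it.

```python
from urllib.robotparser import RobotFileParser

# The whitelist from the post above: two named bots allowed, everyone else blocked.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_bots(robots_txt, bot_names, path="/index.php"):
    """Map each bot name to True/False depending on whether it may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, path) for name in bot_names}


if __name__ == "__main__":
    result = allowed_bots(WHITELIST_ROBOTS_TXT, ["Googlebot", "bingbot", "ClaudeBot"])
    for name, ok in result.items():
        print(f"{name}: {'allowed' if ok else 'blocked'}")
```

Named bots get their own group, and anything without a group of its own falls through to the wildcard.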
Brf
Support Team Member
Posts: 53447
Joined: Tue May 10, 2005 7:47 pm

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Brf »

That first line, Claudebot, looks like a bot, but the second is simply a user running old Windows 8.1.
Mozilla/5.0 (Windows NT 6.3; Win64; x64)
Ares101
Registered User
Posts: 5
Joined: Thu Oct 03, 2019 5:04 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Ares101 »

First and foremost, I'd just like to say thank you to everyone for their input and helping me navigate this. I really appreciate your advice and your patience dealing with what are probably very basic questions.
Brf wrote: Mon Apr 22, 2024 12:39 pm That first line, Claudebot, looks like a bot, but the second is simply a user running old Windows 8.1.
Mozilla/5.0 (Windows NT 6.3; Win64; x64)
The advice I've gotten above seems to have dealt with ClaudeBot, but I'm still getting what looks like 1000 visits that are running up the CPU and Entry Process percentages.

The User Agents are mostly similar, with variations like:
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3805.60 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3452.122 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3076.150 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.2638.88 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3424.54 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.2140.119 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3450.54 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.2787.104 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3384.58 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3036.124 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.2496.129 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.2633.39 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.2221.34 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3191.171 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.2822.93 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.2749.21 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.2440.171 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3917.174 Safari/537.36
But also some variations like:
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.143 YaBrowser/22.5.0.1879 (beta) Yowser/2.5 Safari/537.36

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/100.0.4896.127 Safari/537.36

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.94 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
There's plenty more, but I think you guys get the picture. I guess my question is whether there's any way I can use the above info to stop the abuse my site's receiving, or whether there's somewhere else in cPanel I should be looking to get the info to stop this.
Brf
Support Team Member
Posts: 53447
Joined: Tue May 10, 2005 7:47 pm

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Brf »

The ones that say bingbot and googlebot are indexing your board. The others are users.
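One way to sort a list like the one above is to look for the crawler names inside each user agent. Below is a minimal Python sketch; the `BOT_PATTERN` list, the `classify` and `tally` helpers, and the "browser-like" label are assumptions for illustration. "browser-like" only means the string does not name one of the listed crawlers. The user agent is whatever the client chooses to send, so it is not proof of who is behind the request.

```python
import re
from collections import Counter

# Assumed list of crawler names, taken from the user agents posted in this topic.
BOT_PATTERN = re.compile(r"(claudebot|googlebot|bingbot)", re.IGNORECASE)


def classify(user_agent: str) -> str:
    """Return the crawler name found in the user agent, or 'browser-like'."""
    match = BOT_PATTERN.search(user_agent)
    return match.group(1).lower() if match else "browser-like"


def tally(user_agents):
    """Count how many user agents fall into each category."""
    return Counter(classify(ua) for ua in user_agents)


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3805.60 Safari/537.36",
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/100.0.4896.127 Safari/537.36",
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.6261.94 Mobile Safari/537.36 "
        "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for category, count in tally(sample).items():
        print(f"{category}: {count}")
```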
Ares101
Registered User
Posts: 5
Joined: Thu Oct 03, 2019 5:04 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Ares101 »

Brf wrote: Tue Apr 23, 2024 12:14 pm The ones that say bingbot and googlebot are indexing your board. The others are users.
Apologies for any obvious questions, but in this case "user" refers to one or more board members with a profile on my message board who are doing all these searches and driving up the resource use, correct? Is there a way to access the user / member data to figure out who is doing so?

*EDIT*

Looking at "Who is Online", I do have between 12 and 15 Guests at any one time, which I'm guessing are the ones causing the trouble. Is there any reliable way to deal with this? Messing with Guest permissions, banning the Guest IPs, etc.?
Brf
Support Team Member
Posts: 53447
Joined: Tue May 10, 2005 7:47 pm

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by Brf »

You can always prevent Guest searches and block them from viewing certain forums, especially ones with a lot of pictures.
jamesperrin
Registered User
Posts: 1
Joined: Wed Apr 14, 2021 10:46 am

Re: Question on using htaccess vs claudebot and Safari/537.36

Post by jamesperrin »

We've got 500 of these ClaudeBots - our usual user & guest number is in the tens! I've added them to the robots.txt and hope that's the last of it.