Distributed Web Crawler - 80legs

Rhet-or-Ric
Registered User
Posts: 306
Joined: Sun Apr 06, 2008 1:38 pm

Re: Distributed Web Crawler - 80legs

Post by Rhet-or-Ric »

.

I wonder if, when time allows, one of the Team members could give us the straight poop on just what code should be used in .htaccess if one of us wished to use .htaccess to block a specific User Agent?

Or should I be starting another topic for this?

Thank you.

.
Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: Distributed Web Crawler - 80legs

Post by Pony99CA »

shiondev wrote: Blocking us by IP address is not a workable solution. We effectively have an infinite number of IP addresses, since they constantly rotate, while old ones go away and new ones come online.
Infinite, eh? There are at most 2**32 IPv4 addresses (somewhat over 4 billion). And if that were close to infinite, people wouldn't be talking about running out of IP addresses and the need for IPv6.

On top of that, I doubt you'll get even a large fraction of those IP addresses in your network. I'm not saying that blocking IP addresses is the best solution (and I presented two others), but I don't think it's as hopeless as claimed.

However, the other solutions I suggested (browser agent blocking and phpBB permissions) are probably better and easier.

Steve
Phil
Former Team Member
Posts: 10403
Joined: Sat Nov 25, 2006 4:11 am
Name: Phil Crumm

Re: Distributed Web Crawler - 80legs

Post by Phil »

Rhet-or-Ric wrote:.

I wonder if, when time allows, one of the Team members could give us the straight poop on just what code should be used in .htaccess if one of us wished to use .htaccess to block a specific User Agent?

Or should I be starting another topic for this?

Thank you.

.
Given it's not necessarily phpBB related and instead applies to Apache as a whole, you'd probably be better off finding/asking on a more general forum related to Apache.
3Di
Former Team Member
Posts: 16151
Joined: Mon Apr 04, 2005 11:09 pm
Location: Milano 🇮🇹 Frankfurt 🇩🇪
Name: Marco

Re: Distributed Web Crawler - 80legs

Post by 3Di »

Rhet-or-Ric wrote:.

I wonder if, when time allows, one of the Team members could give us the straight poop on just what code should be used in .htaccess if one of us wished to use .htaccess to block a specific User Agent?

Or should I be starting another topic for this?

Thank you.

.
http://blamcast.net/articles/block-bots ... p-htaccess
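
For anyone landing here with the same question, a minimal sketch of user-agent blocking in .htaccess might look like the following. It assumes Apache with mod_rewrite enabled and an AllowOverride setting that permits rewrite directives in .htaccess. The "80legs" match string is taken from the nginx snippet later in this topic, so check it against your own access logs before relying on it.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match any request whose User-Agent header contains "80legs" (case-insensitive).
    RewriteCond %{HTTP_USER_AGENT} 80legs [NC]
    # Leave the URL untouched ("-") and answer with 403 Forbidden.
    RewriteRule ^ - [F,L]
</IfModule>
```

If your .htaccess already has a RewriteEngine On line, only the RewriteCond/RewriteRule pair should be needed, placed ahead of any other rewrite rules. This only stops crawlers that send an honest User-Agent header.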
haggisv
Registered User
Posts: 261
Joined: Wed Dec 20, 2006 3:31 am
Location: Adelaide, Australia

Re: Distributed Web Crawler - 80legs

Post by haggisv »

I had well over 100 of these crawlers swamp my forum... I suspect they were also the ones that brought my site down recently.

I added them to the bot list (using the URL as the string to identify them), then banned them, which seemed to work. Is that the best way to do it?

I've also contacted 80legs and told them about the issue, as their site suggests doing if you have problems with their crawlers. I'll be interested to see if I get a response...
haggisv
Registered User
Posts: 261
Joined: Wed Dec 20, 2006 3:31 am
Location: Adelaide, Australia

Re: Distributed Web Crawler - 80legs

Post by haggisv »

I received a response from 80legs, and they told me they would take my URL off the list to crawl! Of course I had banned them already, :twisted: but it shows that they are reputable, IMO.
rampp
Registered User
Posts: 1
Joined: Wed Mar 02, 2011 12:43 am

Re: Distributed Web Crawler - 80legs

Post by rampp »

Reputable? The way they conduct their crawling is more like a botnet / DDoS.

The snippet below works fine for nginx. In the last 45 minutes I've received 6500 different 80legs hits on a single site, disrupting normal traffic, so it became apparent this is not a legit operation.


# Redirect any client whose User-Agent contains "80legs" back to 80legs' own site.
if ($http_user_agent ~ "80legs") {
    rewrite ^.+ http://www.80legs.com;
}
haggisv
Registered User
Posts: 261
Joined: Wed Dec 20, 2006 3:31 am
Location: Adelaide, Australia

Re: Distributed Web Crawler - 80legs

Post by haggisv »

Yes, I agree. I meant reputable in that they at least identify their bots by their proper name, and respond to requests if you ask them to stop crawling your site. I'm sure there are many others out there that won't do either. :o