Bot Or Site Scraper?

CGI1984
Registered User
Posts: 203
Joined: Thu Feb 20, 2020 8:27 am

Re: Bot Or Site Scraper?

Post by CGI1984 »

I just noticed a new site guest, which looks like it could be a Facebook bot, but not 100% sure. This is what I saw for user info: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) - IP 173.252.87.42

If it is a Facebook bot then should I add it to my Spiders/Robots and would the User Agent be "facebot"?

Thanks.
KYPREO
Registered User
Posts: 392
Joined: Fri Feb 02, 2018 9:56 am

Re: Bot Or Site Scraper?

Post by KYPREO »

CGI1984 wrote: Mon Mar 02, 2020 10:08 am I just noticed a new site guest, which looks like it could be a Facebook bot, but not 100% sure. This is what I saw for user info: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) - IP 173.252.87.42

If it is a Facebook bot then should I add it to my Spiders/Robots and would the User Agent be "facebot"?

Thanks.
Facebook doesn't crawl sites but this user agent is sometimes classified as a bot. It is actually the user agent when a user accesses your site via the inbuilt browser in the Facebook app. I assume there was a link to your board somewhere on Facebook.

I wouldn't add it as a bot as it is essentially a real life guest visiting via Facebook.
phpBB user since 2002
www.AusRotary.com
CGI1984
Registered User
Posts: 203
Joined: Thu Feb 20, 2020 8:27 am

Re: Bot Or Site Scraper?

Post by CGI1984 »

Thanks. I had also thought it might be someone following a link to our site from Facebook, but when I searched for the Facebook user agent, facebookexternalhit/* came up in the Stack Overflow question linked below, which says that the Facebook user agent is facebot:

https://stackoverflow.com/questions/862 ... user-agent
KYPREO
Registered User
Posts: 392
Joined: Fri Feb 02, 2018 9:56 am

Re: Bot Or Site Scraper?

Post by KYPREO »

Yes, after reading that and doing some of my own reading, I can see my previous post was wrong. It looks like Facebook "crawls" the site to gather the metadata needed to display Open Graph properties (preview image, topic name etc.) when a link is posted on Facebook, see: https://developers.facebook.com/docs/sh ... rs/crawler

This is probably why I have seen these hits coincide with my board being linked on Facebook, and why I came to the conclusion I did.

I still think it's not in the same category as other bots. The only reasons to add a specific bot like this are to control its permissions and to manage sessions when the bot opens lots of concurrent connections, as a search engine crawler does. This Facebook user agent doesn't do that. It just hits pages sporadically when a page is linked on Facebook. If you don't define it as a bot, it will be treated as a guest, and that is probably appropriate in the circumstances.
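To make the bot-versus-guest distinction concrete, here is a minimal Python sketch of how a board might decide whether a visitor counts as a bot. It assumes the bot list is matched as a case-insensitive fragment of the User-Agent header; the `BOT_AGENTS` table, its display names, and `classify_visitor` are hypothetical names for illustration, not phpBB's own code or its shipped bot list.

```python
# Hypothetical bot table: display name -> user-agent fragment to look for.
# Illustrative entries only; not phpBB's default Spiders/Robots list.
BOT_AGENTS = {
    "Facebook crawler": "facebookexternalhit/",
    "Facebot": "Facebot",
}


def classify_visitor(user_agent: str) -> str:
    """Return the first matching bot name, or "guest" if nothing matches."""
    ua = user_agent.lower()
    for name, fragment in BOT_AGENTS.items():
        # Case-insensitive substring match (an assumption of this sketch).
        if fragment.lower() in ua:
            return name
    return "guest"
```

With a table like this, the user agent quoted in the first post would be listed as a bot, while an ordinary browser string would fall through to "guest".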
CGI1984
Registered User
Posts: 203
Joined: Thu Feb 20, 2020 8:27 am

Re: Bot Or Site Scraper?

Post by CGI1984 »

Thanks again. The only other reason I see for adding it as a bot is that it makes it faster to recognize who is online when it visits the site. I realize it isn’t a bot in the conventional sense and only visits to check up on a link to your board that someone posted on Facebook, but while it’s on the site it is acting as a bot and collecting metadata. Since it does the same task every time, listing it as a bot gives you an idea, at a quick glance at who is online, of how often Facebook is pulling data from your site. So I think it’s not a bad idea to add it to the bots: it isn’t behaving like a normal human guest, who is there reading posts out of personal interest.
CGI1984
Registered User
Posts: 203
Joined: Thu Feb 20, 2020 8:27 am

Re: Bot Or Site Scraper?

Post by CGI1984 »

Just saw this DuckDuckGo Favicons bot on my site. Add?

https://developers.whatismybrowser.com/ ... icons-bot/
Lumpy Burgertushie
Registered User
Posts: 69228
Joined: Mon May 02, 2005 3:11 am

Re: Bot Or Site Scraper?

Post by Lumpy Burgertushie »

As was said above, the only reason for the bot group is so you can control permissions for any and all bots in that group.

You will get hundreds of bots visiting your site; there is no reason to put them all in the bot group.

Most people never touch that group and it works just fine for them.

luck,
robert
Premium phpBB 3.3 Styles by PlanetStyles.net

I am pleased to announce that I have completed the first item on my bucket list. I have the bucket.
3Di
I've Been Banned!
Posts: 17538
Joined: Mon Apr 04, 2005 11:09 pm
Location: I'm with Ukraine 🇺🇦
Name: Marco

Re: Bot Or Site Scraper?

Post by 3Di »

KYPREO wrote: Mon Mar 02, 2020 10:37 am ...

Facebook doesn't crawl sites but this user agent is sometimes classified as a bot. It is actually the user agent when a user accesses your site via the inbuilt browser in the Facebook app. I assume there was a link to your board somewhere on Facebook.

I wouldn't add it as a bot as it is essentially a real life guest visiting via Facebook.
I added it because having 12 guests at the same time with the same UA/IP didn't seem very common, and I personally have no reference to facecrook on my site, so it's not explained otherwise.
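The "many guests with the same UA/IP" check can be sketched in Python as below. This is only an illustration: the `sessions` input (a list of dicts with `ip` and `user_agent` keys) and the function name `shared_identity_counts` are assumptions made for the example, not a phpBB data structure or API.

```python
from collections import Counter


def shared_identity_counts(sessions, threshold=2):
    """Return {(ip, user_agent): count} for pairs seen at least `threshold` times.

    `sessions` is a hypothetical list of dicts with "ip" and "user_agent" keys,
    standing in for whatever the "who is online" view is built from.
    """
    counts = Counter((s["ip"], s["user_agent"]) for s in sessions)
    return {pair: n for pair, n in counts.items() if n >= threshold}
```

A dozen simultaneous guest sessions sharing one IP and one user agent would show up as a single entry with a count of 12, which is the pattern that suggests an automated client rather than separate human visitors.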
johns1124
Registered User
Posts: 88
Joined: Fri Jan 27, 2006 1:19 pm
Location: LA,Calif.

Re: Bot Or Site Scraper?

Post by johns1124 »

For some reason over the last two weeks I have been hit by this:
phpbb facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

to the tune of 150+ connections at one time. I'm starting to think it's time to block it via .htaccess, since it uses multiple IPs.

Anyone have more experience with this bot?
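Before blocking, it can help to measure how heavy the traffic really is and how many addresses it comes from. Below is a rough Python sketch that counts hits and distinct IPs for a user-agent fragment. It assumes the server writes the Apache "combined" log format; the regular expression and the name `hits_by_agent` are illustrative, so adjust them if your log layout differs.

```python
import re

# Assumes Apache "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def hits_by_agent(lines, fragment="facebookexternalhit"):
    """Return (hit_count, distinct_ip_count) for user agents containing `fragment`."""
    hits = 0
    ips = set()
    for line in lines:
        match = COMBINED_LOG.match(line)
        # Lines that don't fit the assumed format are skipped silently.
        if match and fragment.lower() in match.group("ua").lower():
            hits += 1
            ips.add(match.group("ip"))
    return hits, len(ips)
```

You could feed it an open log file, for example `hits_by_agent(open("access.log"))`, where the file name is a placeholder for your own log path. A large distinct-IP count supports blocking by user agent rather than by address.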
bonelifer
Community Team Member
Posts: 3640
Joined: Wed Oct 27, 2004 11:35 pm
Name: William

Re: Bot Or Site Scraper?

Post by bonelifer »

You could try contacting the email address on that page. But from a Google search, it seems Facebook's bots don't respect the Crawl-delay directive in robots.txt, so if they are hitting you that hard at times and they don't reply, it's probably best to block them outright via your .htaccess file.

The only thing it'll hurt is that when people share your board on Facebook, it won't get the page title and images.
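For reference, here is what a Crawl-delay rule looks like and how a well-behaved client would read it, sketched with Python's standard `urllib.robotparser`. The robots.txt content, the 10-second value, and the helper name `crawl_delay_for` are made up for illustration. Note that this only shows what the file asks for; whether a crawler honours it is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; the delay value and paths are assumptions.
ROBOTS_TXT = """\
User-agent: facebookexternalhit
Crawl-delay: 10

User-agent: *
Disallow: /adm/
"""


def crawl_delay_for(agent: str, robots_txt: str = ROBOTS_TXT):
    """Return the Crawl-delay (seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)
```

Here `crawl_delay_for("facebookexternalhit/1.1")` picks up the delay from the first rule group, while an agent covered only by the `*` group gets `None` because that group sets no delay.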
William Jacoby - Community Team
