Spiders & bots to add to phpBB

millipede
Registered User
Posts: 179
Joined: Mon Feb 25, 2008 5:13 am

Re: Spiders & bots to add to phpbb3

Post by millipede »

I saw Ezooms on my forum today...
Guest IP: 208.115.111.74
Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
Thought it was weird that they have a Google email address.
Tried looking them up on Google but didn't have a ton of luck.
Checked the IP; it led to wowrack dot com, which is a web host.
I emailed them with the info and asked if they knew what it was...
Got a reply:
This IP is owned by one of our clients, dotnetdotcom.org, and we have forwarded this ticket to them.
Their main purpose for this machine is to crawl/index the content just like Googlebot.
So dotnetdotcom has more than one bot? And their website doesn't make any mention of this Ezooms bot.
How long has dotnetdotcom been around? And have they done anything useful with all the info they've collected so far?
I just don't understand some of these robots that say they're doing something useful... but I never see or use their site myself...
Maybe it's the government spying? :lol:
AmigoJack
Registered User
Posts: 5726
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Spiders & bots to add to phpbb3

Post by AmigoJack »

New bots encountered. The 5th entry (Bixo Labs) is an updated agent match for an already existing entry. Entries 6 to 8 are programming libraries/toolkits that any programmer can use, rather than real bots/spiders/crawlers.
  1. Bot name: Huawei Symantec
    Agent match: Huaweisymantecspider
  2. Bot name: Qirina Hurdler
    Agent match: +http://www.qirina.com/hurdler.html
  3. Bot name: AboutUs
    Agent match: +http://www.AboutUs.org
  4. Bot name: embed.ly
    Agent match: Embedly/
  5. (update) Bot name: Bixo Labs
    Agent match: +http://bixolabs.com
  6. Bot name: Ruby
    Agent match: Ruby/
  7. Bot name: curl
    Agent match: curl/
  8. Bot name: Async Http Client
    Agent match: AsyncHttpClient
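For anyone wondering how an "Agent match" is actually used: as far as I can tell from phpBB 3.0's session handling, each bot's agent string is tested as a case-insensitive pattern against the visitor's user agent, with special characters taken literally and `*` acting as a wildcard. Here's a rough Python sketch of that idea, not phpBB's own code; the function names and the sample curl user agent in the usage note are hypothetical.

```python
import re
from typing import Optional

# Agent matches from the list above (entries 1 to 8).
AGENT_MATCHES = {
    "Huawei Symantec": "Huaweisymantecspider",
    "Qirina Hurdler": "+http://www.qirina.com/hurdler.html",
    "AboutUs": "+http://www.AboutUs.org",
    "embed.ly": "Embedly/",
    "Bixo Labs": "+http://bixolabs.com",
    "Ruby": "Ruby/",
    "curl": "curl/",
    "Async Http Client": "AsyncHttpClient",
}


def agent_matches(agent_match: str, user_agent: str) -> bool:
    """True if agent_match occurs anywhere in user_agent, ignoring case.

    Regex metacharacters such as '+' and '.' are escaped so they match
    literally; an escaped '*' is turned back into a lazy wildcard.
    """
    pattern = re.escape(agent_match).replace(r"\*", ".*?")
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


def identify_bot(user_agent: str) -> Optional[str]:
    """Return the first bot name whose agent match fits, or None."""
    for name, match in AGENT_MATCHES.items():
        if agent_matches(match, user_agent):
            return name
    return None
```

With this list, a (made-up) agent like `curl/7.21.0 (x86_64-pc-linux-gnu)` would be reported as "curl", while the bingbot agent from later in this thread matches none of these eight entries. Because the match is a plain substring test, short generic strings like `Ruby/` can also catch visitors you might not consider bots.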
The worst thing about censorship is ███████████
Affin wrote:
Tue Nov 20, 2018 9:51 am
The problem is probably not my English but you do not want to understand correctly.
...
We will not come anybody anyway, nevertheless, it's best to shit this.
TheSnake
Registered User
Posts: 483
Joined: Wed Aug 09, 2006 10:36 pm
Location: Staffordshire, England, UK

Re: Spiders & bots to add to phpbb3

Post by TheSnake »

I like the idea of this list of extra bots; I just haven't decided which ones I'll add to my board yet. What I was wondering is: where do I find out which bots have been visiting my site, besides the 51 bots already included in 3.0.8 (ACP & the Who Is Online list)?
3Di
Former Team Member
Posts: 15589
Joined: Mon Apr 04, 2005 11:09 pm
Location: Milan (IT) Frankfurt (DE)
Name: Marco

Re: Spiders & bots to add to phpbb3

Post by 3Di »

TheSnake wrote:I like the idea of this list of extra Bots, just haven't decided which ones I'll add to my board yet. What I was wondering, where do I find out what bot's have been visiting my site, beside the already included 51 bots in 3.0.8 (ACP & the who is online list)?
That's why this topic exists: the script will add the bots not included in the vanilla install. Then you will see them.
Please PM me only to request paid works. Thx.
Want to compensate me for my interest? Donate
My development's activity º PhpStorm's proud user
Extensions, Scripts, MOD porting, Update/Upgrades
:studio_microphone: Looking for a specific feature or alternative option?
TheSnake
Registered User
Posts: 483
Joined: Wed Aug 09, 2006 10:36 pm
Location: Staffordshire, England, UK

Re: Spiders & bots to add to phpbb3

Post by TheSnake »

3Di wrote:
TheSnake wrote:I like the idea of this list of extra Bots, just haven't decided which ones I'll add to my board yet. What I was wondering, where do I find out what bot's have been visiting my site, beside the already included 51 bots in 3.0.8 (ACP & the who is online list)?
that's why this topic exists, the script will add the BOTs not included into the vanilla install. then you will see.
I've made a copy of the script to use when necessary.

What I actually meant, though, is: on my website, what do I look for to find out which bots have visited the site?

Even though it isn't part of the default bot list in 3.0.8, I just found YandexImages in the Guest section of Who Is Online. Where (in phpBB, cPanel, etc.) do I find the user agent listed for each visitor? What I'm after is the information on each user who's visited the site, whether they are a bot or a normal user/guest. Also, which part of the info displayed for each user in Who Is Online is actually the user agent?
AmigoJack
Registered User
Posts: 5726
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Spiders & bots to add to phpbb3

Post by AmigoJack »

TheSnake wrote:Where (in phpBB, CPanel, etc) do I find the user agents listed for each visitor? What I'm after is the information on each user who's visited the site, whether they are a Bot or normal User/Guest. Also, what part of the info that displays for each user in the who is online is actually the user agent?
Listing every user agent that ever visits would produce a lot of data to store, since they vary a lot. I wrote a routine that lists every user agent of a new session that belongs neither to a registered user nor to an already known bot, but I still have to analyze that list manually. Ask your host for your access.log, which lists every site request with the requester's IP and user agent. That alone is a massive list.

Everything in Who Is Online is the user agent.
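AmigoJack's routine itself isn't posted here, so the following is only a rough Python sketch of the same filtering idea, assuming each new session is available as a user agent plus a "registered" flag. The `Session` type, its fields and the function name are hypothetical, and known bots are recognised with a simple case-insensitive substring test.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Session:
    # Hypothetical session record; phpBB's real sessions table looks different.
    user_agent: str
    is_registered: bool


def unknown_guest_agents(sessions: Iterable[Session],
                         known_agent_matches: Iterable[str]) -> List[str]:
    """Distinct user agents of guest sessions that no known bot entry matches.

    Keeps the order in which each agent was first seen.
    """
    known = [m.lower() for m in known_agent_matches]
    seen = set()
    result = []
    for s in sessions:
        if s.is_registered:
            continue  # registered members are not what we're looking for
        ua = s.user_agent.lower()
        if any(m in ua for m in known):
            continue  # already covered by an existing bot entry
        if s.user_agent not in seen:
            seen.add(s.user_agent)
            result.append(s.user_agent)
    return result
```

Deduplicating is what keeps the list short enough to read by hand: a crawler that opens hundreds of sessions shows up only once.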
TheSnake
Registered User
Posts: 483
Joined: Wed Aug 09, 2006 10:36 pm
Location: Staffordshire, England, UK

Re: Spiders & bots to add to phpbb3

Post by TheSnake »

OK, after a bit of poking around my cPanel & looking at that link you provided, I was able to find the info I needed; it looks like I might have to use the Raw Access Logs. I had previously tried some of the other log files that are provided, but they either list only the top 15 user agents or the last 300 hits. Since I've been spending a lot of time on the site working on different bits, it's mostly my own details (using different computers, browsers & ISPs).

I've looked at the Raw Access Logs & found Bing had been on my site yesterday:

65.52.110.23 - - [13/Feb/2011:14:14:28 -0500] "GET /robots.txt HTTP/1.1" 200 71 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
What I was wondering is: is there no other way of accessing just the user agent details? Some form of script or something? That line with Bing is one of the shortest lines of information, & the file is HUGE!
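The Bing line above looks like Apache's "combined" log format: IP, ident, user, [date], "request", status, bytes, "referer", "user agent". Assuming your host writes that format, a short script can pull out only the user agent. This is a minimal Python sketch; the regex is a simplification that doesn't handle escaped quotes inside fields, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterator, Optional

# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one combined-format log line into named fields, or return None."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def user_agents(path: str) -> Iterator[str]:
    """Yield the user agent of every parseable line in a log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = parse_line(line)
            if fields is not None:
                yield fields["agent"]
```

For the Bing line, `parse_line(...)["agent"]` gives `Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)`. Lines in some other format simply come back as `None` instead of raising an error.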
AmigoJack
Registered User
Posts: 5726
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Spiders & bots to add to phpbb3

Post by AmigoJack »

There are several scripts made just for analyzing the standard log files: AWStats, Webalizer, W3Perl... In fact, your host might already be providing one of these.
TheSnake
Registered User
Posts: 483
Joined: Wed Aug 09, 2006 10:36 pm
Location: Staffordshire, England, UK

Re: Spiders & bots to add to phpbb3

Post by TheSnake »

I've managed to find a few logs, but not exactly what I want.

Latest Visitors - Only gives me the last 300 visitors/page hits (most of which are my own browsers & computers, on multiple ISPs).
Webalizer - Lots of info, but only the top 15 user agents (1 of which is Google). I've had loads more visit, but only Googlebot is shown.
AWStats - Does list some of the known bots that have visited (not shown in the phpBB log); however, some that appear in the phpBB logs don't show in AWStats.
Raw Access Logs looks like my best bet, it shows all user agents, but the file is massive :cry:
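Since the raw log is huge but the number of distinct user agents is usually much smaller, one option is to stream the file and tally each user agent once per line. This is a minimal Python sketch assuming the combined format from the Bing line earlier in the thread, where the user agent is the last double-quoted field; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def last_quoted_field(line: str) -> Optional[str]:
    """Return the last "..." field of a log line (the user agent in combined format)."""
    line = line.rstrip("\r\n")
    if not line.endswith('"'):
        return None
    parts = line.rsplit('"', 2)  # [everything before, user agent, ""]
    return parts[1] if len(parts) == 3 else None


def count_agents(lines: Iterable[str]) -> Counter:
    """Count how many requests each user agent made."""
    counts: Counter = Counter()
    for line in lines:
        ua = last_quoted_field(line)
        if ua is not None:
            counts[ua] += 1
    return counts


def top_agents(path: str, n: int = 50) -> List[Tuple[str, int]]:
    """The n most frequent user agents in a log file.

    The file is read line by line, so it never has to fit in memory at once.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_agents(f).most_common(n)
```

Going through the whole `Counter` rather than only the top entries is what surfaces the rare crawlers that Webalizer's top-15 list hides.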
Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: Spiders & bots to add to phpbb3

Post by Pony99CA »

Catching bots (both good and bad) is one reason that I've suggested that phpBB implement a guest log. The current Who Is Online doesn't show them for very long. I've updated mine to capture 24 hours of data, but it refreshes after a minute and they're gone. (One of the staff showed me a fix for that, but it would still be nice if we had a real log.)
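To illustrate what a "real" guest log could keep, here is a minimal in-memory Python sketch of a rolling 24-hour window. It is not a phpBB feature. A board would presumably store this in a database table, and the class, its methods and the assumption that timestamps arrive in time order are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

DAY = 24 * 60 * 60  # window length in seconds

Visit = Tuple[float, str, str]  # (timestamp, ip, user_agent)


class GuestLog:
    """Keep guest visits for a rolling time window (24 hours by default).

    Assumes visits are recorded in non-decreasing time order, so the oldest
    entries are always at the left end and can be dropped cheaply.
    """

    def __init__(self, window: float = DAY) -> None:
        self.window = window
        self._entries: Deque[Visit] = deque()

    def record(self, timestamp: float, ip: str, user_agent: str) -> None:
        self._entries.append((timestamp, ip, user_agent))
        self._prune(timestamp)

    def visits(self, now: float) -> List[Visit]:
        """All visits from the last `window` seconds, oldest first."""
        self._prune(now)
        return list(self._entries)

    def _prune(self, now: float) -> None:
        cutoff = now - self.window
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()
```

Unlike Who Is Online, an entry here stays visible for the whole window even after the visitor's session ends.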

Steve
Silicon Valley Pocket PC (http://www.svpocketpc.com)
Creator of manage_bots and spoof_user (ask me)
Need hosting for a small forum with full cPanel & MySQL access? Contact me or PM me.
TheSnake
Registered User
Posts: 483
Joined: Wed Aug 09, 2006 10:36 pm
Location: Staffordshire, England, UK

Re: Spiders & bots to add to phpbb3

Post by TheSnake »

Since my last post, I've found out that the Raw Access Logs weren't set to save to my cPanel access log folder, so only a small amount of traffic was available. The other logs provided some information, but mostly they had multiple comment entries like "Unknown bot XXX" (XXX being different assorted characters for each entry), but no user agent details.

I can go back through my logs, AWStats & Webalizer to over a year ago, but not many bots are actually listed; most are not recognised. For instance, Baidu, MSNbot & a couple of others have visited the forum frequently, but do not show up in AWStats or Webalizer. Is there any other way of gathering the details of which bots (user agents) have previously visited?

I have changed the setting in my cPanel, so now at the end of every day the RAL will be saved to my home access logs folder, but I haven't been able to find anything about accessing (if at all possible) any earlier logs.
heredia21
Registered User
Posts: 942
Joined: Sun Apr 18, 2010 6:14 pm

Re: Spiders & bots to add to phpbb3

Post by heredia21 »

I spotted a few bots but I'm not sure how to post them here:

IP: 50.16.239.113
Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8

IP: 89.151.116.52
Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot

IP: 216.24.142.41
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)

IP: 50.18.58.153
UnwindFetchor/1.0 (+http://www.gnip.com/)

Would these be considered bots? What is their agent match?
Best BlackBerry website for all users! BlackBerry News - http://blackberryempire.com
Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: Spiders & bots to add to phpbb3

Post by Pony99CA »

heredia21 wrote:I spotted a few bots not sure how to post them here:

IP: 50.16.239.113
Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8

IP: 89.151.116.52
Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot

IP: 216.24.142.41
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)

IP: 50.18.58.153
UnwindFetchor/1.0 (+http://www.gnip.com/)

Would these be considered bots? What is their agent match?
Yes, and you post them like this:
  1. Bot name: Birubot [Bot]
    Agent match: Birubot
  2. Bot name: Tweetmeme [Bot]
    Agent match: TweetmemeBot
  3. Bot name: OneRiot [Bot]
    Agent match: OneRiot
    URL: http://www.oneriot.com
  4. Bot name: Gnip [Bot]
    Agent match: UnwindFetchor
    URL: http://www.gnip.com
Steve
heredia21
Registered User
Posts: 942
Joined: Sun Apr 18, 2010 6:14 pm

Re: Spiders & bots to add to phpbb3

Post by heredia21 »

How do I know that the agent match matches the bot name?

Also, I noticed the blog is not being updated with new bots anymore?
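One rough way to sanity-check an agent match is that it should appear in the bot's user agent but not in an ordinary browser's. The Python sketch below tests Pony99CA's suggested matches against heredia21's user agents. The "normal browser" string is an assumption: it is the OneRiot user agent above with its `OneRiot/1.0 (...)` token removed. The matching is a plain case-insensitive substring test, which only approximates what phpBB does; `verdict` and the other names are hypothetical.

```python
from typing import List, Tuple

# Stand-in for a normal visitor: the OneRiot agent without its bot token.
BROWSER_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) "
              "Gecko/20091221 Firefox/3.5.7")

# (suggested agent match, user agent it should catch)
CASES: List[Tuple[str, str]] = [
    ("Birubot", "Mozilla/5.0 (compatible; Birubot/1.0) Gecko/2009032608 Firefox/3.0.8"),
    ("TweetmemeBot", "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) "
                     "Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"),
    ("OneRiot", BROWSER_UA + " OneRiot/1.0 (http://www.oneriot.com)"),
    ("UnwindFetchor", "UnwindFetchor/1.0 (+http://www.gnip.com/)"),
]


def verdict(agent_match: str, bot_ua: str, browser_ua: str = BROWSER_UA) -> str:
    """'OK' if the match catches the bot but not the browser, else 'REVIEW'."""
    m = agent_match.lower()
    hits_bot = m in bot_ua.lower()
    hits_browser = m in browser_ua.lower()
    return "OK" if hits_bot and not hits_browser else "REVIEW"


if __name__ == "__main__":
    for match, ua in CASES:
        print(verdict(match, ua), match)
```

All four suggested matches pass. A generic string such as `Firefox` or `Mozilla` would be flagged for review because it also matches ordinary visitors, which is why agent matches use the bot's own name token.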
heredia21
Registered User
Posts: 942
Joined: Sun Apr 18, 2010 6:14 pm

Re: Spiders & bots to add to phpbb3

Post by heredia21 »

Found a few more:

IP: 62.146.124.12
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CrystalSemanticsBot http://www.crystalsemantics.com/service ... useragent/)

IP: 204.236.177.206
Twisted PageGetter

IP: 209.85.224.91
FeedBurner/1.0 (http://www.FeedBurner.com)

IP: 184.73.30.153
Mozilla/5.0 (compatible; PaperLiBot/2.1)

IP: 72.14.179.62
InAGist URL Resolver (http://inagist.com)

IP: 93.159.111.83
Mozilla/5.0 (compatible; suggybot v0.01a, http://blog.suggy.com/was-ist-suggy/suggy-webcrawler/)

IP: 46.20.47.43
Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot

IP: 128.242.249.11
TwitterFeed 3
Last edited by heredia21 on Wed Feb 16, 2011 3:29 pm, edited 1 time in total.