Scrape topics from another site, post w/ correct U.names on

securitynut
Registered User
Posts: 43
Joined: Tue Jan 16, 2007 7:20 am

Post by securitynut » Wed Jul 11, 2007 1:47 am

Hello,

I'm looking to make a scraper that will allow a user to enter a URL to a specific topic page on another forum. The scraper will then pull an array of usernames in that topic and the associated posts they made.

My question is: would it be possible to post this array into my phpBB forum so that the usernames and post content appear correctly? Most of the usernames won't be registered on my site, though some may be. I was thinking it might be possible to post each reply in guest mode, but I don't know what happens if a guest posts with the same name as a registered user.

Any ideas? It wouldn't be too hard to stick all the replies and the original post into one post by one username, but then the individual posts wouldn't be quotable, etc.
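
A minimal sketch of the data-shaping step described above, written in Python for illustration rather than in phpBB's own PHP. It only shows one way to turn an array of (username, post text) pairs into guest-attributed rows and to flag names that collide with locally registered users. Everything here is an assumption: the `GUEST_ID` sentinel, the `PostRow` fields and the `build_post_rows` helper are hypothetical names, not phpBB's API, and how phpBB itself handles a guest name that matches a registered user is not covered.

```python
from dataclasses import dataclass

# Assumption: a sentinel user id meaning "posted by a guest".
GUEST_ID = -1


@dataclass
class PostRow:
    poster_id: int       # always GUEST_ID in this sketch
    post_username: str   # display name shown on the post
    post_text: str
    collides: bool       # True if the name matches a registered user


def build_post_rows(scraped, registered_names):
    """Shape (username, text) pairs into guest rows, one row per reply.

    scraped: list of (username, text) tuples in topic order.
    registered_names: iterable of usernames registered locally.
    """
    # Compare case-insensitively so "Bob" and "bob" count as the same name.
    registered = {name.lower() for name in registered_names}
    rows = []
    for username, text in scraped:
        rows.append(
            PostRow(
                poster_id=GUEST_ID,
                post_username=username,
                post_text=text,
                collides=username.lower() in registered,
            )
        )
    return rows


if __name__ == "__main__":
    rows = build_post_rows([("alice", "hi"), ("Bob", "yo")], {"bob"})
    for row in rows:
        print(row)
```

Keeping every reply as a separate row is what keeps individual posts quotable. The `collides` flag leaves the policy decision (rename, skip, or review by hand) to whoever runs the import.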

Phil
Former Team Member
Posts: 10403
Joined: Sat Nov 25, 2006 4:11 am
Name: Phil Crumm

Re: Scrape topics from another site, post w/ correct U.names on

Post by Phil » Wed Jul 11, 2007 1:49 am

It's possible if you know how to write the program to do so, but actually doing so will likely break a large number of copyright laws.

securitynut
Registered User
Posts: 43
Joined: Tue Jan 16, 2007 7:20 am

Re: Scrape topics from another site, post w/ correct U.names on

Post by securitynut » Wed Jul 11, 2007 3:22 am

The legal side is not an issue.

Do you have any tips with regard to inserting the data into posts with the correct usernames, as I mentioned above? I'm not having any trouble with the scraping; I'm just trying to get my head around inserting the results as phpBB posts.

drathbun
Former Team Member
Posts: 12204
Joined: Thu Jun 06, 2002 3:51 pm
Location: TOPICS_TABLE

Re: Scrape topics from another site, post w/ correct U.names on

Post by drathbun » Wed Jul 11, 2007 3:49 am

securitynut wrote: The legal side is not an issue.
I don't see how you can say this. In my opinion, the legal side is very much an issue, especially if you don't have permission to take the content. If you have a legitimate right to the data, you should be able to use some other method to do this, like getting the information directly from a database link.

Can you provide a legitimate reason for doing such an activity? Do you have the permission from the target board(s) that you are trying to scrape?

These questions will probably appear to be a bit of a challenge, and they probably are. :-) I know that I would be extraordinarily upset to find out that someone was doing this to one of my boards, and I would take legal action against the person(s) involved.

There may be a legitimate reason; I'm open to responses. For example, Google does a "scrape" of the content in order to index my board. But they don't attempt to store my content in their own database, potentially in a format (another board) that would compete with my own. So I don't have a quibble with them. But someone taking my content, and then making it appear as though my own users were posting the same content on another board, would raise some concerns. And not just from me, but from my users as well, I would imagine.

securitynut
Registered User
Posts: 43
Joined: Tue Jan 16, 2007 7:20 am

Re: Scrape topics from another site, post w/ correct U.names on

Post by securitynut » Wed Jul 11, 2007 4:15 am

My comment, "The legal side is not an issue", should have served to snuff out any legal discussion. I've been through and gotten over the whole ethical scraping deal. With that said, my board is somewhat protected under another law, and my intentions are not to gain anything from this. It is a user-run supplemental service. I have informal permission from a representative, but would have complied with a takedown notice from the site owner (whose attention I have drawn to my website) had I trodden on anyone's toes in the 6 months my website has been running. May we get on topic now?

I am at phpBB.com to discuss insertion of posts from an array such that they are quotable and interact-able as are normal posts. Firstly, I don't have the faintest idea how to do this, and secondly there may be some usernames that are registered while others aren't. This concerns me because I am not sure how phpBB would handle that. Has anyone got some ideas? I can't think of a MOD that automatically inserts posts under unregistered names so the "chop and change" route which I normally take isn't available this time :?

Marshalrusty
Project Manager
Posts: 29253
Joined: Mon Nov 22, 2004 10:45 pm
Location: New York City
Name: Yuriy Rusko

Re: Scrape topics from another site, post w/ correct U.names on

Post by Marshalrusty » Wed Jul 11, 2007 5:06 am

Firstly, we take ethics very seriously. Since the owner of the site you want to "scrape" could easily provide you with the information should he/she want to, a tool such as this could only be malicious in nature. You may brush this issue off and say that you have "gotten over the ethics of scraping", but that doesn't mean we agree.

Aside from that, this board is for the support of the phpBB software, and what you are asking falls very far outside this scope.

As such, I am going to close this topic and ask that you simply ask the owner of the site for the information (if he/she really doesn't mind).
