Duplicate Postings

Get help with installation and running phpBB 3.0.x here. Please do not post bug reports, feature requests, or MOD-related questions here.
END OF SUPPORT: 1 January 2017 (announcement)
Locked
Scotty501
Registered User
Posts: 89
Joined: Wed Mar 28, 2007 9:40 am
Location: London, UK

Duplicate Postings

Post by Scotty501 » Thu Sep 09, 2010 9:21 am

Hi All,

I have been using a MOD called Last RSS (in Development) which pulls in RSS feeds and posts them to your board - it works very well. It uses timestamps etc. to avoid duplicating content; however, some RSS feeds do not use this method, so I end up with duplicate posts.

The issue is not the MOD, hence posting in here - does anyone know if there is a MOD to remove duplicate posts? It's not the same as the Double Post MOD, which I believe deals with hitting the "Reply" button twice. I am really looking for a board-wide duplicate post deletion tool.

Such a thing?
Web Design http://www.ukushosting.com Web Hosting

AmigoJack
Registered User
Posts: 5588
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Duplicate Postings

Post by AmigoJack » Thu Sep 09, 2010 9:59 am

Scotty501 wrote: a board wide duplicate post deletion
Well, where does your definition of duplicate start and where does it end? A post has dozens of attributes:
  • Text
  • Subject
  • Time
  • Author
  • Editor
  • ...
The worst thing about censorship is ███████████
Affin wrote:
Tue Nov 20, 2018 9:51 am
The problem is probably not my English but you do not want to understand correctly.
...
We will not come anybody anyway, nevertheless, it's best to shit this.

Re: Duplicate Postings

Post by Scotty501 » Thu Sep 09, 2010 10:26 am

Well, I guess the content (text) of the post is the ultimate duplication checker, but I fear that if there were a process to do this, the search would be huge.

So matching the title of the topic (not sub-topic titles) would be a good starting point - but text would be the ultimate checker for sure.
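One common way to keep a full-text comparison from becoming "huge" is to compare short fingerprints of each post instead of the text itself. The following is only a sketch under assumptions: the `(post_id, post_text)` pairs and both function names are hypothetical, SHA-256 is simply one reasonable hash choice, and nothing here is part of phpBB or the Last RSS MOD.

```python
import hashlib
from collections import defaultdict


def text_fingerprint(post_text: str) -> str:
    """Return a fixed-length SHA-256 hex digest of the exact post text."""
    return hashlib.sha256(post_text.encode("utf-8")).hexdigest()


def group_by_fingerprint(posts):
    """Map each fingerprint to the post IDs that share it.

    `posts` is an iterable of (post_id, post_text) pairs (hypothetical input).
    Only fingerprints seen more than once are returned.
    """
    groups = defaultdict(list)
    for post_id, post_text in posts:
        groups[text_fingerprint(post_text)].append(post_id)
    return {fp: ids for fp, ids in groups.items() if len(ids) > 1}


# Made-up sample data standing in for posts pulled from a feed.
sample_posts = [
    (1, "How do I paint my face"),
    (2, "Something else entirely"),
    (3, "How do I paint my face"),
]
duplicate_groups = group_by_fingerprint(sample_posts)
print(sorted(duplicate_groups.values()))
```

Each post is read once, and only 64-character digests are compared afterwards. This catches identical text only; a single changed character produces a different fingerprint.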

Re: Duplicate Postings

Post by AmigoJack » Thu Sep 09, 2010 11:11 am

I could give you an SQL command which would find all topic titles occurring in exactly that form two or more times. But then the details continue:
  • Which of those should be deleted?
  • Should all but one be deleted?
  • Is case sensitivity important?
  • Think of similar looking titles which would not qualify as duplicates, like "movie-premiere" versus "movie premiere".
  • Think of misspellings which would not qualify as duplicates, like "Inglorious Bastards" versus "Inglourious Basterds".
Also: once the command is precisely finalized, it would erase duplicates only once. To prevent duplicates on an ongoing basis, a whole MOD would have to be built...

Just be aware that what you're requesting is neither easy nor simple. The problem itself might be trivial, but the solution is not. Is this the right solution anyway?
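To make the questions above concrete, here is a small sketch of how the choice of comparison key changes what counts as a duplicate. Both helper names (`exact_key`, `loose_key`) and the normalization rules are illustrative assumptions, not anything phpBB provides.

```python
import re


def exact_key(title: str) -> str:
    """Strict definition: titles match only if every character matches."""
    return title


def loose_key(title: str) -> str:
    """Looser definition (an assumption for illustration): ignore case and
    treat any run of hyphens or whitespace as a single space."""
    return re.sub(r"[\s\-]+", " ", title.strip().lower())


# Title pairs taken from the list above.
pairs = [
    ("movie-premiere", "movie premiere"),
    ("Inglorious Bastards", "Inglourious Basterds"),
]
for first, second in pairs:
    same_exact = exact_key(first) == exact_key(second)
    same_loose = loose_key(first) == loose_key(second)
    print(f"{first!r} vs {second!r}: exact={same_exact}, loose={same_loose}")
```

The hyphenated pair matches only under the looser key, while the misspelled pair matches under neither, so catching it would need fuzzy matching with its own risk of false positives.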

Re: Duplicate Postings

Post by Scotty501 » Thu Sep 09, 2010 12:02 pm

Hi Amigo Jack,

Good input, and for sure I accept it is not easy. Ideally it matches like for like and deletes all but one.

So topic title

"How do I paint my face" versus "How do I paint my face" would delete all but one of the threads. As this is coming from an RSS feed, it will always be identical.

If it was

"How do I paint my face" versus "HOW do I paint my face" - I would accept that it is more difficult - but that is not really what I am trying to achieve, as the posts I am concerned with are always identical in terms of capitalisation, spelling and format.

Make sense?

EDIT - Also, it is always the same poster, as I have created a bot which pulls in the feeds - if that makes any difference.
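The rule described here (exact title match, one poster, keep one copy) could be sketched roughly as follows. The function name, the `(topic_id, topic_title)` input shape and the choice to keep the lowest topic ID are all assumptions for illustration; the sketch only reports IDs and deletes nothing.

```python
from collections import defaultdict


def topics_to_delete(topics):
    """Return the IDs of all but the lowest-numbered topic per exact title.

    `topics` is an iterable of (topic_id, topic_title) pairs belonging to a
    single poster, e.g. the feed bot (hypothetical input).
    """
    ids_by_title = defaultdict(list)
    for topic_id, title in topics:
        ids_by_title[title].append(topic_id)

    surplus = []
    for ids in ids_by_title.values():
        surplus.extend(sorted(ids)[1:])  # keep the first copy, list the rest
    return sorted(surplus)


# Made-up topics as the bot might have posted them.
bot_topics = [
    (10, "How do I paint my face"),
    (11, "How do I paint my face"),
    (12, "HOW do I paint my face"),  # differs in case, so it is not matched
    (13, "How do I paint my face"),
]
print(topics_to_delete(bot_topics))
```

Because the comparison is exact, the differently capitalised title is left alone, which matches the stated requirement that only truly identical titles count.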

Re: Duplicate Postings

Post by AmigoJack » Thu Sep 09, 2010 1:38 pm

To find topic titles which occur more than once, execute this SQL command:

Code: Select all

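  -- List every topic title the bot has posted more than once.
  -- GROUP_CONCAT is MySQL-specific; it collects all topic IDs sharing a title.
  SELECT t.topic_title,
         COUNT(*) AS copies,
         GROUP_CONCAT(t.topic_id ORDER BY t.topic_id) AS topic_ids
    FROM phpbb_topics t
   WHERE t.topic_poster = _your_bot_user_ID_
   GROUP BY t.topic_title
  HAVING COUNT(*) > 1;
Replace _your_bot_user_ID_ with your bot's user ID; this filter helps the database scan as little data as possible. The query lists the topic IDs for each duplicated title (GROUP_CONCAT assumes MySQL), so you can look them up on your board and delete them. Deleting topics in the database alone is not enough - you need to delete them through phpBB so as not to corrupt dependent data and counters.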
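A read-only query like this can be tried out on throwaway data before running it against a live board. The sketch below assumes an in-memory SQLite table that mimics only three columns of `phpbb_topics`; the sample rows and the bot user ID `42` are made up. SQLite also compares text case-sensitively by default, which may differ from your board's database collation.

```python
import sqlite3

# In-memory stand-in for phpbb_topics with only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phpbb_topics ("
    " topic_id INTEGER PRIMARY KEY,"
    " topic_title TEXT,"
    " topic_poster INTEGER)"
)
conn.executemany(
    "INSERT INTO phpbb_topics (topic_id, topic_title, topic_poster)"
    " VALUES (?, ?, ?)",
    [
        (1, "How do I paint my face", 42),
        (2, "How do I paint my face", 42),
        (3, "Another feed item", 42),
        (4, "How do I paint my face", 7),  # same title, different poster
    ],
)

BOT_USER_ID = 42  # hypothetical bot account ID

# Step 1: titles the bot has posted more than once.
duplicate_titles = conn.execute(
    "SELECT topic_title, COUNT(*) FROM phpbb_topics"
    " WHERE topic_poster = ? GROUP BY topic_title HAVING COUNT(*) > 1",
    (BOT_USER_ID,),
).fetchall()

# Step 2: every topic ID behind each duplicated title, oldest first.
report = {}
for title, _copies in duplicate_titles:
    rows = conn.execute(
        "SELECT topic_id FROM phpbb_topics"
        " WHERE topic_poster = ? AND topic_title = ? ORDER BY topic_id",
        (BOT_USER_ID, title),
    )
    report[title] = [row[0] for row in rows]

print(report)
conn.close()
```

The report only tells you where the duplicates are; the actual removal should still go through phpBB's own moderation tools, for the reason given above.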

Re: Duplicate Postings

Post by Scotty501 » Thu Sep 09, 2010 2:27 pm

Thanks mate - I can scan down the thread and delete them manually, to be honest. Whilst I appreciate the effort, it's more of an automated process I was looking for. Although the SQL will make sure I don't miss any.

I guess it's just not out there at the moment - more's the pity.