Removing orphaned atachments not in DB.

Need some custom code changes to the phpBB core simple enough that you feel doesn't require an extension? Then post your request here so that community members can provide some assistance.

NOTE: NO OFFICIAL SUPPORT IS PROVIDED IN THIS SUB-FORUM
Forum rules
READ: phpBB.com Board-Wide Rules and Regulations

NOTE: NO OFFICIAL SUPPORT IS PROVIDED IN THIS SUB-FORUM
User avatar
thecoalman
Community Team Member
Community Team Member
Posts: 6423
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Removing orphaned atachments not in DB.

Post by thecoalman »

Trying to clean up attachments in my 20 year old forum, there is 200K files including the thumbs This script is checking to see if the file on disk is in the database.This works and I'm getting some results but is there a faster way? I'm running this on my local machine and it's going for about an hour now. :shock:

I only need to run it once so it makes no difference one way or the other but I'm just curious if there is anyway to improve it.

Code: Select all

<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);
set_time_limit(20000);

//DB credentials
$dbhost = '';
$dbname = '';
$dbuser = '';
$dbpasswd = '';

$link = mysqli_connect($dbhost, $dbuser, $dbpasswd, $dbname);

// Set the path for phpbb files folder
$files_root = $_SERVER['DOCUMENT_ROOT'] . '/forum/files';
//Uncomment to test directory
//echo $files_root;
//exit;



if ($handle = opendir($files_root))
{

	while (false !== ($entry = readdir($handle))) 
	{
		if ($entry != '.' && $entry != '..' && $entry != 'index.htm' && $entry != '.htaccess')			
		{
			//Check if thumb itself is orphan
			$strip_thumb = str_replace('thumb_', '', $entry);
			$sql = "SELECT * FROM phpbb_attachments WHERE physical_filename = '" . $strip_thumb ."'";

			$result = mysqli_query($link, $sql);
			
			if (mysqli_num_rows($result) == 0) 
			{
			// testing right now
			//unlink($files_root . '/' . $entry);
			echo $entry . '<br>';
			}
			mysqli_free_result($result);
		}
	}
	closedir($handle);
}
?>
Done!
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison
Anișor
Translator
Posts: 338
Joined: Tue Jan 08, 2013 9:36 pm
Location: Arbroath, Angus, Scotland

Re: Removing orphaned atachments not in DB.

Post by Anișor »

I remember I had to insert a lot of stuff in the database and the best option was Python. It did the job in a matter of seconds.
You can try that and see how it goes.
User avatar
thecoalman
Community Team Member
Community Team Member
Posts: 6423
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Removing orphaned atachments not in DB.

Post by thecoalman »

I'm pretty sure the slowness is because of the directory functions. That part was copied directly from php manual.

https://www.php.net/manual/en/function.readdir.php

Still running....
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison
Anișor
Translator
Posts: 338
Joined: Tue Jan 08, 2013 9:36 pm
Location: Arbroath, Angus, Scotland

Re: Removing orphaned atachments not in DB.

Post by Anișor »

Python is very good with managing files, directories. You should ask chatGPT for an alternative and see how it goes. I’m sure it will be a lot quicker than that.

Remember PHP is great for web but not very good for managing such tasks although it had builtin functions it’s not the best choice.
User avatar
NeoDev
Registered User
Posts: 25
Joined: Mon Oct 18, 2021 9:52 pm
Name: Neo

Re: Removing orphaned atachments not in DB.

Post by NeoDev »

Looping over 200k files and doing a query for each one means 200k queries. That's going to slow down everything a lot. Plus there is no index on the physical_filename column so that's going to require a full table scan for each query.

Worse case scenario: 200,000 queries x 200,000 rows = 40,000,000,000 rows to test

I would load all the files into an array with scandir() and load all physical_filenames into another array with select physical_filenames from phpbb_attachements. Then array_diff.
Come for the comics, stay for the bad fish puns.
— The Cross Time Cafe
User avatar
thecoalman
Community Team Member
Community Team Member
Posts: 6423
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Removing orphaned atachments not in DB.

Post by thecoalman »

NeoDev wrote: Tue Sep 17, 2024 9:11 pm Plus there is no index on the physical_filename column so that's going to require a full table scan for each query.
I had considered getting the user ID from the physical_filename.
I would load all the files into an array with scandir()
I had considered what you suggested but based on what I read readdir should be faster.

I any event already done. I just put the local output of the 400+/- files into an array and ran it on live server.
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison

Return to “phpBB Custom Coding”