Firstly, thank you immensely for sharing this script. I've been desperately looking for something like this. I haven't run it yet, as my forum is quite large, with a 2 million post database and a lot of externally hosted images.
In the meantime, I have been thinking about how to recover other images that are no longer available on the Internet. For example, the Photobucket experience was not the first time this has happened. Imageshack was a very large and popular free hosting service that got taken over and decided to delete everyone's images without any notice whatsoever. I have probably tens of thousands of dead links to imageshack images on my board.
Previously, for very important topics on my board, I have looked up archived pages on the Wayback Machine (archive.org) to find whether the images were cached there. Very often they are, which has enabled me to edit old posts to point to the URL of the cached archive.org image. Voila - topic saved!
This is obviously a ridiculously laborious task to do manually, so I did some more digging.
I found that archive.org has a number of APIs, including a JSON API for querying whether a particular URL is available. If it is, archive.org will return a URL to the cached version. If this query were run for the URL of an image in a phpBB forum post, the process of finding archived images and downloading the cached version from archive.org could be automated.
E.g., if you query http://imageshack.com/img001.jpg, archive.org will return http://web.archive.org/web/201309190446 ... img001.jpg
Details of the APIs are here: https://archive.org/help/wayback_api.php
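To make the idea concrete, here is a rough Python sketch of the availability query described above. It is only an illustration, not part of the shared script: the function names are my own invention, and it assumes the JSON reply has the "archived_snapshots" / "closest" layout shown on the API help page.

```python
import json
import urllib.parse
import urllib.request

# Availability endpoint described on the Wayback API help page.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query_url(image_url):
    """Build the availability query for one image URL (hypothetical helper)."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": image_url})


def extract_snapshot_url(response):
    """Return the cached URL from a decoded JSON reply, or None if absent."""
    # Assumed layout: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(image_url, timeout=30):
    """Ask archive.org whether it holds a cached copy of image_url."""
    with urllib.request.urlopen(build_query_url(image_url), timeout=timeout) as reply:
        return extract_snapshot_url(json.load(reply))
```

Calling find_archived_copy("http://imageshack.com/img001.jpg") would return the archive.org URL if a snapshot exists, or None if it does not. With tens of thousands of dead links, it would probably be wise to pause between queries so as not to hammer archive.org.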
Perhaps this could be done as a separate script or incorporated into the main one? For example, if the script did not find the image hosted on the Internet, the archive.org query could be run to see whether it could be downloaded from the archive.org URL instead.
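The fallback could look something like the Python sketch below. Again, this is only a guess at how it might fit together: is_reachable and choose_download_url are made-up names, and lookup_archive stands for whatever function performs the archive.org query. I don't know how the actual script checks for live images.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=30):
    """Return True if the URL answers an HTTP HEAD request without error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors (404 etc.), DNS failures and timeouts.
        return False


def choose_download_url(image_url, lookup_archive, check=is_reachable):
    """Prefer the live image; otherwise fall back to an archived copy.

    lookup_archive is any function that takes the image URL and returns
    an archive.org URL or None (hypothetical, not part of the script).
    """
    if check(image_url):
        return image_url
    return lookup_archive(image_url)
```

One caveat: a host that answers with a placeholder image and a successful status would pass this check, so a real script would likely need a smarter test than a plain HEAD request.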
I would love to do this, but I am a humble forum administrator with very little coding experience. If someone were to do this, though, I think it would help resurrect important information on countless boards around the world.