Recovering a phpbb install using WayBack Machine

Theonardo
Registered User
Posts: 128
Joined: Sat May 23, 2015 5:37 am

Post by Theonardo »

A very helpful phpBB forum that I was an active member of tragically expired several years ago after the founder's death. The domain has since been snapped up by a reseller, but the core content of the site (the phpBB install) has completely disappeared.

I recently discovered that I can still get bits of info from the old site using the Wayback Machine, and that got me thinking about recreating the site as a sort of tribute to the founder and the legacy of the community, many of whom I am sure would find their way back to a relaunch. Aside from copying each and every post manually, is there a tool to scrape the remnants of the site and replicate the original phpBB build?
EA117
Registered User
Posts: 1765
Joined: Wed Aug 15, 2018 3:23 am

Re: Recovering a phpbb install using WayBack Machine

Post by EA117 »

It's not a great option, but it might at least let you grab some kind of snapshot of the information before you lose access even to the archive.org copy.

One tool which does what you're asking about is WGET, which allows you to iterate over an entire site using HTTP and download all the HTML documents and other linked resources.

What that gives you is something essentially like a Google bot crawling every link on your site: It will download the HTML page that is generated for each thread and page, as WGET recursively walks every link within your site.

"As HTML" isn't going to be the best or most fun to have to re-parse through later (manually, or with some scripting or macro prowess) to try and make new similar phpBB posts on a new site, if that's your intention. But it's at least "something", and would be "complete" in terms of knowledge on the site, and doesn't require any more access to create beyond "an ability to view the web site." You would have a complete offline HTML copy of whatever you were able to view through archive.org.

There is a GUI front-end for WGET available from https://sourceforge.net/projects/winwget/ . That's probably better for people like me who are unfamiliar with WGET, rather than trying to download and use the Win32 port of the GNU original. But even with WINWGET, I would expect to spend a lot of time testing and getting download settings correct before you're finally able to "unleash" WGET to go ahead and walk the entire site.

Don't forget archive.org may throttle you or even cut you off after a certain amount of bandwidth, so perhaps include a crawling delay so that you're not being too greedy about the archive.org site processing and bandwidth.
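To illustrate the crawl-delay idea, here is a minimal Python sketch rather than a WGET configuration. The names `fetch_url` and `polite_fetch` and the five-second default are assumptions made for this example, not settings that archive.org documents; pick a delay you are comfortable with.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def polite_fetch(
    urls: Iterable[str],
    delay_seconds: float = 5.0,  # assumed value, not an archive.org rule
    fetch: Callable[[str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Fetch URLs one at a time, pausing between consecutive requests."""
    pages: Dict[str, bytes] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

The `fetch` and `sleep` parameters are only there so the loop can be tried out without touching the network; in real use you would call `polite_fetch(list_of_snapshot_urls)` and leave them at their defaults.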
AmigoJack
Registered User
Posts: 5757
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Recovering a phpbb install using WayBack Machine

Post by AmigoJack »

No, this won't work well. If you keep an eye on the actual URIs in your address bar when using Wayback you'll see they can differ drastically from page to page, because each page might have been scraped on a different date. Recursively downloading parts of https://archive.org may leave you with several copies of the same page, and the crawl may even deadlock itself. And: you also only have pages with e.g. multiple posts or a topic/forum list in them, where you still need to cut out the relevant parts yourself in case you want to put them into a running installation again.
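To make the duplicate problem concrete: snapshot URLs have the shape `https://web.archive.org/web/<timestamp>/<original URL>`, so the same forum page can show up under many timestamps. A rough Python sketch of collapsing such a list to one snapshot per original page could look like this. The function names are made up for this example, and it treats `http://` and `https://` (or `www.` and non-`www.`) variants of a page as different pages, which a real tool would probably want to normalise.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# <14-digit timestamp>, an optional two-letter modifier such as "id_",
# then the original URL.
SNAPSHOT_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$"
)


def split_snapshot_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original_url), or None if url is not a snapshot."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(3)


def newest_snapshots(urls: Iterable[str]) -> Dict[str, str]:
    """Map each original URL to its most recent snapshot URL."""
    best: Dict[str, Tuple[str, str]] = {}
    for url in urls:
        parts = split_snapshot_url(url)
        if parts is None:
            continue  # not a Wayback snapshot URL, ignore it
        timestamp, original = parts
        # Timestamps are YYYYMMDDhhmmss strings of equal length,
        # so plain string comparison orders them by date.
        if original not in best or timestamp > best[original][0]:
            best[original] = (timestamp, url)
    return {original: url for original, (_, url) in best.items()}
```

Keeping the newest copy is just one possible policy; for a forum that was spammed before it died, the oldest or a mid-life snapshot might be the better pick.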

Of course, the last part can be automated - you "only" need to find a programmer who wants to do that job. And you need to accept that he can't perform magic tricks: what isn't printed on those pages can't be found, like the profile details of each member. But going through each saved HTML page and collecting all the relevant data is possible. I've done and still do similar things, and depending on how far you want to go into detail it can take a lot of time and testing before you're happy with the results.
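As a rough idea of what "going through each saved HTML page" means, here is a small Python sketch using only the standard library. It assumes that each post body sits in a `<div class="content">` element; that is an assumption about the style the old board used, so check the saved pages and pass a different `post_class` if needed. The names `PostExtractor` and `extract_posts` are invented for this example.

```python
from html.parser import HTMLParser
from typing import List


class PostExtractor(HTMLParser):
    """Collect the text of every <div> carrying the given class."""

    def __init__(self, post_class: str = "content") -> None:
        super().__init__(convert_charrefs=True)
        self.post_class = post_class
        self.posts: List[str] = []
        self._depth = 0  # <div> nesting inside the current post, 0 = outside
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._depth > 0:
                self._depth += 1  # nested div inside a post (quote, code box)
            else:
                classes = (dict(attrs).get("class") or "").split()
                if self.post_class in classes:
                    self._depth = 1
                    self._chunks = []
        elif tag == "br" and self._depth > 0:
            self._chunks.append("\n")  # keep line breaks inside a post

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.posts.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_posts(html: str, post_class: str = "content") -> List[str]:
    """Return the plain text of each post found in one saved page."""
    parser = PostExtractor(post_class)
    parser.feed(html)
    parser.close()
    return parser.posts
```

This only recovers plain text. Author names, post dates, BBCode formatting and attachments would each need their own extraction rules, which is where most of the time and testing goes.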

WGET itself is a great free tool, but even mirroring a live board already comes with its caveats - whenever I want to mirror a website I use a very old version of https://metaproducts.com/products/offline-explorer because I can easily define exceptions (e.g. not wanting to download every "quote this post" version of a page/post or every "send per email" version of a page/topic...) and it performs with multiple threads (which means e.g. 10 downloads in parallel).
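The "exceptions" idea does not depend on any particular product. A hedged Python sketch of such a filter follows; the rule list is an assumption based on typical phpBB URLs (`posting.php?mode=quote`, `memberlist.php?mode=email`, `viewtopic.php?view=print`) and should be adjusted to whatever links the archived board actually contains. `should_download` is a name invented for this example.

```python
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed exclusion rules: (script name, query parameter, value).
EXCLUDED: Tuple[Tuple[str, str, str], ...] = (
    ("posting.php", "mode", "quote"),    # "quote this post" pages
    ("posting.php", "mode", "reply"),    # reply forms
    ("memberlist.php", "mode", "email"), # "send per email" pages
    ("viewtopic.php", "view", "print"),  # print view of a topic
)


def should_download(
    url: str, rules: Iterable[Tuple[str, str, str]] = EXCLUDED
) -> bool:
    """Return False for URLs that match one of the exclusion rules."""
    parts = urlsplit(url)
    script = parts.path.rsplit("/", 1)[-1]  # last path segment, e.g. posting.php
    query = parse_qs(parts.query)
    for rule_script, key, value in rules:
        if script == rule_script and value in query.get(key, []):
            return False
    return True
```

Because only the last path segment is compared, the same check works on a plain forum URL and on a Wayback snapshot URL that wraps it.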
Lumpy Burgertushie
Registered User
Posts: 67986
Joined: Mon May 02, 2005 3:11 am

Re: Recovering a phpbb install using WayBack Machine

Post by Lumpy Burgertushie »

Is the site on the Wayback Machine stored as the actual HTML pages, or as image snapshots of each page?

If it is not the actual HTML/CSS for each page, then you are not really going to be able to do this easily at all.
Even if it is HTML, not all of the pages are usually there to start with.

luck,
robert
Toxyy
Registered User
Posts: 773
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek

Re: Recovering a phpbb install using WayBack Machine

Post by Toxyy »

It's HTML, but just because you see the topic list doesn't mean the topics themselves are archived, and the pages were all captured on different dates. At best, if a tool were written, you'd only recover a fraction of the content that was there, depending on the size of the site and how often it was crawled.
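One way to get a feel for how big that fraction is: compare the topic links found in the archived topic lists against the topic pages you actually managed to collect. A small Python sketch, assuming topics are linked as `viewtopic.php?...t=<id>` (true for typical phpBB installs, but an assumption about this particular board); `topic_ids` and `missing_topics` are names invented for this example.

```python
from html import unescape
from typing import Iterable, List, Set
from urllib.parse import parse_qs, urlsplit


def topic_ids(urls: Iterable[str]) -> Set[int]:
    """Collect the numeric t= parameter from every viewtopic.php link."""
    ids: Set[int] = set()
    for url in urls:
        # Links copied out of HTML often still contain &amp; instead of &.
        parts = urlsplit(unescape(url))
        if not parts.path.endswith("viewtopic.php"):
            continue
        for value in parse_qs(parts.query).get("t", []):
            if value.isdigit():
                ids.add(int(value))
    return ids


def missing_topics(
    listed_urls: Iterable[str], archived_urls: Iterable[str]
) -> List[int]:
    """Topic ids that appear in a topic list but have no archived page."""
    return sorted(topic_ids(listed_urls) - topic_ids(archived_urls))
```

Even a topic that counts as "archived" here may only have its first page saved, so this gives an upper bound on what can be recovered rather than a guarantee.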