[solved] grabbing links from a phpBB3 page

Discussion forum for MOD Writers regarding MOD Development.
KFCSpike
Registered User
Posts: 26
Joined: Wed May 18, 2005 11:27 am

Post by KFCSpike » Tue Sep 25, 2007 6:57 pm

The phpBB3 project I am working on needs me to load a page (HTML generated by phpBB3), then get the links from that page.

I use this

// find all the URLs ($matches is filled by reference, so no & is needed)
preg_match_all("|href\=\"?'?`?([[:alnum:]:?=&@/;._-]+)\"?'?`?|i", $html, $matches);
And it works a treat, except for the post links phpBB3 generates that contain a '#'.

I have tried putting the '#' in many places in that line of code, but none of them produced proper URLs.
I'm no expert with preg_match, so I would be grateful for any help.
Last edited by KFCSpike on Fri Sep 28, 2007 6:06 pm, edited 1 time in total.
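A minimal PHP sketch of the problem, assuming a hypothetical phpBB3-style post link as input: because '#' is not in the character class, each captured URL stops just before the #p123 fragment.

```php
<?php
// Pattern from the first post (without the & before $matches).
// '#' is not in the character class, so the capture stops at the fragment.
$pattern = "|href\=\"?'?`?([[:alnum:]:?=&@/;._-]+)\"?'?`?|i";

// Hypothetical sample markup; real phpBB3 output may differ.
$html = '<a href="./viewtopic.php?f=2&amp;t=5&amp;p=123#p123">Re: test</a>';

preg_match_all($pattern, $html, $matches);

echo $matches[1][0], "\n"; // ./viewtopic.php?f=2&amp;t=5&amp;p=123
```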

A_Jelly_Doughnut
Former Team Member
Posts: 34457
Joined: Sat Jan 18, 2003 1:26 am
Location: Where the Rivers Run

Re: grabbing links from a phpBB3 page

Post by A_Jelly_Doughnut » Tue Sep 25, 2007 9:26 pm

Yay for page scraping :)

I didn't test this, but the regex section of my brain suggests

preg_match_all("|href\=\"?'?`?([[:alnum:]:?=&@/;._-#]+)\"?'?`?|i", $html, $matches);
would do it.

KFCSpike
Registered User
Posts: 26
Joined: Wed May 18, 2005 11:27 am

Re: grabbing links from a phpBB3 page

Post by KFCSpike » Wed Sep 26, 2007 4:43 pm

Thanks AJD, you nearly had it spot on and your reply helped.

BUT it looks like the dash needs to be at the end of the character class, or preg_match_all thinks it's looking for a range between the underscore and the hash ( _-# ) and reports an invalid range. The underscore (0x5F) comes after the hash (0x23), so that range is out of order.
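A quick way to see this in PHP (a sketch; the subject string is made up): the class with _-# in the middle fails to compile, while moving the hyphen to the end, or escaping it as \-, makes it a literal character.

```php
<?php
// Inside a character class, '-' between two characters denotes a range.
// '_' is 0x5F and '#' is 0x23, so "_-#" is out of order and the pattern
// fails to compile: preg_match() returns false and emits a warning
// (suppressed with @ here just for the demonstration).
$broken = @preg_match('/[._-#]/', 'p123#p123');

// A hyphen placed last (or escaped as \-) is a literal character.
$dashLast    = preg_match('/[._#-]/', 'p123#p123');
$dashEscaped = preg_match('/[._\-#]/', 'p123#p123');

var_dump($broken, $dashLast, $dashEscaped); // false, 1, 1
```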

Changing to

// '#' added; the hyphen goes last so it is a literal, not a range
preg_match_all("|href\=\"?'?`?([[:alnum:]:?=&@/;._#-]+)\"?'?`?|i", $html, $matches);
did the trick so thanks for pointing me in the right direction. :D

I actually had the preg_match_all working correctly last night (luck, not skill!), but I had misunderstood what the link should look like, and code later in my script was wrong :oops:
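For reference, a minimal sketch of the corrected pattern run over hypothetical phpBB3-style markup. The html_entity_decode() step is an addition, assuming the page writes & as &amp; inside href attributes.

```php
<?php
// Corrected pattern from the thread: '#' added to the class and the
// hyphen moved to the end, where it is a literal character.
$pattern = "|href\=\"?'?`?([[:alnum:]:?=&@/;._#-]+)\"?'?`?|i";

// Hypothetical sample markup; real phpBB3 output may differ.
$html = '<a href="./viewforum.php?f=2">Forum</a> '
      . '<a href="./viewtopic.php?f=2&amp;t=5&amp;p=123#p123">Re: test</a>';

// preg_match_all() fills $matches by reference itself; no & needed.
preg_match_all($pattern, $html, $matches);

// Assumption: hrefs use &amp; as the query separator, so decode entities
// to turn the captured text into a usable URL.
$urls = array_map('html_entity_decode', $matches[1]);

foreach ($urls as $url) {
    echo $url, "\n";
}
// ./viewforum.php?f=2
// ./viewtopic.php?f=2&t=5&p=123#p123
```

The character class still leaves out characters such as %, + and ~, so links containing them would be cut short. If the regex keeps hitting edge cases, PHP's DOMDocument (loadHTML() plus getElementsByTagName('a')) is a sturdier way to read href attributes.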


Return to “[3.0.x] MOD Writers Discussion”