PHP Regex quick question

User avatar
axe70
Registered User
Posts: 752
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy
Name: Alessio
Contact:

Re: PHP Regex quick question

Post by axe70 »

WoW! ;)

Code: Select all

$string = ' blah blah blah ANYTHINGCANBEHEREhttps://www.google.com/test.jpgANYTHINGCANBEHERE https://www.google.com/test.php http://www.google.com/test.JPG ANYTHING<https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=jpg&name=900x900>http://www.google.com/test.gifhttp://www.google.com/dodacom?pic=gif&stuff=lalala  httpS://www.google.com/whatever.png';

Code: Select all

$s = preg_match_all('/(https?)(:\/\/)(.[^< ]*?)(\.?|\?){1}(jpg|jpeg|gif|png|[a-z]=jpg|jpeg|gif|png){1}/ui', $string, $matches, PREG_SET_ORDER);
// Then, instead of using foreach (though I guess foreach is faster):
echo '<pre>';
$res = array_column($matches, 0);
$res = array_map('strtolower',$res);
print_r($res);
exit;
result:

Code: Select all

Array
(
    [0] => https://www.google.com/test.jpg
    [1] => http://www.google.com/test.jpg
    [2] => https://pbs.twimg.com/media/ejgpyarwoaaugpo?format=jpg
    [3] => http://www.google.com/test.gif
    [4] => http://www.google.com/dodacom?pic=gif
    [5] => https://www.google.com/whatever.png
)
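The comment in the code above mentions foreach as an alternative to array_column plus array_map. A minimal sketch of both forms, assuming a made-up $matches array in the PREG_SET_ORDER shape (the two sample entries are hypothetical, and nothing here measures which form is faster):

```php
<?php
// Hypothetical $matches array in PREG_SET_ORDER shape: one entry per match,
// with the full match at index 0 and the capture groups after it.
$matches = [
    ['http://www.google.com/test.JPG', 'http'],
    ['httpS://www.google.com/whatever.png', 'httpS'],
];

// Functional version, as in the post: take column 0, then lowercase each URL.
$viaArrayFunctions = array_map('strtolower', array_column($matches, 0));

// Equivalent foreach loop.
$viaForeach = [];
foreach ($matches as $match) {
    $viaForeach[] = strtolower($match[0]);
}

print_r($viaForeach);
var_dump($viaArrayFunctions === $viaForeach);
```

Note that strtolower lowercases the whole URL, including the path, which may matter for hosts that treat paths as case-sensitive.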
[EDITED]
Last edited by axe70 on Tue Oct 06, 2020 4:47 pm, edited 1 time in total.
Do not take me too serious
Anyway i do not like Discourse
User avatar
JLA
Registered User
Posts: 606
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS
Contact:

Re: PHP Regex quick question

Post by JLA »

axe70 wrote: Tue Oct 06, 2020 4:34 pm WoW! ;)

OK, we tried using your code and got a blank result, so we tried our other method with your new regex. How does this look?

Our string

Code: Select all

$string = ' blah blah blah ANYTHINGCANBEHEREhttps://www.google.com/test.jpgANYTHINGCANBEHERE https://www.google.com/test.php http://www.google.com/test.JPG ANYTHING<https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=jpg&name=900x900>http://www.google.com/test.gifhttp://www.google.com/dodacom?pic=gif&stuff=lalala  httpS://www.google.com/whatever.png';
The magic

Code: Select all

$pattern = '/(https?)(:\/\/)(.[^<][^ ]*?)(\.?|\?){1}(jpg|jpeg|gif|png|[a-z]=jpg|jpeg|gif|png){1}/ui';
preg_match_all($pattern, $string, $images);

print_r($images);
The result (looks good)

Code: Select all

Array
(
    [0] => Array
        (
            [0] => https://www.google.com/test.jpg
            [1] => http://www.google.com/test.JPG
            [2] => https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=jpg
            [3] => http://www.google.com/test.gif
            [4] => http://www.google.com/dodacom?pic=gif
            [5] => httpS://www.google.com/whatever.png
        )

    [1] => Array
        (
            [0] => https
            [1] => http
            [2] => https
            [3] => http
            [4] => http
            [5] => httpS
        )

    [2] => Array
        (
            [0] => ://
            [1] => ://
            [2] => ://
            [3] => ://
            [4] => ://
            [5] => ://
        )

    [3] => Array
        (
            [0] => www.google.com/test
            [1] => www.google.com/test
            [2] => pbs.twimg.com/media/EjgPYArWoAAUgPO?forma
            [3] => www.google.com/test
            [4] => www.google.com/dodacom?pic=
            [5] => www.google.com/whatever
        )

    [4] => Array
        (
            [0] => .
            [1] => .
            [2] => 
            [3] => .
            [4] => 
            [5] => .
        )

    [5] => Array
        (
            [0] => jpg
            [1] => JPG
            [2] => t=jpg
            [3] => gif
            [4] => gif
            [5] => png
        )

)
We see all of our image URLs in $images[0][0-5] :D

Can you see any way this could fail if something in the string were different? Is there anything that hasn't been accounted for here?
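The two posts above read the matches in different layouts: one passes PREG_SET_ORDER and uses array_column, the other uses the default order and reads $images[0]. A minimal sketch of how the two layouts relate, assuming a short made-up input and a simplified pattern (neither is the thread's own):

```php
<?php
// Hypothetical sample input, much shorter than the thread's test string.
$string = 'a http://example.com/a.jpg b https://example.com/b.png';
// Simplified pattern for illustration only (not the thread's regex).
$pattern = '/(https?):\/\/([^< ]+?)\.(jpg|png)/i';

// Default flag (PREG_PATTERN_ORDER): $byGroup[0] holds every full match,
// $byGroup[1] every first capture group, and so on.
preg_match_all($pattern, $string, $byGroup);

// PREG_SET_ORDER: $byMatch[0] holds the first match together with its groups.
preg_match_all($pattern, $string, $byMatch, PREG_SET_ORDER);

// Both of these give the same flat list of full matches.
$fromPatternOrder = $byGroup[0];
$fromSetOrder = array_column($byMatch, 0);

print_r($fromPatternOrder);
var_dump($fromPatternOrder === $fromSetOrder);
```

So reading $images[0] in the default order and applying array_column($matches, 0) to a PREG_SET_ORDER result should be interchangeable.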

Re: PHP Regex quick question

Post by axe70 »

Again, a little shorter. Please provide a test string where this fails:

Code: Select all

$s = preg_match_all('/(https?)(:\/\/)(.[^< ]*?)(\.?|\?){1}(jpg|jpeg|gif|png|[a-z]=jpg|jpeg|gif|png){1}/ui', $string, $matches, PREG_SET_ORDER);
echo'<pre>';
$res = array_column($matches, 0);
$res = array_map('strtolower',$res);
print_r($res);
exit;

Re: PHP Regex quick question

Post by JLA »

axe70 wrote: Tue Oct 06, 2020 4:51 pm Again little shorter, so provide please a case string where it can fail with this:

Can you please explain the change and the reason for it?
Thank You!

Re: PHP Regex quick question

Post by axe70 »

Since these are on-the-fly tests I'm doing while working on something else (yeah, bad practice!), maybe you can test it and report any cases where it fails, which is entirely possible!

P.S. If you meant the reason for the change: [^<][^ ] has been replaced by [^< ], which gives the same result here. With this, the matched part of the URL must not contain a space or a < character, or it will not match.
The complete example looks like this:

Code: Select all

$string = ' blah blah blah ANYTHINGCANBEHEREhttps://www.google.com/test.jpgANYTHINGCANBEHERE https://www.google.com/test.php http://www.google.com/test.JPG ANYTHING<https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=jpg&name=900x900>http://www.google.com/test.gifhttp://www.google.com/dodacom?pic=gif&stuff=lalala  httpS://www.google.com/whatever.png';

$s = preg_match_all('/(https?)(:\/\/)(.[^< ]*?)(\.?|\?){1}(jpg|jpeg|gif|png|[a-z]=jpg|jpeg|gif|png){1}/ui', $string, $matches, PREG_SET_ORDER);

$res = array_column($matches, 0);
$res = array_map('strtolower',$res);

echo'<pre>';
print_r($res);
exit;
Should (?) never fail
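A small sketch of how the two character-class forms compare. Strictly, [^<][^ ]*? is one character that is not <, followed by a lazy run of non-space characters, while [^< ]*? is a lazy run of characters that are neither < nor a space. They agree on the test string above but can differ when a < appears in the middle of a URL. The input below is hypothetical, and both patterns are cut down to a single .jpg suffix for brevity:

```php
<?php
// Hypothetical input: a "<" sits between the host and an image-like suffix.
$input = 'http://example.com/a<b.jpg';

// Old form: one non-"<" character, then a lazy run of non-space characters.
$old = '/https?:\/\/.[^<][^ ]*?\.jpg/i';
// New form: a lazy run of characters that are neither "<" nor a space.
$new = '/https?:\/\/.[^< ]*?\.jpg/i';

$oldResult = preg_match($old, $input); // 1: the "<" is allowed later in the run
$newResult = preg_match($new, $input); // 0: the "<" stops the run before ".jpg"

var_dump($oldResult, $newResult);
```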

Re: PHP Regex quick question

Post by JLA »

axe70 wrote: Tue Oct 06, 2020 5:09 pm Since these are fly tests i'm doing while doing something else (yeah bad practice!) may you can test and in case report if there are possibilities of fails, that's really possible!

Seems to be good here. Will let you know if I come across anything strange in the future. Thank You!!!! Credit given in source as before!!

Re: PHP Regex quick question

Post by axe70 »

Seems to be good here. Will let you know if I come across anything strange in the future.
Yes please, let me know! Since we are warmed up now, it should be easy to add more requirements to this regexp, which can be improved to match more kinds of strings altogether. Training ourselves with regexps is always a good idea.
It can improve not only the regexp, but also our skills.

KISS - keep it simple, stupid

Re: PHP Regex quick question

Post by JLA »

axe70 wrote: Wed Oct 07, 2020 8:27 am
So far in our testing we have not noticed any abnormalities.

Thanks again
User avatar
ViolaF
I've Been Banned!
Posts: 1609
Joined: Tue Aug 14, 2012 11:52 pm

Re: PHP Regex quick question

Post by ViolaF »

Thanks also to axe70,

this gave me a lot of input :ugeek: :geek: :D

Re: PHP Regex quick question

Post by JLA »

Found an example of a URL that is detected incorrectly:

Code: Select all

http://www.thexxx.xxx/xxxx/203012131/XXXX/444444444/-1/xxxxxxxxxxxx4444%3FTitle%3Dxxx-xxxxxxy-xxf-xxxxxl-xxxx-xxxxx&xx=xx&xx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxx-xxxxxxxxxxxxePng
It is detected because of ePng, which is incorrect.

Re: PHP Regex quick question

Post by JLA »

JLA wrote: Thu Oct 08, 2020 8:58 pm Found an example of a URL incorrectly

Looking harder, I can see that, as far as the regex is concerned, this is actually correct: it looks for one of the image qualifiers (png) after an = in the query string. We were expecting the png to immediately follow the = and not appear further out. I think we will leave the regex this way, because there could be image URLs in this format that we would want to find. We will just have to validate a different way further on.

Thanks
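A minimal sketch of why such a URL is picked up, using the regex from the thread on a shortened, hypothetical URL (example.com and the query string are made up). The separator group (\.?|\?) can match the empty string, so a trailing Png is accepted wherever it appears, with or without an = directly before it:

```php
<?php
// Regex copied from the thread.
$pattern = '/(https?)(:\/\/)(.[^< ]*?)(\.?|\?){1}(jpg|jpeg|gif|png|[a-z]=jpg|jpeg|gif|png){1}/ui';

// Hypothetical, shortened URL that ends in "Png" with no "." or "=" right before it.
$url = 'http://www.example.com/page?title=some-valuePng';

preg_match($pattern, $url, $m);

var_dump($m[0] === $url); // true: the whole URL is matched
var_dump($m[4]);          // separator group: empty string
var_dump($m[5]);          // extension group: "Png" (the i flag ignores case)
```

If only .png or =png should count, the separator would need to be mandatory rather than optional.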

Re: PHP Regex quick question

Post by axe70 »

:shock:

Code: Select all

http://www.thexxx.xxx/xxxx/203012131/XXXX/444444444/-1/xxxxxxxxxxxx4444%3FTitle%3Dxxx-xxxxxxy-xxf-xxxxxl-xxxx-xxxxx&xx=xx&xx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxx-xxxxxxxxxxxxePng
But there is no .png, nor any of the other notations we were searching for before.
What result should we return from this string?
Maybe I have not understood the point? (as ever)

Re: PHP Regex quick question

Post by JLA »

axe70 wrote: Thu Oct 08, 2020 10:10 pm :shock:

The regex matches this URL. I assume this is correct behavior for the regex, since there is a Png after the = sign in the query string of the URL.

Do you see the same result?

Re: PHP Regex quick question

Post by JLA »

OK, found another URL style that was detected and appears to be incorrect:

Code: Select all

https://www.xxxxxxxxxx.com/xxxxxx/gifx-xxxx/xxxxxx/x-xxxr-xxxx-gifx-xxx-xxxxxx-xxxxxx-xxxxx-x-xx-xxxx-xxx-xx
The regex finds:

Code: Select all

https://www.xxxxxxxxxx.com/xxxxxx/gif
This is incorrect because the URL contains no .gif, and there is no =gif in a query string (this URL style has no query string at all).
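A likely contributor to this match is alternation precedence: in [a-z]=jpg|jpeg|gif|png the | binds more loosely than everything else, so the [a-z]= prefix applies only to jpg, and a bare gif is accepted on its own. A minimal sketch with two cut-down patterns (both are illustrative assumptions, not the thread's full regex):

```php
<?php
// "|" has the lowest precedence, so "[a-z]=jpg|gif" means
// "[a-z]=jpg" OR "gif"; the "[a-z]=" prefix does not apply to "gif".
$loose = '/[a-z]=jpg|gif/i';
// Grouping the extensions applies the prefix to every alternative.
$grouped = '/[a-z]=(?:jpg|gif)/i';

// Path fragment modelled on the URL above.
$path = '/xxxxxx/gifx-xxxx';

$looseResult = preg_match($loose, $path);     // 1: the bare "gif" matches
$groupedResult = preg_match($grouped, $path); // 0: there is no "=" before "gif"

var_dump($looseResult, $groupedResult);
```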

Re: PHP Regex quick question

Post by axe70 »

I don't know if I have overcomplicated things. Anyway, it seems that for a string like this, which also contains your last two examples:

Code: Select all

$string = ' blah blah blah ANYTHINGCANBEHEREhttps://www.google.com/test.jpgANYTHINGCANBEHERE 
https://www.google.com/test.php http://www.google.com/test.JPG ANYTHING<https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=gif&name=900x900>http://www.google.com/test.gifhttp://www.google.com/dodacom?pic=gif&stuff=lalala  httpS://www.google.com/whatever.png
https://www.xxxxxxxxxx.com/xxxxxx/pngx-xxxx/xxxxxx/x-xxxr-xxxx-gifx-xxx-xxxxxx-xxxxxx-xxxxx-x-xx-xxxx-xxx-xxpng http://www.google.com/dodacom?pic=jpg&stuff=lalala
fehfiwhfeiw120270 blabla http://www.thexxx.xxx/xxxx/203012131/XXXX/444444444/-1/xxxxxxxxxxxx4444%3FTitle%3Dxxx-xxxxxxy-xxf-xxxxxl-xxxx-xxxxx&xx=xx&xx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxx-xxxxxxxxxxxxePng';
with this:

Code: Select all

$s = preg_match_all('/(https?)(:\/\/)(.[^< ]*?)(\.?|\?){1}(\.jpg|\.jpeg|\.gif|\.png|[a-z][\.|=]jpg|[\.|=]jpeg|[\.|=]gif|[\.|=]png){1}/ui', $string, $matches, PREG_SET_ORDER);
echo'<pre>';
$res = array_column($matches, 0);
$res = array_map('strtolower',$res);
print_r($res);
exit;
the result will be this:

Code: Select all

Array
(
    [0] => https://www.google.com/test.jpg
    [1] => http://www.google.com/test.jpg
    [2] => https://pbs.twimg.com/media/ejgpyarwoaaugpo?format=gif
    [3] => http://www.google.com/test.gif
    [4] => http://www.google.com/dodacom?pic=gif
    [5] => https://www.google.com/whatever.png
    [6] => http://www.google.com/dodacom?pic=jpg
)
Anyway, the regexp could be shorter, I'm sure. Let me know if it still fails for some other strange kind of string.
Does anybody know how the above could be reduced while keeping the same result?
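One possible reduction, offered as a sketch and not as a drop-in replacement. It assumes the intent is "an image extension directly after a . or an =", drops the capture groups, and uses [.=] because inside a character class | is a literal character and . needs no escape. The test string is a shortened, partly hypothetical version of the one above (the example.com URLs are made up), so it has not been checked against every input:

```php
<?php
// Shortened test string; the example.com URLs are hypothetical stand-ins
// for the two URLs that should NOT be detected.
$string = 'ANYhttps://www.google.com/test.jpgANY https://www.google.com/test.php '
    . 'http://www.google.com/test.JPG <https://pbs.twimg.com/media/EjgPYArWoAAUgPO?format=gif&name=900x900>'
    . 'http://www.google.com/dodacom?pic=gif&stuff=lalala httpS://www.google.com/whatever.png '
    . 'https://www.example.com/xxxxxx/pngx-xxxx/x-gifx-xxpng http://www.example.com/x?t=xxxxePng';

// Scheme, a lazy run without "<" or spaces, then "." or "=" followed
// directly by one of the image extensions.
$reduced = '/https?:\/\/[^< ]+?[.=](?:jpe?g|gif|png)/i';

// With no capture groups, $found[0] already holds the full matches.
preg_match_all($reduced, $string, $found);
$res = array_map('strtolower', $found[0]);

print_r($res);
```

The bare pngx, gifx, xxpng and ePng fragments are skipped because none of them follows a . or an = directly.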

[REGEXP EDITED]