PHP Regex quick question

Discussion forum for Extension Writers regarding Extension Development.
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

PHP Regex quick question

Post by JLA »

I'm looking for a simple PHP regex that does the following:

In a string, find every occurrence of <https://thiscanbeanyassortmentofanyrandomchars> and remove the < at the beginning and the > at the end of each match, without touching any other < or > that might appear in the string.

Inside the < >, the text must start with https://; after the https:// and before the >, it can be any assortment of characters except >.

Thanks in advance.
david63
Registered User
Posts: 18821
Joined: Thu Dec 19, 2002 8:08 am
Location: Lancashire, UK

Re: PHP Regex quick question

Post by david63 »

Have you looked at any of the regex generators? They normally work for me.
David
Remember: You only know what you know and - you don't know what you don't know!
My CDB Contributions | How to install an extension
I will not be accepting translations for any of my extensions on GitHub; please post any translations in the appropriate topic.
No support requests via PM or email as they will be ignored
axe70
Registered User
Posts: 279
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy

Re: PHP Regex quick question

Post by axe70 »

Code:

$string = 'Today at <-thiscanbeanyassortmentofanyrandomchars-> <https://thiscanbeanyassortmentofanyrandomchars> it\'s raining but it is ok <b style="color:red">for me because</b> i\'m a <a href="https://www.google.com/search?q=fish&oq=snail">snail</a>';
echo $string . '<br /><br />';
$s = preg_replace('/<(ftp|http|https?):\/\/(.*?)\> /', '', $string);
echo $s;
exit;
(check the result in the page source)

P.S. Well, maybe the regexp needs to be:
$s = preg_replace('/<(ftp|http|https?):\/\/(.*?)\> ?/', '', $string);
because if there is no space after <https://thiscanbeanyassortmentofanyrandomchars>, the first pattern will not match. I think this one is more correct.
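A quick way to see the difference, as a sketch (the two sample strings are made up, and `(ftp|https?)` is used here because `https?` already covers plain `http`): the first pattern requires a literal space right after `>`, while ` ?` makes that space optional.

```php
<?php
// Made-up samples: one URL is followed by a space, the other directly by text.
$withSpace = 'Today at <https://example.org> it rains';
$noSpace   = 'Today at <https://example.org>it rains';

$strict   = '/<(ftp|https?):\/\/(.*?)> /';  // needs a space right after ">"
$optional = '/<(ftp|https?):\/\/(.*?)> ?/'; // " ?" = zero or one space

// preg_match() returns 1 when the pattern matches and 0 when it does not.
echo preg_match($strict, $withSpace), preg_match($strict, $noSpace), "\n";     // 10
echo preg_match($optional, $withSpace), preg_match($optional, $noSpace), "\n"; // 11
```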

Re: PHP Regex quick question

Post by axe70 »

Well, one more, before I close my test in the text editor:

Code:

$s = preg_replace('/ ?<(ftp|http|https?):\/\/(.*?)\> ?/', ' ', $string);
This replaces the entire match, together with the whitespace just before and after it (if any), with a single space, which should be what you want. The regexp could also be made more complex and precise.
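Worth noting, as a small sketch on a made-up string: with `' '` as the replacement, the URL itself is removed along with its brackets, so only a single space is left where the whole instance was.

```php
<?php
$string = 'Blah <https://example.org> blah';

// The leading " ?", the whole <...> instance and the trailing " ?" are all
// part of the match, so all of it is replaced by one space.
$s = preg_replace('/ ?<(ftp|https?):\/\/(.*?)> ?/', ' ', $string);
echo $s; // Blah blah
```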

Re: PHP Regex quick question

Post by JLA »

Thank you so much, I will give this a test today. To confirm: do you think this should result in the following?

Original $s

Code:

$s = 'blah blah.     Blahblah<blahblah><blahblah>. blah blah

Blah<https://blahblah><https://blahblah> <https://blah><blah>';
Then apply your preg_replace.

Will this result in the following changed $s?

Code:


$s = 'blah blah.     Blahblah<blahblah><blahblah>. blah blah

Blah https://blahblah  https://blahblah  https://blah <blah>';

Notice in this example that each correctly matched < or > was replaced with a space, preserving the exact position in the string. I'm hoping to get the same result when I do the test later today.

Thanks so much again for the help.

Re: PHP Regex quick question

Post by axe70 »

Well, no. I misread your request (sorry, my English is bad, as someone else on this forum says): the code above replaces the found string completely. What you ask is a little more complicated, since each found instance has to be replaced with something built from its own content.
So, to do what you are really asking for (I hope I understood this time), I would use something like this:

Code:

$string = 'blah blah.     Blahblah<blahblah><blahblah>. blah blah

Blah<https://blahblah><https://blahblah> <https://blah><blah>';

preg_match_all('/ ?<(ftp|http|https?):\/\/(.*?)\> ?/', $string, $matches, PREG_SET_ORDER);
$s = preg_replace('/ ?<(ftp|http|https?):\/\/(.*?)\> ?/', '##my007placeolder##', $string, -1 ,$cr);
$pn = 0;
if($cr > 0){
	foreach($matches as $m){
	 $s = preg_replace('/\#\#my007placeolder\#\#/', ' ' . $matches[$pn][1] . '://' . $matches[$pn][2] . ' ', $s, 1); // one placeholder at a time, in order
	 //print_r($s);echo'<br />';
	 $pn++;
  }
}

echo $s;

Then the result will be:

Code:

blah blah.     Blahblah<blahblah><blahblah>. blah blah

Blah https://blahblah  https://blahblah  https://blah <blah>
Does anybody know how to improve this? I'm wondering whether there is some other, better way...
Especially for the regexp: there should be a simpler one somewhere.
Last edited by axe70 on Sat Oct 03, 2020 4:48 pm, edited 1 time in total.
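One possibly simpler route, offered as an alternative sketch rather than a fix to the code above: `preg_replace()` accepts `$1`, `$2` in the replacement string to reinsert captured groups, so the placeholder pass is not needed. It keeps the same pattern (written with `ftp|https?`, which already covers `http`) and should print the same result for JLA's sample string.

```php
<?php
$string = 'blah blah.     Blahblah<blahblah><blahblah>. blah blah

Blah<https://blahblah><https://blahblah> <https://blah><blah>';

// $1 = scheme, $2 = rest of the URL. The optional spaces around the
// brackets are consumed, and exactly one space is written on each side.
$s = preg_replace('/ ?<(ftp|https?):\/\/(.*?)> ?/', ' $1://$2 ', $string);
echo $s;
```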

Re: PHP Regex quick question

Post by JLA »

Ah yes, I probably should have explained it better the first time around. I think this change accomplishes what I'm looking for. I'll test and report back in a minute.

Re: PHP Regex quick question

Post by JLA »

Tested it in PHP on my side and it seems to work perfectly. Thank You!!! FYI, I sent you a PM. Thanks again!

Re: PHP Regex quick question

Post by axe70 »

Glad it is useful!

An improved version could take into account that there may or may not be spaces between the captured instance and the text before or after it, and put those spaces back only where they were found (this may not matter in this case, but it surely does in some other scenarios). It also uses foreach this way to make the code shorter:

Code:

preg_match_all('/( ?)<(ftp|http|https?):\/\/(.*?)>( ?)/', $string, $matches, PREG_SET_ORDER);
$s = preg_replace('/ ?<(ftp|http|https?):\/\/(.*?)\> ?/', '##my007placeolder##', $string, -1 ,$cr);

if($cr > 0){
 foreach($matches as $m => $mv){
	 $s = preg_replace('/\#\#my007placeolder\#\#/', $matches[$m][1] . $matches[$m][2] . '://' . $matches[$m][3] . $matches[$m][4], $s, 1); // one at a time, in order, re-adding the captured spaces
	 //print_r($s);echo'<br />'; // demo of the "why"
  }
}

echo $s;
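Since this version writes back exactly the spaces it captured, its net effect is just to drop the `<` and `>`. A sketch on a made-up sample that checks this against a single call keeping only the bracketed text (the second pattern is an assumption, not something posted in the thread):

```php
<?php
// Made-up sample string (not from the thread).
$string = 'Blah<https://blahblah> 5555 <ftp://example.org/file>end';

// axe70's pattern: ( ?) captures an optional space on each side,
// and $1 / $4 put it back exactly as found.
$kept = preg_replace('/( ?)<(ftp|https?):\/\/(.*?)>( ?)/', '$1$2://$3$4', $string);

// The same result, written as "drop the brackets, keep what is inside".
$dropped = preg_replace('/<((?:ftp|https?):\/\/.*?)>/', '$1', $string);

echo $kept, "\n";             // Blahhttps://blahblah 5555 ftp://example.org/fileend
var_dump($kept === $dropped); // bool(true)
```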
Also, the regexp in the previous example should probably be this:
'/ ?<(ftp|http|https?):\/\/(.*?)> ?/'
and not
'/ ?<(ftp|http|https?):\/\/(.*?)\> ?/'
There is no need for the \ before > (but it works both ways).
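A small check of that point, as a sketch: in PCRE (the engine behind PHP's `preg_*` functions) `>` is not a metacharacter, so `\>` is just a literal `>` (unlike GNU grep, where `\<` and `\>` mean word boundaries). What does need escaping is `/`, but only because it is the pattern delimiter; choosing another delimiter such as `#` avoids the `\/\/`. The sample text is made up.

```php
<?php
$text = 'x <https://example.org> y';

$plain   = '/<(ftp|https?):\/\/(.*?)>/';  // ">" unescaped
$escaped = '/<(ftp|https?):\/\/(.*?)\>/'; // "\>" is still a literal ">"
$hash    = '#<(ftp|https?)://(.*?)>#';    // "#" delimiter: no need to escape "/"

foreach ([$plain, $escaped, $hash] as $pattern) {
    preg_match($pattern, $text, $m);
    echo $m[0], "\n"; // <https://example.org> each time
}
```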

I can't test more right now, so I'm not sure it is perfect, but it is surely very close.

Re: PHP Regex quick question

Post by JLA »

I sent you another PM, but I will also explain the exact requirements for all scenarios in more detail here. Also, as you already handled in your code, the https can also be http or ftp.

1. In the string we must find all instances of

Code:

 <https://blahblahblah>
and replace the < at the start of each instance and the > at its very end with a space.

2. Examples of how instances can look:

Code:

blah blah blah<https://blahblahblah>blahblahblah> ( only instance here is <https://blahblahblah> )

Code:

blah blah blah<blah><https://blahblahblah><blahblahblah<https://blahblahmissing<https://anotherblahblah> ( only instances here are <https://blahblahblah> and <https://anotherblahblah> )
3. An instance such as

Code:

<https://blahblahblah>
will never contain a character that is not valid in a URL. These instances always start with < and end with >. If a non-URL character appears inside, such as

Code:

 <https://blahblah<>blahblah>blahblahblah 
then the instance is ignored. In this example, a < appears in the URL before the first >, so the URL is invalid, and even though more URL text follows up to the next >, the instance is ignored.

Hope this helps.
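The three rules above can be sketched as a single `preg_replace()` call. This is one possible reading, not a tested drop-in: it assumes an instance is `<`, then `ftp`, `http` or `https`, then `://`, then one or more characters that are neither `<` nor `>`, then `>`. `unwrapUrls` is a made-up name.

```php
<?php
// Replaces the "<" and ">" of every valid instance with a space (rule 1).
// [^<>]+ stops at the first "<" or ">", so "<https://blahblah<>..." never
// matches (rule 3), and "<...>" text without a scheme is left alone (rule 2).
function unwrapUrls(string $text): string
{
    return preg_replace('#<((?:ftp|https?)://[^<>]+)>#', ' $1 ', $text);
}

// The examples from rules 2 and 3:
echo unwrapUrls('blah blah blah<https://blahblahblah>blahblahblah>'), "\n";
echo unwrapUrls('blah blah blah<blah><https://blahblahblah><blahblahblah<https://blahblahmissing<https://anotherblahblah>'), "\n";
echo unwrapUrls('<https://blahblah<>blahblah>blahblahblah'), "\n"; // unchanged
```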

Re: PHP Regex quick question

Post by axe70 »

With all our blah blah blah we started from this:
Looking for a simple php regex that will do the following
In a string - will find all occurrences of <https://thiscanbeanyassortmentofanyrandomchars> and remove the < and > at the beginning of each match but not touch any other < or > that might appear in the string
and now we have ended up with something like this:

Code:

$string='Today at <-thiscanbeanyassortmentofanyrandomchars-><http://thiscanbeanyassortmentofanyrandomchars>it\'s raining but it is ok <b style="color:red">for me because</b> i\'m a <a href="https://www.google.com/search?q=fish&oq=snail">snail</a>
blah blah.     Blahblah<blahblah><blahblah>. blah blah
blah blah blah<https://blahblahblah>blahblahblah>
Blah<https://blahblah> 5555 <https://blahblah<>blahblah>blahblahblah  <https://blahblah>blublublu> <https://blah> <blah>
blah blah blah<blah><https://blahblahblah><blahblahblah <https://blahblahmissing <https://anotherblahblah>';
which, if I'm not wrong, should be handled by something like this:

Code:

preg_match_all('/( ?)<(ftp|http|https?):\/\/([^<>]+)>( ?)/', $string, $matches, PREG_SET_ORDER);
$s = preg_replace('/<(ftp|http|https?):\/\/([^<>]+)>/', '##my007placeolder##', $string, -1 ,$cr);

if($cr > 0){
 foreach($matches as $m => $mv){
      $s = preg_replace('/\#\#my007placeolder\#\#/', ' ' . $matches[$m][2] . '://' . $matches[$m][3] . ' ', $s, 1); // one at a time, in order, adding a space on the left and right
   // $s = preg_replace('/\#\#my007placeolder\#\#/', $matches[$m][1] . $matches[$m][2] . '://' . $matches[$m][3] . $matches[$m][4], $s, 1); // re-add the spaces only where they were found in the string

	 //print_r($s);echo'<br />'; // demo of the "why"
  }
}

echo $s;
Not sure I have understood all the needs!

Re: PHP Regex quick question

Post by axe70 »

I just used this approach (the concept, not the regexp) to improve a function of mine, so I'd like to share a slightly modified version that you may want to use instead (this one is for your needs):

Code:

preg_match_all('/( ?)<(ftp|https?):\/\/([^<>]+)>( ?)/ui', $string, $matches, PREG_SET_ORDER);
$s = preg_replace('/<(ftp|https?):\/\/([^<>]+)>/ui', '#W3JB007PH#', $string, -1 ,$cr);
if($cr > 0){
  foreach($matches as $m => $mv){
   // $mv holds the groups of the current match and could be used to transform it before it is reinserted
      $s = preg_replace('/\#W3JB007PH\#/u', ' ' . $matches[$m][2] . '://' . $matches[$m][3] . ' ', $s, 1); // one at a time, in order, adding a space before and after
   // $s = preg_replace('/\#W3JB007PH\#/u', $matches[$m][1] . $matches[$m][2] . '://' . $matches[$m][3] . $matches[$m][4], $s, 1); // one at a time, re-adding spaces only where they were found in the string
	 //print_r($s);echo'<br />';
  }
}

 echo $s;
The nice part of the whole thing, to me, is not the regexp but the way preg_match_all and preg_replace are combined with foreach.

Does anybody know a shorter, faster and cleaner way to do the same?
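One possible answer, sketched with `preg_replace_callback()` (a different technique from the placeholder loop, and not checked against every case in this thread): the callback receives each match's groups in order, so there is no placeholder and no second pass, and any per-match change goes inside the callback. It also cannot be confused by input that already contains the placeholder text. `unwrapUrlsCallback` is a made-up name; the pattern uses `[^<>]+` for the URL body and the `i` flag for case-insensitive schemes.

```php
<?php
function unwrapUrlsCallback(string $text): string
{
    return preg_replace_callback(
        '#<(ftp|https?)://([^<>]+)>#i',
        // $m[0] is the whole "<...>" instance, $m[1] the scheme, $m[2] the rest.
        function (array $m): string {
            // Any per-match manipulation would go here; this just rebuilds
            // the URL with one space before and after it.
            return ' ' . $m[1] . '://' . $m[2] . ' ';
        },
        $text
    );
}

echo unwrapUrlsCallback('Blah<https://blahblah> 5555 <https://blahblah<>blahblah>x'), "\n";
```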

[EDITED]
Last edited by axe70 on Mon Oct 05, 2020 1:36 pm, edited 1 time in total.

Re: PHP Regex quick question

Post by JLA »

Thanks for this update. I will do some testing with it later today. I had some time off yesterday, so I didn't have a chance to look at anything.
AbaddonOrmuz
Recognised Extension Developer
Posts: 1046
Joined: Wed Dec 25, 2013 9:06 pm
Location: /dev/null
Name: Alfredo

Re: PHP Regex quick question

Post by AbaddonOrmuz »

I might be missing something, because I don't understand why you are overcomplicating things.

Code:

// Sample data
$text = <<<'EOT'
<https://duckduckgo.com>
text<https://example.org>
sample <text><https://telegram.org/>
<https://invalid<>link>
EOT;

$replaced = preg_replace('#<(https?://[^<>]+)>#', '\1', $text);
Some of my phpBB extensions:
Imgur | SEO Metadata | Markdown | Auto-lock Topics
Check out all my validated extensions

Arch Linux user

Re: PHP Regex quick question

Post by axe70 »

But it doesn't seem to return the same result for me, on a string like this:

Code:

$string = 'Today at <-thiscanbeanyassortmentofanyrandomchars-><http://thiscanbeanyassortmentofanyrandomchars>it\'s raining but it is ok <b style="color:red">for me because</b> i\'m a <a href="https://www.google.com/search?q=fish&oq=snail">snail</a>
blah blah.     Blahblah<blahblah><blahblah>. blah blah
blah blah blah<https://blahblahblah>blahblahblah>
Blah<https://blahblah> 5555 <https://blahblah<>blahblah>blahblahblah  <https://blahblah>blublublu> <https://blah> <blah>
blah blah blah<blah><https://blahblahblah><blahblahblah <https://blahblahmissing <https://anotherblahblah>';
checking ...
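A likely explanation for the difference, checked here on a short made-up sample: AbaddonOrmuz's replacement `'\1'` only removes the brackets without adding spaces, and his pattern has no `ftp`. Adding `ftp` and writing the replacement as `' $1 '` (an adjustment, not his code) should give the same spacing as axe70's placeholder version.

```php
<?php
$sample = 'text<https://example.org> and <ftp://example.org/f>';

// AbaddonOrmuz's pattern and replacement: brackets removed, nothing added, no ftp.
$abaddon = preg_replace('#<(https?://[^<>]+)>#', '\1', $sample);

// Adjusted: ftp allowed, and a space written on each side of the URL.
$spaced = preg_replace('#<((?:ftp|https?)://[^<>]+)>#', ' $1 ', $sample);

echo $abaddon, "\n"; // texthttps://example.org and <ftp://example.org/f>
echo $spaced, "\n";  // "text https://example.org  and  ftp://example.org/f "
```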