Allowing Underscores in Searches

END OF SUPPORT: 1 January 2017 (announcement)
Locked
RaymondT
Registered User
Posts: 3
Joined: Sun Mar 18, 2012 3:56 pm

Post by RaymondT » Sun Mar 18, 2012 4:13 pm

So. I help administer a website for a game and one portion of this is a forum. Our administration and support team tracks player status in a special area of the forum; we log when a player has been warned for bad behavior, when it was lifted, etc. A necessary part of this is logging the player's in-game name (this is not our game, so we cannot control player names whatsoever).

All status changes are stored in a single thread with the player's name in the topic title as well as in the body. This way we can keep a chronological history of a player's behavior in one thread for easy tracking. However, we are severely limited when players' names include underscores. We have found no way at all to search for posts about these players, so we have to go through hundreds, sometimes thousands, of threads one by one. It is really becoming a deal breaker.

Yes, I am aware of the '*' wildcard character, but it is completely useless if the player's name starts with an underscore, or if the name is "the_destroyer" since a wildcard search for "the*" brings back a ridiculous number of results.

My question is: Are we missing something somewhere that would allow these characters in search or is this ever going to be addressed (as I have seen threads with "deal with it" as a response)? If it is not going to be dealt with is there a search modification I can install or make to change this behavior? The underscore is the only special character allowed by the game, so it is the only one we ever need to search for.

Any help on this would go beyond greatly appreciated as migrating to another forum system at this point would be an extremely unpleasant task and we all enjoy the set-up with phpBB anyway. Thank you in advance.

Edit: We can use the browser's built-in search to search one page at a time, but this is still rather time-consuming, as we can only search one page of each forum at a time. It does not work for proper tracking.

Post by RaymondT » Tue Mar 20, 2012 1:02 pm

Well this is disheartening.

Is there another forum system anyone knows of that does allow the inclusion of underscores in search terms?

Erik Frèrejean
Former Team Member
Posts: 9899
Joined: Tue Oct 09, 2007 9:09 am
Location: The Netherlands, 3.0.x Support Forum
Name: Erik Frèrejean
Contact:

Post by Erik Frèrejean » Tue Mar 20, 2012 1:23 pm

phpBB handles underscores just fine.

Please fill out the Support Request Template Generator and post it back here to enable us to assist you better.

Oleg
Former Team Member
Posts: 1221
Joined: Sat Jan 30, 2010 4:42 pm
Location: NYC
Contact:

Post by Oleg » Tue Mar 20, 2012 11:35 pm

area51 seems to return an unusually large number of results for my test search.

Oyabun1
Former Team Member
Posts: 23162
Joined: Sun May 17, 2009 1:05 pm
Location: Australia
Name: Bill

Post by Oyabun1 » Wed Mar 21, 2012 4:06 am

Erik Frèrejean wrote: phpBB handles underscores just fine.
I am not sure that what phpbb.com returns for a search is relevant in this case: phpbb.com doesn't use the native search engine, it uses Sphinx (and maybe Area51 does as well).

If you want to search for "the_destroyer" try searching for +the*destroyer, it will likely return a lot of irrelevant results but hopefully also what you are looking for in the first few. That is, use the wildcard for as few characters as possible.

Or if you want to give the Sphinx search engine a try with the board there is a MOD in development, [Beta] Sphinx search for phpBB 1.0.beta2.

Post by RaymondT » Wed Mar 28, 2012 9:50 pm

I tried a fresh vanilla installation of phpbb 3.0.10 on a separate server for testing, created ten sample threads, and searching for underscores does not work. So I am not sure what Erik is referring to, unless I am missing a setting somewhere. I also hopped into the support channel in IRC and the people in there said searching for underscores is not possible with the standard search function.

As for Sphinx: that is exactly what I am looking for, and I can keep phpBB. Thank you very much!

EDIT: Ah, I see. phpbb.com uses the sphinx plugin, which is why this website can utilize the underscore in searches. It is not built into phpBB itself. Unfortunate, but better than nothing.

nl2dav
Registered User
Posts: 102
Joined: Tue Jun 25, 2002 10:39 pm
Location: NOP, The Netherlands
Contact:

Post by nl2dav » Wed Sep 28, 2016 8:05 pm

Sphinx is not a solution but a workaround.

The problem is that phpBB, in \includes\search\fulltext_mysql.php, prioritizes the PCRE Unicode-property preg_match_all (the $this->pcre_properties branch) over the plain preg_match_all fallback, and the Unicode-property pattern only defines \p{L}\p{N}, which matches a letter or a number but not an underscore. The fallback a bit further down in the code does include the underscore, because it uses \w.

The pcre_properties code also doubles each underscore while splitting keywords, so that bit needs to be disabled as well. If you want to keep the original code in place, the best thing to do is something like what I did:

Code: Select all

		// Filter out as above
		$split_keywords = preg_replace("#[\n\r\t]+#", ' ', trim(htmlspecialchars_decode($keywords)));

		// Split words (pcre_properties and mbstring_regex branches commented out so the \w pattern always runs)
		/* if ($this->pcre_properties)
		{
			$split_keywords = preg_replace('#([^\p{L}\p{N}\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', trim($split_keywords)));
		}
		if ($this->mbstring_regex)
		{
			$split_keywords = mb_ereg_replace('([^\w\'*"()])', '\\1\\1', str_replace('\'\'', '\' \'', trim($split_keywords)));
		}
		else
		{ */
			$split_keywords = preg_replace('#([^\w\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', trim($split_keywords)));
		//}

		/*
		if ($this->pcre_properties)
		{
			$matches = array();
			preg_match_all('#(?:[^\p{L}\p{N}*"()]|^)([+\-|]?(?:[\p{L}\p{N}*"()]+\'?)*[\p{L}\p{N}*"()])(?:[^\p{L}\p{N}*"()]|$)#u', $split_keywords, $matches);
			$this->split_words = $matches[1];
		}
		if ($this->mbstring_regex)
		{
			mb_ereg_search_init($split_keywords, '(?:[^\w*"()]|^)([+\-|]?(?:[\w*"()]+\'?)*[\w*"()])(?:[^\w*"()]|$)');

			while (($word = mb_ereg_search_regs()))
			{
				$this->split_words[] = $word[1];
			}
		}
		else
		{ */
			$matches = array();
			preg_match_all('#(?:[^\w*"()]|^)([+\-|]?(?:[\w*"()]+\'?)*[\w*"()])(?:[^\w*"()]|$)#u', $split_keywords, $matches);
			$this->split_words = $matches[1];
		//}

		// We limit the number of allowed keywords to minimize load on the database
That gives a working search with underscores. I have no idea why the pcre_properties and mbstring_regex branches get priority, maybe because of speed. But if the branches lead to different functionality, that is of course not good.
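
For anyone who wants to see the difference outside phpBB, here is a minimal standalone sketch. It reuses the two sets of patterns quoted above (the \p{L}\p{N} ones and the \w ones). The function name split_words_demo and its $use_unicode_properties flag are made up for this illustration and are not part of phpBB, and the htmlspecialchars_decode step is left out for brevity.

```php
<?php
// Hypothetical helper, not part of phpBB: runs the keyword-splitting step
// with either the \p{L}\p{N} patterns or the \w patterns quoted above.
function split_words_demo(string $keywords, bool $use_unicode_properties): array
{
    // Character class body: Unicode letters/numbers, or \w (which includes "_").
    $chars = $use_unicode_properties ? '\p{L}\p{N}' : '\w';

    // Collapse line breaks and tabs to spaces, as in the original code.
    $split = preg_replace("#[\n\r\t]+#", ' ', trim($keywords));

    // Double every separator so one copy can end a word and the other can start the next.
    $split = preg_replace('#([^' . $chars . '\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', $split));

    // Extract the words, each optionally prefixed with +, - or |.
    $pattern = '#(?:[^' . $chars . '*"()]|^)([+\-|]?(?:[' . $chars . '*"()]+\'?)*[' . $chars . '*"()])(?:[^' . $chars . '*"()]|$)#u';
    $matches = array();
    preg_match_all($pattern, $split, $matches);

    return $matches[1];
}

print_r(split_words_demo('the_destroyer', true));  // \p{L}\p{N}: the underscore acts as a separator
print_r(split_words_demo('the_destroyer', false)); // \w: the name stays in one piece
```

With the \p{L}\p{N} patterns the underscore is treated like any other separator, so the name is split into two words before the search query is built. With the \w patterns it survives as a single keyword.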

I am aware that I am bumping a four-year-old thread and that support ends on 1-1-2017, but maybe I can still make someone happy with this message? 8-) The underscore is also used in YouTube video URLs.

AmigoJack
Registered User
Posts: 5588
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Post by AmigoJack » Thu Sep 29, 2016 7:35 am

Never stumbled upon this one. The solution is not good, however, as \w is locale-dependent and thus can fail to match diacritics, not to mention non-Latin letters and non-Arabic numerals. It would be far better to just add _ to the character classes and not kill Unicode support entirely.
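
A rough sketch of what that could look like, with _ added to every character class of the two pcre_properties patterns. The wrapper function split_words_with_underscore is hypothetical and only exists for this illustration. It has not been tested inside phpBB itself.

```php
<?php
// Hypothetical wrapper, not phpBB code: the pcre_properties patterns with
// "_" added to every character class, so Unicode matching is kept.
function split_words_with_underscore(string $keywords): array
{
    // Collapse line breaks and tabs to spaces.
    $split = preg_replace("#[\n\r\t]+#", ' ', trim($keywords));

    // Double every separator; "_" is now part of the class, so it is left alone.
    $split = preg_replace('#([^\p{L}\p{N}_\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', $split));

    // Extract the words, each optionally prefixed with +, - or |.
    $matches = array();
    preg_match_all('#(?:[^\p{L}\p{N}_*"()]|^)([+\-|]?(?:[\p{L}\p{N}_*"()]+\'?)*[\p{L}\p{N}_*"()])(?:[^\p{L}\p{N}_*"()]|$)#u', $split, $matches);

    return $matches[1];
}

// "\u{00FC}" is the letter u with diaeresis; this escape syntax needs PHP 7 or later.
print_r(split_words_with_underscore("\u{00FC}ber_spieler _the_destroyer"));
```

Both sample names come back as single keywords, including the one with a non-ASCII letter and the one that starts with an underscore.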

Post by nl2dav » Sun Oct 02, 2016 2:04 am

Right hhmmm.. I see why PCRE gets prioritized now...

So the correct solution would be to replace:

Code: Select all

        if ($this->pcre_properties)
        {
            $split_keywords = preg_replace('#([^\p{L}\p{N}\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', trim($split_keywords)));
        }
         
with

Code: Select all

        if ($this->pcre_properties)
        {
            $split_keywords = preg_replace('#([^\p{L}\p{N}\p{Xwd}\'*"()])#u', '$1$1', str_replace('\'\'', '\' \'', trim($split_keywords)));
        }
 
and

Code: Select all

        if ($this->pcre_properties)
        {
            $matches = array();
            preg_match_all('#(?:[^\p{L}\p{N}*"()]|^)([+\-|]?(?:[\p{L}\p{N}*"()]+\'?)*[\p{L}\p{N}*"()])(?:[^\p{L}\p{N}*"()]|$)#u', $split_keywords, $matches);
            $this->split_words = $matches[1];
        }
         
with

Code: Select all

        if ($this->pcre_properties)
        {
            $matches = array();
            preg_match_all('#(?:[^\p{L}\p{N}\p{Xwd}*"()]|^)([+\-|]?(?:[\p{L}\p{N}\p{Xwd}*"()]+\'?)*[\p{L}\p{N}\p{Xwd}*"()])(?:[^\p{L}\p{N}\p{Xwd}*"()]|$)#u', $split_keywords, $matches);
            $this->split_words = $matches[1];
        } 
Tests seem to succeed so far. Thanks for the reply.

Post by AmigoJack » Tue Oct 04, 2016 8:39 am

That is redundant: \p{L}\p{N} is already within \p{Xwd}.
http://www.pcre.org/original/doc/html/pcrepattern.html wrote: Xan matches characters that have either the L (letter) or the N (number) property. Xps matches the characters tab, linefeed, vertical tab, form feed, or carriage return, and any other character that has the Z (separator) property. Xsp is the same as Xps; it used to exclude vertical tab, for Perl compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd matches the same characters as Xan, plus underscore.
So either replace \p{L}\p{N} with \p{Xwd}, or just add _.
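
A quick way to compare the two options is to run both classes over a handful of characters. This is only a sketch: it assumes the PCRE library behind your PHP build supports \p{Xwd}, the sample characters are arbitrary, and agreement on these samples does not prove the two classes are identical for every character or every PCRE version.

```php
<?php
// Compare \p{Xwd} with the explicit class [\p{L}\p{N}_] on a few sample characters.
// Assumption: the PCRE library used by this PHP build supports \p{Xwd}.
// "\u{00E9}" is the letter e with acute accent; the escape needs PHP 7 or later.
$samples = array('a', "\u{00E9}", '7', '_', '-', ' ');

$results = array();
foreach ($samples as $char) {
    $results[] = array(
        'char'     => $char,
        'xwd'      => preg_match('/^\p{Xwd}$/u', $char) === 1,
        'explicit' => preg_match('/^[\p{L}\p{N}_]$/u', $char) === 1,
    );
}

// Print one line per sample so the two columns can be compared by eye.
foreach ($results as $row) {
    printf(
        "%s  Xwd=%s  explicit=%s\n",
        json_encode($row['char']),
        var_export($row['xwd'], true),
        var_export($row['explicit'], true)
    );
}
```

If the two columns agree for the characters you care about, either spelling should work in the patterns above. Adding _ to the existing classes is the smaller change.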