[Beta] Sphinx search for phpBB 1.0.beta2

A place for MOD Authors to post and receive feedback on MODs still in development. No MODs within this forum should be used within a live environment!
aig
Registered User
Posts: 3
Joined: Mon Jun 14, 2010 11:10 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by aig » Tue Jun 15, 2010 9:54 am

naderman wrote:That doesn't appear to have anything to do with sphinx. Looks like a performance issue with phpBB which you would best report to http://tracker.phpbb.com - thanks.
Just for the record, you're right. I posted a quick fix for this at http://tracker.phpbb.com/browse/PHPBB3-9658

Cheers,
aig

figvam
Registered User
Posts: 2
Joined: Thu Dec 07, 2006 7:56 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by figvam » Mon Jun 28, 2010 8:27 pm

I took advantage of GitHub, forked the sphinx-for-phpbb tree and added a couple of fixes and small enhancements to the fork. The forked project can be found here:
http://github.com/figvam/sphinx-for-phpbb

This commit fixes the outstanding issues (still not fixed a year later): a stray var_dump, a missing sql_query_post_index in the delta source, and a typo in the charset table.

This commit fixes a single document overlap between the indexes.

This commit makes index names much shorter, so they don't take up so much visual space in the query log anymore. Seeing those incomprehensible 40-char long index names in each line (repeated twice!) made my eyes hurt.

This commit adds two settings to the admin CP: the Sphinx server IP, for those running the Sphinx instance on a separate server, and the minimum indexable word length. The latter was hardcoded at 2 chars before, but Sphinx is perfectly capable of indexing 1-char words.

Finally, this commit adds a special "search server is unavailable" error when phpBB can't connect to the Sphinx daemon. Previously the search page showed a misleading "search produced no results" error in such cases.

Feel free to pull any of these changes into the main tree.
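
For readers wondering what a "single document overlap" between a main and a delta index can look like: below is a minimal Python sketch of one common way it arises, an inclusive comparison on both sides of the boundary id. This is an assumption for illustration only; it is not taken from the commit itself, and the function and parameter names (split_ids, max_indexed_id, inclusive_delta) are hypothetical.

```python
def split_ids(post_ids, max_indexed_id, inclusive_delta=False):
    """Partition post ids between a 'main' and a 'delta' index (hypothetical helper)."""
    main = [i for i in post_ids if i <= max_indexed_id]
    if inclusive_delta:
        # Inclusive on both sides: the boundary post lands in both indexes.
        delta = [i for i in post_ids if i >= max_indexed_id]
    else:
        # Strictly greater: every post belongs to exactly one index.
        delta = [i for i in post_ids if i > max_indexed_id]
    return main, delta


ids = list(range(1, 11))
main, delta = split_ids(ids, 7)
buggy_main, buggy_delta = split_ids(ids, 7, inclusive_delta=True)
overlap = set(buggy_main) & set(buggy_delta)
```

With the strict comparison the two lists are disjoint and together cover every id; with the inclusive one, exactly the boundary post is indexed twice.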

figvam
Registered User
Posts: 2
Joined: Thu Dec 07, 2006 7:56 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by figvam » Tue Jun 29, 2010 10:14 am

One thing to note about sorting by subject: it uses Sphinx's str2ordinal feature. The Sphinx documentation says this about the feature:
Note that the ordinals are by construction local to each index, and it's therefore impossible to merge ordinals while retaining the proper order. The processed strings are replaced by their sequential number in the index they occurred in, but different indexes have different sets of strings. For instance, if 'main' index contains strings "aaa", "bbb", "ccc", and so on up to "zzz", they'll be assigned numbers 1, 2, 3, and so on up to 26, respectively. But then if 'delta' only contains "zzz" the assigned number will be 1. And after the merge, the order will be broken. Unfortunately, this is impossible to workaround without storing the original strings (and once Sphinx supports storing the original strings, ordinals will not be necessary any more).
And indeed, my testing shows that sorting by subject is incorrect when it spans both indexes (main and delta), exactly as described in the quote. A more sophisticated approach is needed here.
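
The effect described in the quote can be reproduced outside Sphinx. The following is a minimal Python sketch, not Sphinx code: assign_ordinals is a hypothetical stand-in for what str2ordinal does within one index, and the subjects are made up.

```python
def assign_ordinals(strings):
    """Map each distinct string to its 1-based rank within a single index."""
    return {s: rank for rank, s in enumerate(sorted(set(strings)), start=1)}


# Hypothetical subjects; 'main' and 'delta' mirror the two indexes in the thread.
main_subjects = ["apple", "mango", "zebra"]
delta_subjects = ["yak"]

main_ord = assign_ordinals(main_subjects)    # apple=1, mango=2, zebra=3
delta_ord = assign_ordinals(delta_subjects)  # yak=1, local to the delta index

# Merge hits from both indexes and sort on the per-index ordinal only.
merged = [(main_ord[s], s) for s in main_subjects]
merged += [(delta_ord[s], s) for s in delta_subjects]
by_ordinal = [s for _, s in sorted(merged, key=lambda pair: pair[0])]

# What a sort on the original strings would give.
by_string = sorted(main_subjects + delta_subjects)
```

Here "yak" gets ordinal 1 in the delta index, so the merged order puts it right after "apple" instead of between "mango" and "zebra".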

thenickdude
Registered User
Posts: 16
Joined: Mon Nov 17, 2008 6:28 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by thenickdude » Mon Jul 26, 2010 12:11 pm

Thanks, this search engine performs truly excellently on our forum. The native search engine was taking ten minutes for some queries, totally stalling our database.

The Sphinx index built in a few minutes. Even crazy searches like searching for the word "I" are much faster than the fastest native search ever was. We have 4,200,000 posts by 120,000 users. The index is about 1.5GB.

I'm running Sphinx 0.9.9 and figvam's branch. I replaced the sphinxapi-0.9.8.php file with the new one for 0.9.9 with no ill-effects.
Last edited by thenickdude on Tue Jul 27, 2010 12:59 am, edited 1 time in total.

thenickdude
Registered User
Posts: 16
Joined: Mon Nov 17, 2008 6:28 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by thenickdude » Tue Jul 27, 2010 12:00 am

Okay, the plugin is getting remarkably kill-happy:

Code:

[Mon Jul 26 23:31:45.873 2010] [ 4142] accepting connections
[Mon Jul 26 23:31:56.241 2010] [ 4342] accepting connections
[Mon Jul 26 23:31:59.249 2010] [ 4377] accepting connections
[Mon Jul 26 23:32:03.703 2010] [ 4424] accepting connections
[Mon Jul 26 23:32:08.834 2010] [ 4462] accepting connections
[Mon Jul 26 23:32:13.542 2010] [ 4484] accepting connections
[Mon Jul 26 23:32:19.693 2010] [ 4552] accepting connections
[Mon Jul 26 23:32:28.794 2010] [ 4602] accepting connections
[Mon Jul 26 23:32:34.138 2010] [ 4639] accepting connections
[Mon Jul 26 23:32:37.370 2010] [ 4669] accepting connections
[Mon Jul 26 23:32:46.868 2010] [ 4731] accepting connections
[Mon Jul 26 23:33:18.869 2010] [ 4829] accepting connections
It seems to rely on the first or second PID returned by pidof being the searchd listed in the .pid file. The problem is that if many searches all start at once, many searchd instances get created. If the active one isn't among the first PIDs returned, they all get killed. Then created. Then killed. Then created. Solution: comment out the shutdown call:
// make sure it's really not running
//$this->shutdown_searchd();
Actually, it's probably worthwhile to replace the whole body of searchd_running() {} with return true; (I've never seen searchd die when the plugin didn't kill it).

The other change I've made is to comment out the whole block in tidy() which rewrites the main index - I'll do that in a cron job in the evenings. With the default GC interval of 30 minutes and a 1.5GB index, running tidy is suicide.

EDIT: I think my problems with multiple instances of searchd were caused by running searchd as root. It meant that the searchd process couldn't be contacted by www-data and so the plugin assumed there were no instances running and launched many. I fixed it by running it as www-data:

Code:

setuid www-data searchd --config /sphinx/conf/sphinx.conf >> /sphinx/data/log/searchd.log 2>&1
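
As an illustration of the two problems above (guessing the PID from pidof output, and not being able to reach a process owned by another user), here is a minimal Python sketch of a liveness check that reads the PID file directly and probes it with signal 0. It is not the plugin's PHP code; the function names are hypothetical and it assumes a POSIX system.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def process_exists(pid):
    """Probe a PID with signal 0, which checks existence without delivering a signal."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user (e.g. root)
    return True


def searchd_running(pid_file):
    """True if the process named in the PID file is alive (hypothetical helper)."""
    return process_exists(read_pid(pid_file))
```

Treating a permission error as "running" matters for the root versus www-data case: the web user cannot signal a root-owned daemon, but the daemon is still there and should not be relaunched.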

altafali
Registered User
Posts: 28
Joined: Thu May 13, 2010 8:30 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by altafali » Mon Aug 23, 2010 2:33 pm

Hi to all,

I am new here and want to install Sphinx search on my board.

I need some instructions to run it: how to set it up in the ACP, and whether any additional options need to be set in the ACP or elsewhere.

Thanks to all

DoYouSpeakWak
Registered User
Posts: 2307
Joined: Fri Jul 25, 2008 1:32 pm
Location: Island of Wak-Wak
Name: Hans Lassen

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by DoYouSpeakWak » Mon Aug 23, 2010 2:50 pm

altafali wrote:hi to all

i am new here and want to install sphinx search for my board

need some instruction to run it. want to know how to setup this in acp and some additional option to be set in acp or whatsoever


thanks to all
Unless you're really good with PHP and Linux, I would recommend you put that off a bit. As you can read, there is quite a bit of DIY involved with this MOD at the moment.

Hopefully someone will make an update that we normal users can install without spending a whole weekend on it.

PandoraBox_2007
Registered User
Posts: 2
Joined: Mon Apr 28, 2008 7:32 am
Location: Ukraine
Name: Denis [skype:Robert.Sperring1]

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by PandoraBox_2007 » Tue Aug 31, 2010 4:39 am

Ciao121 wrote:Can somebody please help me?
I always get "no results".

Indexer created files in the data folder; also in the acp I can see:
Number of posts in main index:3835797
Total number of indexed messages (this has been translated - could be different):3835797
but also:
Number of posts in frequently updated delta index:0
Recent search queries: <--- empty (nothing here)

Also sphinx-query.log is empty...

Please help :oops:

Edit:
If I execute from shell:

Code:

search --config /path/to/my/sphinx.conf -p test
results are returned.
This patch fixes the bug:

Code:

--- includes/search/fulltext_sphinx.php    Tue Aug 31 07:21:48 2010
+++ includes/search/fulltext_sphinx.php    Tue Aug 31 07:21:36 2010
@@ -40,11 +40,16 @@
 }
 
 /**
+* @ignore
+*/
+include_once($phpbb_root_path . 'includes/search/search.' . $phpEx);
+
+/**
 * fulltext_sphinx
 * Fulltext search based on the sphinx search deamon
 * @package search
 */
-class fulltext_sphinx
+class fulltext_sphinx extends search_backend
 {
     var $stats = array();
     var $word_length = array();
@@ -447,6 +452,29 @@
             return false;
         }
 
+// (c) Pandora
+         // generate a search_key from all the options to identify the results
+         $search_key = md5(implode('#', array(
+             $this->search_query,
+             $type,
+             $fields,
+             $terms,
+             $sort_days,
+             $sort_key,
+             $topic_id,
+             implode(',', $ex_fid_ary),
+             implode(',', $m_approve_fid_ary),
+             implode(',', $author_ary)
+         )));
+ 
+         // try reading the results from cache
+         $result_count = 0;
+         if (false && $this->obtain_ids($search_key, $result_count, $id_ary, $start, $per_page, $sort_dir) == SEARCH_RESULT_IN_CACHE)
+         {
+             return $result_count;
+         }
+// (c) Pandora
+
         $id_ary = array();
 
         $join_topic = ($type == 'posts') ? false : true;
@@ -586,6 +614,8 @@
 
         $result_count = $result['total_found'];
 
+         // store the ids, from start on then delete anything that isn't on the current page because we only need ids for one page
+         $this->save_ids($search_key, $this->search_query, $author_ary, $result_count, $id_ary, $start, $sort_dir);
         $id_ary = array_slice($id_ary, 0, (int) $per_page);
 
         return $result_count;

The author_search function needs the same $this->save_ids call; without it, the log of recent searches does not record searches by author either :mrgreen:
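
The search_key in the patch above is an MD5 over every option that influences the result set, joined with '#'. A minimal Python sketch of the same idea follows; make_search_key and its parameter names are hypothetical and only mirror the variables in the diff. Note that in the patch as posted the cache read is guarded by "false &&", so only the save_ids side takes effect.

```python
import hashlib


def make_search_key(query, search_type, fields, terms, sort_days, sort_key,
                    topic_id, excluded_forums, approve_forums, authors):
    """Join every option that affects the result set and hash the joined string."""
    parts = [
        query,
        search_type,
        fields,
        terms,
        str(sort_days),
        sort_key,
        str(topic_id),
        ",".join(map(str, excluded_forums)),
        ",".join(map(str, approve_forums)),
        ",".join(map(str, authors)),
    ]
    return hashlib.md5("#".join(parts).encode("utf-8")).hexdigest()


# Two identical searches share a key; changing any option gives a new one.
key_a = make_search_key("sphinx", "posts", "all", "all", 0, "t", 0, [3, 5], [], [])
key_b = make_search_key("sphinx", "posts", "all", "all", 0, "t", 0, [3, 5], [], [])
key_c = make_search_key("sphinx", "topics", "all", "all", 0, "t", 0, [3, 5], [], [])
```

Because every option is part of the key, a search that differs only in, say, result type gets its own log and cache entry.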

thenickdude
Registered User
Posts: 16
Joined: Mon Nov 17, 2008 6:28 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by thenickdude » Sun Sep 19, 2010 8:32 am

Keywords from the query aren't properly escaped before being sent to Sphinx. This causes searches including the character "/" to fail with "There has been an error accessing the search server. Please try again in a few minutes." The fix is easy. Replace this code in fulltext_sphinx.php:

Code:

		$result = $this->sphinx->Query($search_query_prefix . str_replace('&quot;', '"', $this->search_query), $this->indexes);

		// could be connection to localhost:3312 failed (errno=111, msg=Connection refused) during rotate, retry if so
		$retries = CONNECT_RETRIES;
		while (!$result && (strpos($this->sphinx->_error, "errno=111,") !== false) && $retries--)
		{
			usleep(CONNECT_WAIT_TIME);
			$result = $this->sphinx->Query($search_query_prefix . str_replace('&quot;', '"', $this->search_query), $this->indexes);
		}
With:

Code:

		$result = $this->sphinx->Query($search_query_prefix . $this->sphinx->EscapeString(str_replace('&quot;', '"', $this->search_query)), $this->indexes);

		// could be connection to localhost:3312 failed (errno=111, msg=Connection refused) during rotate, retry if so
		$retries = CONNECT_RETRIES;
		while (!$result && (strpos($this->sphinx->_error, "errno=111,") !== false) && $retries--)
		{
			usleep(CONNECT_WAIT_TIME);
			$result = $this->sphinx->Query($search_query_prefix . $this->sphinx->EscapeString(str_replace('&quot;', '"', $this->search_query)), $this->indexes);
		}
That adds calls to EscapeString to properly escape the keywords being searched for.
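
To make the effect of the escaping concrete, here is a minimal Python sketch of a backslash-escaping helper in the spirit of EscapeString. The exact set of special characters is an assumption and may differ between sphinxapi versions, so compare it with the sphinxapi.php you actually ship; escape_query is a hypothetical name.

```python
# Characters treated as query operators here are an assumption;
# check them against the EscapeString in your sphinxapi.php.
SPECIAL_CHARS = '\\()|-!@~"&/^$='


def escape_query(text):
    """Prefix every special character with a backslash, leave the rest untouched."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


escaped_slash = escape_query("a/b")
escaped_phrase = escape_query('"exact phrase"')
```

One trade-off to keep in mind: escaping the whole query also escapes characters such as the double quote, so anything the user typed as query syntax is then searched for literally.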

Ciao121
Registered User
Posts: 239
Joined: Wed Jan 28, 2004 1:08 pm

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by Ciao121 » Tue Oct 05, 2010 9:44 pm

Help please...
It seems to me that searches from inside phpBB only search the delta index.
Searching from the command line, with the same conf file, returns results from both the main and delta indexes.

Any idea??? :cry:

lang song
Registered User
Posts: 2
Joined: Mon Nov 15, 2010 8:12 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by lang song » Mon Nov 15, 2010 8:36 am

bump

laric
Registered User
Posts: 5
Joined: Thu Sep 04, 2008 12:03 pm

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by laric » Thu Nov 25, 2010 3:11 pm

Hello,

Just one word to let you know I have updated our quite large forum to use Sphinx 1.10b remotely.

So far we were using 0.9.8 local to our web box, now I'm having sphinx (Indexer and Search) running on the DB box (same local net)...
I have updated the sphinxapi file to the latest (1.10) with no ill effects (but without using any possible new features). I'm using figvam's branch.

On the remote box, I have a couple of jobs doing the indexing: one every night indexing the whole content of the board, and one every 10 minutes building the daily delta index.

The forum has about 8M messages and 150,000 users.

BTW, I'm wondering why this MOD isn't being improved more actively (although it works pretty well :D), and I hope it'll replace the existing "native" search in future releases.

--Patrice

altafali
Registered User
Posts: 28
Joined: Thu May 13, 2010 8:30 am

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by altafali » Thu Feb 03, 2011 10:57 am

A quick question: does this work with phpBB 3.0.7-PL1?

Thanks

laric
Registered User
Posts: 5
Joined: Thu Sep 04, 2008 12:03 pm

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by laric » Thu Feb 03, 2011 1:26 pm

Yes it does, it also works on 3.0.8 ;)

--Patrice

SuperFedya
Registered User
Posts: 248
Joined: Sun Jul 14, 2002 9:14 pm

Re: [Beta] Sphinx search for phpBB 1.0.beta2

Post by SuperFedya » Thu Feb 10, 2011 3:26 am

Does version sphinx4phpbb-1.0.beta2 work fine as is, or do I need to apply some fixes for 3.0.8?

Thanks
