Sphinx Search on phpBB 3.2

Get help with installation and running phpBB 3.2.x here. Please do not post bug reports, feature requests, or extension related questions here.
thecoalman
Community Team Member
Posts: 5850
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Sphinx Search on phpBB 3.2

Post by thecoalman »

This will be my first use of it and I was hoping to get some input before installing it.

The latest version is 2.2.11 from July. Is anyone using it, and are there any known issues?
Last edited by JimA on Fri Aug 11, 2017 4:09 pm, edited 1 time in total.
Reason: Moved from phpBB Discussion to 3.2 Support Forum
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison

Re: Sphinx Search on phpBB 3.2

Post by thecoalman »

Anybody? I got sidetracked with some other things and haven't tried it yet.

Is no one using this? Can anyone from the phpBB team tell me if support for this will continue at least through the 3.2 versions? Is it still fully supported?

Re: Sphinx Search on phpBB 3.2

Post by thecoalman »

I went and installed 2.2.11 and I'm getting this error in the indexer.log:

Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
WARNING: key 'sql_query_info' was permanently removed from Sphinx configuration. Refer to documentation for details.
WARNING: key 'stopwords' is not multi-value; value in /etc/sphinx/sphinx.conf line 73 will be ignored.
WARNING: key 'charset_type' was permanently removed from Sphinx configuration. Refer to documentation for details.
ERROR: unknown key name 'compat_sphinxql_magics' in /etc/sphinx/sphinx.conf line 91 col 24.
FATAL: failed to parse config file '/etc/sphinx/sphinx.conf'

Those errors are from the sphinx.conf generated by phpBB. Is phpBB going to continue to support this?


-----edit----

It seems this issue goes back to 2015:

viewtopic.php?f=64&t=2347426&p=14282321 ... #p14282321

Re: Sphinx Search on phpBB 3.2

Post by thecoalman »

I've commented out the lines that were causing the errors and it seems to be working; I'll have to research what those lines were used for. I was also getting some permission errors, so I used the paths from the Sphinx example.conf file for the log files and the PID file.

With that said this is extremely fast compared to the regular search.
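
For anyone hitting the same errors, here is a rough sketch of the "comment out the offending lines" step. It is only an illustration: the key list is taken from the indexer.log output quoted earlier in this topic, the function name is made up, and it assumes each setting is written as `key = value` with optional backslash line continuations. Back up sphinx.conf before trying anything like this.

```python
import re

# Keys that the indexer log in this topic reports as removed or unknown in 2.2.11.
REMOVED_KEYS = {"sql_query_info", "charset_type", "compat_sphinxql_magics"}

# Matches the start of a "key = value" setting line and captures the key.
KEY_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def comment_out_removed_keys(conf_text, removed=REMOVED_KEYS):
    """Return conf_text with every setting whose key is in `removed` commented out."""
    out = []
    in_continuation = False  # True while inside a multi-line value being commented out
    for line in conf_text.splitlines():
        m = KEY_RE.match(line)
        hit = in_continuation or (m is not None and m.group(1) in removed)
        out.append("# " + line if hit else line)
        # A trailing backslash means the value continues on the next line.
        in_continuation = hit and line.rstrip().endswith("\\")
    return "\n".join(out) + "\n"
```

The 'stopwords' warning in the log is a different case (a duplicated value rather than a removed key), so it is deliberately left alone here.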
KYPREO
Registered User
Posts: 392
Joined: Fri Feb 02, 2018 9:56 am

Re: Sphinx Search on phpBB 3.2

Post by KYPREO »

thecoalman wrote: Sun Aug 13, 2017 10:05 am With that said this is extremely fast compared to the regular search.
Indeed. Bumping this. I had a go at switching to Sphinx on a board with 2 million posts on a localhost test bed running Windows 10 and IIS 10.

I have been running phpBB fulltext search. When we upgraded from phpBB 2 to 3.0 in 2007, I recall it took several weeks for the search table to be rebuilt. Our search tables are now 3.1GB.

To my surprise and great pleasure, Sphinx built the search index from scratch in less than 5 minutes, and it is only 500MB in size. Not only that, I reduced the minimum character limit from 3 to 2, got rid of the common word threshold and didn't even bother with a stopword list. I did a sample search for the word "the" and it returned over 1 million results in 3 seconds.

Sphinx is ridiculously fast.

One disadvantage is that the search indexes are not part of the database and instead sit with Sphinx. This means that the search isn't backed up with the forum database. However, the index built itself so fast I doubt it matters and the actual forum database is now half the size compared to native fulltext, meaning backups are now much smaller.

Installation on Windows was actually pretty easy, except that the sample cron needs to be changed to run as a scheduled task on Windows. One thing I didn't understand from the instructions is where to actually install Sphinx. The ACP instructions say to install outside of the web directory, so I installed in C:\Sphinx\ and this works OK on localhost once I changed all the paths from relative to absolute, but I wonder whether it would on my live server. The use of relative paths is confusing and suggests Sphinx has its own subfolder under the forum.
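
A minimal sketch of the cron-to-scheduled-task change, assuming the Windows `schtasks` tool and Sphinx's `indexer --rotate --config` options. The paths, the task name and the index name below are placeholders for illustration, not values from my setup; use the index names from your own sphinx.conf. The function only builds the command so it can be inspected before running it.

```python
def build_schtasks_command(indexer_exe, conf_path, index_name,
                           every_minutes=5, task_name="SphinxDeltaIndex"):
    """Build (but do not run) a schtasks command that re-indexes on a schedule."""
    # The command the task will run; paths are quoted in case they contain spaces.
    task_run = f'"{indexer_exe}" --rotate --config "{conf_path}" {index_name}'
    return [
        "schtasks", "/Create", "/F",      # /F overwrites a task with the same name
        "/SC", "MINUTE",                  # schedule type: every N minutes
        "/MO", str(every_minutes),        # the N in "every N minutes"
        "/TN", task_name,                 # hypothetical task name
        "/TR", task_run,
    ]


if __name__ == "__main__":
    # Placeholder paths and index name, for illustration only.
    print(build_schtasks_command(r"C:\Sphinx\bin\indexer.exe",
                                 r"C:\Sphinx\sphinx.conf",
                                 "index_phpbb_delta"))
```

The resulting list could be passed to `subprocess.run` from an elevated prompt, or the same switches typed by hand.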

Is Sphinx meant to be installed as a subdirectory of the forum or outside?
phpBB user since 2002
www.AusRotary.com

Re: Sphinx Search on phpBB 3.2

Post by KYPREO »

In reply to my own query above: in a Windows environment, Sphinx is intended to be installed outside of the IIS published website directory. The main difference from the Unix configuration is that in Windows you need to specify the full absolute path, complete with drive letter (e.g. C:\Sphinx\), when first installing Sphinx, and in both the phpBB ACP and the sphinx.conf file. I did this instinctively when I installed on the localhost and it worked perfectly first time.
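
To double-check the absolute-path point, something like the following could list any path settings in sphinx.conf that lack a drive letter. This is a sketch under assumptions: the set of keys is just the common path-valued settings (`path`, `log`, `query_log`, `pid_file`, `binlog_path`), the function name is made up, and multi-line values are not handled.

```python
import re
from pathlib import PureWindowsPath

# Common sphinx.conf settings whose value is a filesystem path (not exhaustive).
PATH_KEYS = {"path", "log", "query_log", "pid_file", "binlog_path"}

# Captures "key" and "value" from a single-line "key = value" setting.
LINE_RE = re.compile(r"^\s*([a-z_]+)\s*=\s*(.+?)\s*$")


def relative_path_settings(conf_text):
    """Return (line_number, key, value) for path settings that are not absolute Windows paths."""
    problems = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out lines
        m = LINE_RE.match(line)
        if m and m.group(1) in PATH_KEYS:
            # is_absolute() on a Windows path requires both a drive and a root.
            if not PureWindowsPath(m.group(2)).is_absolute():
                problems.append((lineno, m.group(1), m.group(2)))
    return problems
```

An empty list means every checked setting already carries a full path such as C:\Sphinx\...; anything returned is a candidate for the relative-path confusion described above.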

What was confusing was that I had read 3 different blog posts on configuring Sphinx for phpBB and they had installed Sphinx in /var/www/ or /usr/www/, which I assume is the website root in those Unix setups. If so, this is definitely not what Sphinx recommends. One writeup also said to assign 777 permissions to the Sphinx directory and files, which I also thought was a terrible idea, as you don't want to give everyone access to the search engine and index files. Moreover, the sphinx.conf file in that folder contains a plain-text copy of the database username and password :shock:
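
On the permissions point, a quick way to see whether sphinx.conf (with its database password) is exposed to every account on a Unix-like box is to inspect the "other" permission bits. A minimal sketch, assuming a POSIX filesystem; on Windows these mode bits do not reflect the real ACLs, so it is not meaningful there. The function name is made up.

```python
import os
import stat


def world_accessible(path):
    """Return True if 'other' users have any read, write or execute bit on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH))
```

A file with mode 777 or 644 would return True, while 640 or 600 would return False.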

I will give this a go on my live board when I next have some downtime and if successful perhaps report back with installation instructions. I'm not sure how many phpBB boards are running in Windows / IIS but it may help someone since there is no other writeup on doing a Windows Sphinx setup for phpBB.

Re: Sphinx Search on phpBB 3.2

Post by thecoalman »

KYPREO wrote: Tue Nov 26, 2019 11:31 am
Is Sphinx meant to be installed as a subdirectory of the forum or outside?
Sphinx needs to be installed outside of the webroot (in a non-public directory); the directions in the Sphinx documentation assume root access. How, or if, you can install it on shared hosting I do not know. It's constantly running and consumes RAM; how much depends on the size of the search index. It's not much for regular operation, but it is quite a bit when it rebuilds the index. This is typically something that is not going to be supported on shared hosting.

Re: Sphinx Search on phpBB 3.2

Post by KYPREO »

thecoalman wrote: Wed Nov 27, 2019 1:14 am Sphinx needs to be installed outside of the webroot (in a non-public directory); the directions in the Sphinx documentation assume root access. How, or if, you can install it on shared hosting I do not know. It's constantly running and consumes RAM; how much depends on the size of the search index. It's not much for regular operation, but it is quite a bit when it rebuilds the index. This is typically something that is not going to be supported on shared hosting.
Thanks for that confirmation. There might be merit in putting the data directory within the webroot (but without web user access) so that the Sphinx configuration and indexes get saved as part of the website backup, but it looks to be working with those files saved in the same directory as the Sphinx executables sitting outside the webroot.

I am running on a self-managed VPS with 4GB RAM, with a bit of capacity to spare.

I agree it would be quite difficult, if not impossible, to properly implement Sphinx on a shared host.

The other limitation worth mentioning is that phpBB's search functionality does not appear able to utilise Sphinx's ability to do exact phrase searching. It looks like MySQL Fulltext is currently the only way to achieve that. I have tested MySQL Fulltext search and, while it is able to do exact phrase searching very well, running regular queries through MySQL is very slow on a board as large as mine. The search tables, while not as large as phpBB Native, are still very large, in the multi-GB range. Sphinx is so much more compact.

I will submit a ticket/feature request on implementing exact phrase searching in Sphinx because, unlike phpBB native, Sphinx can definitely do it with the addition of simple syntax: http://sphinxsearch.com/docs/current.ht ... ded-syntax
Part of the code in the MySQL search PHP file that deals with phrase searching might be repurposed for Sphinx. I'd volunteer to fix it, but this is way beyond my coding capabilities.

Re: Sphinx Search on phpBB 3.2

Post by KYPREO »

Turns out this was not beyond my coding capabilities!

I have submitted a ticket on area51 and posted a fix which will enable use of the - and | operators, as well as use of Boolean operators (NOT, OR), exact phrases using double quotation marks and wildcard searches using asterisks (in any position in the word). I have also included a fix for indexing hyphenated words and an idea for how to ensure hyphenated search terms behave as the user expects.

See: https://tracker.phpbb.com/browse/PHPBB3-16234
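
To give a feel for what the operators look like on the Sphinx side: in Sphinx's extended query syntax, OR is written as `|`, NOT as a leading `-`, exact phrases go in double quotes, and `*` is the wildcard (which needs prefix or infix indexing enabled). The snippet below is only my own illustration of that mapping, written in Python for brevity; it is not the PHP patch attached to the ticket, and the function name is made up.

```python
import re

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def to_sphinx_extended(user_query):
    """Translate the keywords OR / NOT into Sphinx extended-syntax operators."""
    out = []
    negate_next = False
    for tok in TOKEN_RE.findall(user_query):
        if tok == "OR":
            out.append("|")
        elif tok == "NOT":
            negate_next = True          # apply "-" to the term that follows
        else:
            out.append("-" + tok if negate_next else tok)
            negate_next = False
    return " ".join(out)
```

Quoted phrases and wildcard terms pass through unchanged, so a query like `cats OR dogs NOT "hot dogs" rot*` keeps its phrase and its asterisk while the keywords become operators.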

This all behaves just as quickly, albeit with a slightly bigger index size.

I searched for the single digit "1" across 2 million posts and it returned the results in 0.169 seconds. I searched for the letter "a" and it returned 1 million hits in 0.9 seconds :D

Re: Sphinx Search on phpBB 3.2

Post by KYPREO »

I have created a new section in the Sphinx information page on the phpBB development Wiki to explain how to enable wildcard searching: https://wiki.phpbb.com/Sphinx_Fulltext_ ... _searching

I have also signed up to GitHub to create a pull request to submit my code changes to fix the other search operators. It's just taking me some time to get my head around how the whole thing works. :oops: :lol:
Gwyneth Llewelyn
Registered User
Posts: 42
Joined: Thu Aug 06, 2009 11:34 pm
Location: Neufreistadt, Confederation of the Democratic Simulators, Second Life
Name: Gwyneth Llewelyn

Re: Sphinx Search on phpBB 3.2

Post by Gwyneth Llewelyn »

Hi there. Chiming in late, I know. I just wanted to add that, although Sphinx is semi-abandoned software (aye, it still gets a release now and then), most of the current community development lives on under Manticore Search. It's a fork of the open-source Sphinx code, adding a lot of bells and whistles, and, most importantly, being kept up-to-date with frequent upgrades/updates. Allegedly, Manticore is 'as fast as' Sphinx, although it might take up even fewer resources, mostly due to a change in the threading library (Manticore uses coroutines) and related code. The point here is that the potential obsolescence of Sphinx is a big unknown, while Manticore continues from where Sphinx left off, and that means bug fixes and security issues are addressed in a timely manner. Manticore is also trying to position itself as a competitor to ElasticSearch, which also means that it has wider support for other communication protocols, including a RESTful interface using JSON, which ought to be more appealing to those who have worked with different full-text search engines.

Manticore has been designed as a 'drop-in' replacement for Sphinx, and I can report that it truly works 'out of the box' (or almost) using the very detailed instructions provided by @KYPREO on the phpBB Wiki page. You can think of Manticore vs. Sphinx as MariaDB vs. MySQL: both use the same binary communication channels and protocols, so everything should 'just work'. And, indeed, that's the case, apart from a few configuration options in the Sphinx config file that are marked as deprecated, but those are easily fixed by just looking at the logs.
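
As a sketch of the "just look at the logs" step: the snippet below pulls the offending key names out of indexer output. It assumes the messages have the same shape as the Sphinx 2.2.11 log quoted earlier in this topic (`WARNING: key '...'` and `ERROR: unknown key name '...'`); I have not checked that every Manticore version words them identically, and the function name is made up.

```python
import re

# Matches the two message shapes seen in the Sphinx 2.2.11 log quoted in this topic.
KEY_MSG_RE = re.compile(r"^(WARNING|ERROR): (?:key|unknown key name) '([^']+)'")


def keys_flagged_in_log(log_text):
    """Return (severity, key) pairs for config keys the indexer complained about."""
    flagged = []
    for line in log_text.splitlines():
        m = KEY_MSG_RE.match(line.strip())
        if m:
            flagged.append((m.group(1), m.group(2)))
    return flagged
```

Each returned key is a line to look up in the config file and then remove, comment out or replace according to the documentation.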

The main issues are dealing with permissions, but, alas, that's always the case with Unix tools. These may be easier to overcome in some scenarios (e.g. downloading the binaries as opposed to using a package management system; using Manticore exclusively with phpBB, etc.), and they're not absolutely trivial, but... I've seen much worse :D

That said, I'm really amazed at how fast this technology is. It's almost uncanny: indexing a forum with well over a decade of articles, something close to 40k posts, took a few seconds. In fact, I was looking at the logs to see when it would finish, expecting to wait at least a few minutes (if not hours, compared with the native phpBB search and/or the MySQL full-text search). So I was staring at a blank screen with the Manticore indexer logs, expecting 'something else' to happen after the initial few stats. Well, those were all the stats. The indexing took seconds, not even minutes. And aye, it was even indexing things like the word 'a' ;) Doing the actual search is even more impressive, and that really, really made me gasp in astonishment. I was always fond of the superfast phpBB searches, but I can confirm that Manticore is the Usain Bolt of searches :-) And, on top of all that, it's really lightweight. I was a bit scared to let the indexer do its job via cron every five minutes... but, in truth, it's so fast that it could very well run every second and it wouldn't make a difference in terms of server load...

Although the forums that I manage do not have a huge user base — a hundred or so users, not all very active — like every other forum, they're prone to be indexed by half a million legitimate bots, around the clock... many of which, for some stupid reason, are eager to do a few searches now and then. Thus, basically, I'm providing super-fast searching mostly for bots, not humans (who will certainly welcome the extra speed and the new search options, but I guess that most of them won't really notice a big difference...). Alas, this is the strange world we live in: where our masters are the bots, not the humans...

Anyway, I'm always excited to find a few 'old school' programmers around, who focus on better algorithms to deliver insane speed on contemporary hardware, while keeping the memory & CPU footprint amazingly low. That is, by the way, one of the big reasons for sticking with phpBB as opposed to other 'competitors'. They simply cannot beat phpBB's performance — not by far. Sure, the overall interface is a bit dated (it's hard to keep up with so many changing frameworks every year), and it's much harder to completely change the template without breaking extensions or other functionality, but it's a small price to pay in exchange for the incredible performance. So I'm glad that the Sphinx/Manticore crowd also use a similar stance towards programming :D
I'm just a virtual girl in a virtual world...