Bad table building killing forum-- please fix flaw in phpbb!

This is an archive of the phpBB 2.0.x support forum. Support for phpBB2 has now ended.
Forum rules
Following phpBB2's EoL, this forum is now archived for reference purposes only.
Please see the following announcement for more information: viewtopic.php?f=14&t=1385785
Locked
ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Bad table building killing forum-- please fix flaw in phpbb!

Post by ezlynx »

I mentioned this maybe a year ago. Nothing was ever done. It's still happening and as my board gets bigger and bigger it is affecting how it works even more.

The construction of the search tables is faulty.

It is killing my board! I see other people complain about slowness and I see this happening over and over on my own board.

The problem occurs when you create a post that only uses common words or uses words that are too long for the search word tables. It hangs and hangs and hangs while I wait for it the words to post. I can see the select word_id time getting longer and longer.

Will anyone who knows the software intimately please take a look at it? Since I can recreate the exact same problem here by posting using only common words, I know this is happening everywhere.

HELP! PLEASE!

Don't know if it is looping continuing to look for nonexistent word id numbers or what, but it isn't working right!

I don't think most people realize this is affecting their boards but it is. And for some large boards, it is crippling.

tcr2005
Registered User
Posts: 657
Joined: Tue Jul 25, 2006 3:40 am
Contact:

Post by tcr2005 »

post your link and any errors you may be recieving

albosky
Registered User
Posts: 463
Joined: Sun Apr 09, 2006 7:13 pm

Post by albosky »

I am going to guess that you are talking about the function remove_common in functions_search.php

Code: Select all

						$sql = "SELECT m.word_id 
				FROM " . SEARCH_MATCH_TABLE . " m, " . SEARCH_WORD_TABLE . " w 
				WHERE w.word_text IN ($word_id_sql)  
					AND m.word_id = w.word_id 
				GROUP BY m.word_id 
				HAVING COUNT(m.word_id) > $common_threshold";
		}
		else 
		{
			$sql = "SELECT word_id 
				FROM " . SEARCH_MATCH_TABLE . " 
				GROUP BY word_id 
				HAVING COUNT(word_id) > $common_threshold";
		}
When the word match table gets extremely large like on the forum I run (~42 million rows in that table) , those queries are going to run take forever to run . Have a read through phpBB tweaks for large forums. The post is quite old but alot of the tweaks listed there for eliminating useless words from being added to the search tables still apply. As well as the KB article Decreasing Searchtables size

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Post by ezlynx »

There are no error messages. What I see is the cpu shooting up and the query taking longer and longer (mytop). I have a monitoring script from someone using the same host.

What I recommend is you do a test post on this board (or your own) that uses one or two common words like this, the, there, and, and/or you. Compare posting time to any other post you might make on this or another board with a more complex and wordy posting.

What I have is that process slowing everything down the line. If it did not hang up, it might help get my board out of the red.

I have a common words table of 288 words specifically selected for my forum (did an analysis of all the search words and how frequently they were recorded). It's dropped the size of my search tables. These carefully selected words also mean there's a higher probability someone will create a conversational post using only those words.

The query I see is:
Query SELECT word_id FR

I don't know if what you posted is related to the problem (my mind draws a blank on how the php is functioning). Maybe it is looking for the word first and should instead not look for the word if it is a common word and then not search for the word_id?

I've been following Lanser's post (mostly his ideas). My board isn't as big. And my understanding of how to effect the changes pretty shakey.

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Post by ezlynx »

Thanks for the link to the decreasing tables page. I think I found it and tried it some time ago (to run the script) but was not successful. I think the first of the changes he suggests may have been taken care of by now (what you have right now is search_stopword everywhere except where it references the txt file which is called search stopwords.txt).

espicom
Registered User
Posts: 17905
Joined: Wed Dec 22, 2004 1:14 am
Location: Woodstock, IL

Post by espicom »

The file name is search_stopwords.txt (the underscore is significant), and it is language-specific.

I have seen several projects where people were going to use the database-specific "full text" indexing available for some databases, which eliminates the use of the search tables. I believe this is an option with Olympus, if your database supports it. I remember also seeing talk of a MOD that uses GOOGLE to do the searching, assuming that you get Google to come by often enough to be useful.
Jeff
Fixing 1016/1030/1034 Errors | (obsolete link) | MySQL 4.1/5.x Client Error | phpBBv2 Logo in ACP
Support requests via PM are ignored!
"To be fully alive is to feel that everything is possible." - Eric Hoffer

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Post by ezlynx »

Yes, I have a file I modified, search_stopwords.txt
It contains 288 common words, some are unique to my board because of content. They are indeed the most common from my analysis of the whole database using a script to sort words by frequency.

I used to have a wbb board that searched the whole database (vs. constructing and using a search table). It dragged terribly and I was forced by my host to switch to phpBB (couldn't be happier I made the switch) 4 years ago. You can imagine my board is considerably bigger now (maybe 525mb). Would be interesting to have Google index it but I don't know if it's practical or a real option.

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Post by ezlynx »

Here's the problem area:

Code: Select all

function split_words($entry, $mode = 'post')
{
	// If you experience problems with the new method, uncomment this block.
/*
	$rex = ( $mode == 'post' ) ? "/\b([\w±µ-ÿ][\w±µ-ÿ']*[\w±µ-ÿ]+|[\w±µ-ÿ]+?)\b/" : '/(\*?[a-z0-9±µ-ÿ]+\*?)|\b([a-z0-9±µ-ÿ]+)\b/';
	preg_match_all($rex, $entry, $split_entries);

	return $split_entries[1];
*/
	// Trim 1+ spaces to one space and split this trimmed string into words.
	return explode(' ', trim(preg_replace('#\s+#', ' ', $entry)));
}

function add_search_words($mode, $post_id, $post_text, $post_title = '')
{
	global $db, $phpbb_root_path, $board_config, $lang;

	$stopword_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_stopwords.txt");
	$synonym_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_synonyms.txt");

	$search_raw_words = array();
	$search_raw_words['text'] = split_words(clean_words('post', $post_text, $stopword_array, $synonym_array));
	$search_raw_words['title'] = split_words(clean_words('post', $post_title, $stopword_array, $synonym_array));

	@set_time_limit(0);

	$word = array();
	$word_insert_sql = array();
	while ( list($word_in, $search_matches) = @each($search_raw_words) )
	{
		$word_insert_sql[$word_in] = '';
		if ( !empty($search_matches) )
		{
			for ($i = 0; $i < count($search_matches); $i++)
			{
				$search_matches[$i] = trim($search_matches[$i]);

				if( $search_matches[$i] != '' )
				{
					$word[] = $search_matches[$i];
					if ( !strstr($word_insert_sql[$word_in], "'" . $search_matches[$i] . "'") )
					{
						$word_insert_sql[$word_in] .= ( $word_insert_sql[$word_in] != "" ) ? ", '" . $search_matches[$i] . "'" : "'" . $search_matches[$i] . "'";
					}
				}
			}
		}
	}

	if ( count($word) )
	{
		sort($word);

		$prev_word = '';
		$word_text_sql = '';
		$temp_word = array();
		for($i = 0; $i < count($word); $i++)
		{
			if ( $word[$i] != $prev_word )
			{
				$temp_word[] = $word[$i];
				$word_text_sql .= ( ( $word_text_sql != '' ) ? ', ' : '' ) . "'" . $word[$i] . "'";
			}
			$prev_word = $word[$i];
		}
		$word = $temp_word;

		$check_words = array();
		switch( SQL_LAYER )
		{
			case 'postgresql':
			case 'msaccess':
			case 'mssql-odbc':
			case 'oracle':
			case 'db2':
				$sql = "SELECT word_id, word_text
					FROM " . SEARCH_WORD_TABLE . "
					WHERE word_text IN ($word_text_sql)";
				if ( !($result = $db->sql_query($sql)) )
				{
					message_die(GENERAL_ERROR, 'Could not select words', '', __LINE__, __FILE__, $sql);
				}

				while ( $row = $db->sql_fetchrow($result) )
				{
					$check_words[$row['word_text']] = $row['word_id'];
				}
				break;
		}

		$value_sql = '';
		$match_word = array();
		for ($i = 0; $i < count($word); $i++)
		{
			$new_match = true;
			if ( isset($check_words[$word[$i]]) )
			{
				$new_match = false;
			}

			if ( $new_match )
			{
				switch( SQL_LAYER )
				{
					case 'mysql':
					case 'mysql4':
						$value_sql .= ( ( $value_sql != '' ) ? ', ' : '' ) . '(\'' . $word[$i] . '\', 0)';
						break;
					case 'mssql':
					case 'mssql-odbc':
						$value_sql .= ( ( $value_sql != '' ) ? ' UNION ALL ' : '' ) . "SELECT '" . $word[$i] . "', 0";
						break;
					default:
						$sql = "INSERT INTO " . SEARCH_WORD_TABLE . " (word_text, word_common)
							VALUES ('" . $word[$i] . "', 0)";
						if( !$db->sql_query($sql) )
						{
							message_die(GENERAL_ERROR, 'Could not insert new word', '', __LINE__, __FILE__, $sql);
						}
						break;
				}
			}
		}

		if ( $value_sql != '' )
		{
			switch ( SQL_LAYER )
			{
				case 'mysql':
				case 'mysql4':
					$sql = "INSERT IGNORE INTO " . SEARCH_WORD_TABLE . " (word_text, word_common)
						VALUES $value_sql";
					break;
				case 'mssql':
				case 'mssql-odbc':
					$sql = "INSERT INTO " . SEARCH_WORD_TABLE . " (word_text, word_common)
						$value_sql";
					break;
			}

			if ( !$db->sql_query($sql) )
			{
				message_die(GENERAL_ERROR, 'Could not insert new word', '', __LINE__, __FILE__, $sql);
			}
		}
	}

	while( list($word_in, $match_sql) = @each($word_insert_sql) )
	{
		$title_match = ( $word_in == 'title' ) ? 1 : 0;

		if ( $match_sql != '' )
		{
			$sql = "INSERT INTO " . SEARCH_MATCH_TABLE . " (post_id, word_id, title_match)
				SELECT $post_id, word_id, $title_match
					FROM " . SEARCH_WORD_TABLE . "
					WHERE word_text IN ($match_sql)";
			if ( !$db->sql_query($sql) )
			{
				message_die(GENERAL_ERROR, 'Could not insert new word matches', '', __LINE__, __FILE__, $sql);
			}
		}
	}

	if ($mode == 'single')
	{
		remove_common('single', 4/10, $word);
	}

	return;
}
I think this code cannot deal with the title or topic words being only common words and creates an array that is empty but isn't checked for being empty.

These arrays are created and used:

$stopword_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_stopwords.txt");
$synonym_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_synonyms.txt");

$search_raw_words = array();
$search_raw_words['text'] = split_words(clean_words('post', $post_text, $stopword_array, $synonym_array));
$search_raw_words['title'] = split_words(clean_words('post', $post_title, $stopword_array, $synonym_array));

$word = array();
$word_insert_sql = array();
$temp_word = array();
$word = $temp_word;

$check_words = array();
$match_word = array();

Can we insert some code to check if the word array that seems to be built based on the stopword list and synonym list is empty? If what you have left has no words to get the id for, you can't process it and it should be bypassed. How can we do this?

I badly need someone who understand the search code and how it works.

By the way, does anyone know what this function does?
@set_time_limit(0);

abusedemailaddress
Registered User
Posts: 161
Joined: Fri Mar 24, 2006 7:18 am

Post by abusedemailaddress »

set_time_limit is a function to try and disable the PHP execution time limit, if allowed, so that the script won't be interrupted. It has the @ prepended to it to prevent it from displaying an error to the user if it has been disabled.
The abuse will continue until morale improves!

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

Post by ezlynx »

Ah, very handy then. I notice sometimes posting takes quite a bit of time -- would likely time out if there was a limit.

I have a hard time conceptualizing the arrays. Do you know which of the arrays listed is the last array before it has to start inserting words? Any idea what the code to check to see if it is empty would look like?

ezlynx
Registered User
Posts: 81
Joined: Tue Sep 10, 2002 8:34 pm

FIXED!!

Post by ezlynx »

Okay. Got my son to look at the code. He inserted another case. Works like a charm!! I reported it as a bug and also added the fix in the bug forum.

Here is the fix:

Code: Select all

 #
#-----[ OPEN ]------------------------------------------
#

includes/functions_search.php


#
#-----[ FIND ]------------------------------------------
#

		else
		{
			$sql = "SELECT word_id
				FROM " . SEARCH_MATCH_TABLE . "
				GROUP BY word_id
				HAVING COUNT(word_id) > $common_threshold";
		}

#
#-----[ BEFORE, ADD ]------------------------------------------
#

		// ADDED: non-searchable post case
		else if($mode == 'single') { return; }
I've been using the altered file for a couple days and have not seen any queries that take longer than 15 seconds so it seems to be working. I routinely noticed queries that took over two and three hundred seconds so this is a big improvement.

You all likely have occasional posts that would be affected by this flaw. If your board sometimes gets slowed down, this should help.

Locked

Return to “2.0.x Support Forum”