[2.0.x] Tweaks for large forums

msr2
Registered User
Posts: 172
Joined: Thu Jun 16, 2005 1:52 pm
Location: here
Contact:

Post by msr2 » Sun Aug 07, 2005 9:13 pm

jk1 wrote: I think he was for real, here is his forum: http://ian.go-gaia.com/forum/index.php


0.0 he owns gaia online? wow!

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Wed Aug 24, 2005 7:47 pm

Hi Xkevinx:

I like to install Cacti on our web servers to monitor their health, especially with the number of servers we have now. Though in reality, watching the performance of a single machine is kind of boring, and the aggregated graphs show similar results, just with bigger numbers. :?

Server count wise we're at around 100 web servers. Database servers are at around 20. Maintaining them plus all the other networking equipment is a nightmare... At least our new web servers are diskless so it's less of a pain to install.

Search is something that boggles my mind. The more I look into the implementation, the more I see why companies like Google dedicate themselves to text indexing. It's just beyond my grasp. :oops: I will implement text searching for subject lines, however; that should be manageable, and I think phpBB should also have an option to index only subject lines.
xkevinx wrote: Hi lanzer,

I found this neat script that would be really interesting to see run on your system. Heck, you may be running it already. It's a stats program. If not this one, which one are you using, if any?

http://cacti.net/

And how many servers are you at now lol.

EDIT: any luck finding a solution to the search feature.

Kevin (Fusionx at gaia)
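
Indexing only subject lines, as suggested above, could look roughly like the sketch below. It is an illustration only, not Gaia's or phpBB's code: the function name extract_subject_keywords is made up, and the 3 to 20 character word limit is an assumption borrowed from the rebuild script posted later in this thread.

```php
<?php
// Hypothetical helper: turn a topic subject into a list of index keywords.
function extract_subject_keywords($subject, array $stopwords = array())
{
    // Lowercase, then split on anything that is not an ASCII letter or digit.
    // (Non-ASCII characters are treated as separators in this simple sketch.)
    $words = preg_split('/[^a-z0-9]+/', strtolower($subject), -1, PREG_SPLIT_NO_EMPTY);

    $keywords = array();
    foreach ($words as $word) {
        $length = strlen($word);
        // Assumed limits: skip very short and very long "words".
        if ($length < 3 || $length > 20) {
            continue;
        }
        // Skip stopwords; the list is expected to be lowercase already.
        if (in_array($word, $stopwords)) {
            continue;
        }
        $keywords[] = $word;
    }

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($keywords));
}

print_r(extract_subject_keywords('Tweaks for large forums: search tips', array('for')));
```

Because a subject line holds only a handful of words, the index stays a small fraction of the size of a full-text index over post bodies.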

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Re: Searching improvements for larger forums.

Post by lanzer » Thu Aug 25, 2005 12:07 am

Hi Serberus:

If you want more speed from the function, you can probably skip strtolower entirely: make sure your lists of stopwords and synonyms are stored in lowercase when you enter them (run strtolower in the admin interface). And instead of running str_replace many times in a loop, store all the words in two arrays, "replace from" and "replace to", then perform the replacement in a single preg_replace call.

Example below (note: the code has not been tested!)

Code: Select all

// Collect every pattern/replacement pair first, then replace in one pass.
$replace_from = array();
$replace_to = array();
if ( !empty($stopword_list) ) {
    foreach ($stopword_list as $stopword) {
        // Lines read with file() keep their trailing newline, so still trim them.
        // strtolower() can be skipped if the list is stored in lowercase.
        $stopword = trim($stopword);

        if ( $mode == 'post' || !in_array($stopword, array('not','and','or')) ) {
            // preg_quote() escapes regex metacharacters and the "/" delimiter.
            $replace_from[] = '/ ' . preg_quote($stopword, '/') . ' /';
            $replace_to[] = ' ';
        }
    }
}

if ( !empty($synonym_list) ) {
    foreach ($synonym_list as $synonym) {
        // Each line holds "replacement match", separated by a single space.
        list($replace_synonym, $match_synonym) = explode(' ', trim(strtolower($synonym)));

        if ( $mode == 'post' || !in_array($match_synonym, array('not','and','or')) ) {
            $replace_from[] = '/ ' . preg_quote($match_synonym, '/') . ' /';
            $replace_to[] = " $replace_synonym ";
        }
    }
}

// preg_replace() accepts arrays and applies each pattern/replacement pair in order.
if (count($replace_from)) $entry = preg_replace($replace_from, $replace_to, $entry);
Serberus wrote: Nice work barbos. Does the code below offer any performance improvement?

Code: Select all

	if ( !empty($stopword_list) ) {
		foreach ($stopword_list as $stopword) {
			$stopword = strtolower(trim($stopword));

			if ( $mode == 'post' || !in_array($stopword, array('not','and','or')) )
				$entry = str_replace(" $stopword ",' ',$entry);
		}
	}

	if ( !empty($synonym_list) ) {
		foreach ($synonym_list as $synonym) {
			list($replace_synonym, $match_synonym) = explode(' ', trim(strtolower($synonym)));
			$replace_synonym = strtolower($replace_synonym);
			$match_synonym   = strtolower($match_synonym);

			if ( $mode == 'post' || !in_array($match_synonym, array('not','and','or')) )
				$entry = str_replace(" $match_synonym ", " $replace_synonym ", $entry);
		}
	}
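
For plain word lists, str_replace itself also accepts arrays of search and replacement strings, so the batching idea works without regular expressions. The sketch below is only an illustration and has not been run inside phpBB; replace_words_batched is a hypothetical name and the sample word lists are made up.

```php
<?php
// Hypothetical helper: apply all stopword and synonym replacements in one str_replace() call.
function replace_words_batched($entry, array $stopwords, array $synonyms, $mode = 'post')
{
    $reserved = array('not', 'and', 'or');
    $from = array();
    $to = array();

    foreach ($stopwords as $stopword) {
        $stopword = strtolower(trim($stopword));
        if ($stopword === '') {
            continue; // ignore blank lines in the list
        }
        if ($mode == 'post' || !in_array($stopword, $reserved)) {
            $from[] = " $stopword ";
            $to[] = ' ';
        }
    }

    foreach ($synonyms as $synonym) {
        // Each line holds "replacement match".
        $parts = explode(' ', trim(strtolower($synonym)));
        if (count($parts) < 2) {
            continue; // malformed line
        }
        list($replace_synonym, $match_synonym) = $parts;
        if ($mode == 'post' || !in_array($match_synonym, $reserved)) {
            $from[] = " $match_synonym ";
            $to[] = " $replace_synonym ";
        }
    }

    // With array arguments, str_replace() applies the pairs one after another,
    // each on the result of the previous replacement.
    return count($from) ? str_replace($from, $to, $entry) : $entry;
}

echo replace_words_batched(' the colour of a cat ', array("the\n", "a\n", "of\n"), array("color colour\n"));
```

Whether this is actually faster than the loop depends on list sizes and PHP version, so it is worth timing both on real data before switching.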

lanzer
Registered User
Posts: 152
Joined: Wed Oct 10, 2001 10:00 am
Contact:

Post by lanzer » Thu Aug 25, 2005 12:49 am

Hi Gulson:

Yes, quite a while back I had cronjobs that ran through posts and added them to the search table, to minimize the wait when posting. Eventually, though, there were so many posts that the system couldn't keep up with the updates, and those cronjobs were scrapped as well.

My conclusion is that if a forum gets big enough that adding entries to the search index takes too long, the alternative really should be to have Google crawl your pages. Another alternative is to index keywords from the subject lines only. Putting the keyword routine under a cronjob is only a temporary solution.
gulson wrote: I have read a lot about the heavy load caused by the phpBB search engine. I see many seconds wasted updating the search tables. This happens when someone sends a new message or reply, and also when messages are removed. So when a user sends a new message, he must wait a few seconds; sometimes it takes longer, or the confirmation that the message was stored is never displayed, and then we end up with the same post submitted multiple times. Lanzer had an idea about making a temporary table. This table would keep track of which changes need to be made to the phpbb_search tables, so that the search tables are only updated twice per day, when few users are online. Maybe someone can say more about this idea, or release the code changes: which files should be edited and which rows created? Thank you.
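
The deferred-update idea in the quote can be sketched as a small queue that posting code appends to and a cron job drains. This is a hypothetical illustration, not code from Gaia or phpBB: the class name SearchUpdateQueue and the action names are made up, and a plain array stands in for the temporary database table.

```php
<?php
// Hypothetical sketch: queue search-index changes and apply them later in batches.
class SearchUpdateQueue
{
    // post_id => action; in a real board this would be a database table.
    private $pending = array();

    // Record that a post must be indexed or removed from the index.
    public function enqueue($post_id, $action)
    {
        // A later action for the same post supersedes the earlier one.
        $this->pending[$post_id] = $action;
    }

    // Process up to $limit queued entries and return how many were handled.
    public function drain($limit, $handler)
    {
        $done = 0;
        // foreach iterates over a copy, so removing entries inside the loop is safe.
        foreach ($this->pending as $post_id => $action) {
            if ($done >= $limit) {
                break;
            }
            call_user_func($handler, $post_id, $action);
            unset($this->pending[$post_id]);
            $done++;
        }
        return $done;
    }

    public function size()
    {
        return count($this->pending);
    }
}

$queue = new SearchUpdateQueue();
$queue->enqueue(101, 'index');
$queue->enqueue(102, 'index');
$queue->enqueue(101, 'remove'); // post 101 was deleted before the cron job ran

$log = array();
$processed = $queue->drain(1, function ($post_id, $action) use (&$log) {
    $log[] = "$action:$post_id";
});
```

As noted above, this only moves the work to a quieter time; if posts arrive faster than the cron job can drain the queue, the backlog grows without bound.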

younghistorians
Registered User
Posts: 368
Joined: Mon Sep 16, 2002 9:17 pm
Location: YoungHistorians.com
Contact:

Post by younghistorians » Thu Aug 25, 2005 2:34 am

lanzer,
Have you considered a Google search appliance? Their "Mini" runs around $3000 and might be something you would be interested in. I don't know how Gaia justifies its expenses, but given your size it seems like a logical thing to do.

Then again, your primary audience is young and it seems the search function hasn't been terribly missed; that and you're not searching through medical journals--just forum posts.

-YH

XxDawgxX
Registered User
Posts: 10
Joined: Tue Aug 23, 2005 12:28 am
Location: Texas
Contact:

Post by XxDawgxX » Fri Aug 26, 2005 3:20 am

Hi lanzer,


This is one of the best threads I have ever seen on the internet, and I say that as a member of many boards since 1999. In those days all the forums were written in Perl. What makes this thread so special is not the code tweaks you suggest or the MySQL hints. Let me elaborate. Back in 1999 I started a site, and within about a month it was getting about 10,000 unique visitors a day. The site eventually grew quite large, and to this day if you type in the name of the domain you can still find links to it, even though the site has been dead for 2 years.

The site eventually slowed down because its content became outdated. Since those days I have tried to duplicate the success of that site, to no avail. I have owned about 100 domain names since then, and not one of them reached even a tenth of the traffic that site had. Eventually I grew weary of trying and failing with every attempt. I put my hopes on the back burner for a long time. I went back to school and got my degree. I am now a network administrator for 4 small school districts. I make good money doing it, but you know what? My real passion is still the internet.

I got the itch again to build another forum, so of course I came here for the scripts. While looking through the forum I came across this thread. Lanzer, what touched me about this thread was not the tweaks or the code; those can be worked out. What got me was the passion for what you are doing, and the selflessness of your sharing with others in the community. You gave me hope of maybe someday being successful again in an online venture, because, like you, that is where my passion lies. So you see, through your success and passion you have touched others and given them hope. I would just like to say this is one of the best communities on the net. The people here are willing to help when you are in trouble. That is what makes phpBB so successful, in my opinion.
Keep up the good work and good luck.

R45
Registered User
Posts: 2830
Joined: Tue Nov 27, 2001 10:42 pm

Post by R45 » Fri Aug 26, 2005 1:01 pm

lanzer wrote: My conclusion is that if a forum is getting big enough to the point where adding entries to the search engine takes too long, then the alternative really should be to have Google crawl through your pages. Another alternative is to keyword only the subject lines. Stuff like putting the keyword routine under a cronjob is only a temporary solution.

Lanzer,

I wouldn't say SQL-based search solutions are a complete lost cause. For forums approaching one million posts, cronjobbing the existing search system goes a long way (many forums take years to get near that mark). Most forums tend to hover around the two to five million post mark. I've been working on a system recently to tackle this bracket, which involves breaking the index down into multiple (hundreds of) tables as well as using memcached to store results. The results so far have been very promising. It won't work for a site anywhere near Gaia's size; you're pretty much an extreme example :P
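
A rough sketch of the two ideas mentioned here, splitting the index across many tables and caching results, might look like this. It is not R45's system: the table prefix, the shard count of 256, and both function names are assumptions for illustration, and a plain array stands in for memcached.

```php
<?php
// Hypothetical: map a search word to one of $shards word-match tables.
function shard_table_for_word($word, $shards = 256, $prefix = 'phpbb_search_wordmatch_')
{
    // md5() gives a stable hash; 7 hex digits stay within a 32-bit integer.
    $bucket = hexdec(substr(md5($word), 0, 7)) % $shards;
    return $prefix . $bucket;
}

// Cache-aside lookup: return the cached result, or run $lookup once and store it.
function cached_search($word, array &$cache, $lookup)
{
    if (array_key_exists($word, $cache)) {
        return $cache[$word];
    }
    $cache[$word] = call_user_func($lookup, shard_table_for_word($word), $word);
    return $cache[$word];
}

$cache = array();
$calls = 0;
// Stand-in for the real database query against the chosen shard table.
$lookup = function ($table, $word) use (&$calls) {
    $calls++;
    return array(1, 2, 3); // made-up post ids
};

cached_search('forum', $cache, $lookup);
cached_search('forum', $cache, $lookup); // served from the cache
echo shard_table_for_word('forum'), " lookups: $calls\n";
```

Hashing the word means every query for that word touches exactly one small table, which is the point of the split; the cache then absorbs repeated searches for popular words.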

Lady Serena
Registered User
Posts: 76
Joined: Thu May 26, 2005 12:31 am
Location: Smithsburg, MD
Contact:

Post by Lady Serena » Fri Aug 26, 2005 4:10 pm

Hi R45, I'm just a member on Gaia, and there are a lot more than five or six million posts. I just looked at the forum index (which I do frequently on Gaia) and there are over 300 million posts.
Gaia Online Forum Index wrote: Gaia has 363,725,946 articles posted with 2,437,084 registered users.

High-performance searching is a must when there are so many messages in the system. Personally, I think the best search system for Gaia would be a couple of those Google Site Search machines, which are very expensive and still a very low-end solution for Gaia (up to 5 million documents). A dedicated search system like Google or Teoma would benefit Gaia a great deal, but it's also more hardware, more software, and more network infrastructure on the back end.

Bottom line: When you have over 300 million searchable documents (posts), only the best and most expensive search systems will perform.
Varus Online :: Discover Imagination
http://www.varusonline.com/

R45
Registered User
Posts: 2830
Joined: Tue Nov 27, 2001 10:42 pm

Post by R45 » Fri Aug 26, 2005 4:46 pm

That's why I said SQL solutions can still apply for forums in the 2 to 5 million post bracket, but not beyond that.
R45 wrote: It won't work with a site anywhere near Gaia, you're pretty much an extreme example :P

chatasos
Registered User
Posts: 748
Joined: Wed May 15, 2002 1:16 pm
Location: Paralia

Post by chatasos » Fri Aug 26, 2005 5:24 pm

Hi to everyone....

I've been following this very interesting conversation from the beginning and, to be honest, I have learned a lot from your ideas.

The last few posts have talked a lot about the search functions in phpBB, and since I'm developing a mod (Rebuild Search) that rebuilds the search tables, I would be very grateful if people with large databases could give it a try.

Besides its basic job of rebuilding the search tables, the mod estimates processing time and database size as it runs (using very simple algorithms :cry: ).

So far I have processed my own forum database, which holds about 100,000 posts, and it took almost 2 days (!) to complete. My forum is on a shared hosting server, and a dedicated one might cut that time considerably.
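
A very simple time estimate of the kind described can be a linear extrapolation from the posts processed so far. The sketch below is an assumption about how such an estimate could work, not the Rebuild Search mod's actual algorithm; estimate_remaining_seconds is a hypothetical name.

```php
<?php
// Hypothetical helper: estimate the remaining run time from progress so far.
// Assumes every post takes about the same time, which long posts will violate.
function estimate_remaining_seconds($processed, $total, $elapsed_seconds)
{
    if ($processed <= 0) {
        return null; // nothing processed yet, so there is no basis for an estimate
    }
    $remaining = max(0, $total - $processed);
    // Average seconds per post so far, times the posts still to do.
    return (float) $elapsed_seconds * $remaining / $processed;
}

// 25,000 of 100,000 posts done after one hour.
$eta = estimate_remaining_seconds(25000, 100000, 3600);
printf("About %.1f hours left\n", $eta / 3600);
```

Since the word tables grow as the rebuild proceeds, inserts tend to slow down over time, so an estimate like this is likely to be optimistic early in the run.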

Report Posts 1.2.3c (MODDB) - Report Posts 2.1.5 (ALPHA)
Rebuild Search 2.4.0 (MODDB)
MOD Version Checker 1.2.0 (MODDB)
Mega Mail System 0.9.8 (ALPHA)
Pagination Select List & Input Box (MODDB)

da_badtz_one
Registered User
Posts: 376
Joined: Thu Jan 29, 2004 8:25 pm

Post by da_badtz_one » Sat Aug 27, 2005 10:20 am

I've used this code before; it rebuilt the search tables for my forum of about 70K posts in 4 to 6 hours. One problem with the script is that the status output never updated, which was quite confusing.

If someone could rewrite it or fix those bugs, I think it would benefit everyone running big forums on dedicated servers who someday needs to move servers or rework the search tables. The script is very CPU-intensive, and it was written by someone else.


Code: Select all

#!/usr/bin/php
<?php
/***************************************************************************
 *                            admin_rebuild.php
 *                            -------------------
 *   time                 : April 14th, 2005
 *   author               : barbos
 *   version              : 1.0
 *   
 **************************************************************************/

/***************************************************************************
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 ***************************************************************************/
 
$timestart = microtime();
$common = array();
 
function clean_words($mode, &$entry, &$stopword_list, &$synonym_list)
{
	static $drop_char_match =   array('^', '$', '&', '(', ')', '<', '>', '`', '\'', '"', '|', ',', '@', '_', '?', '%', '-', '~', '+', '.', '[', ']', '{', '}', ':', '\\', '/', '=', '#', '\'', ';', '!');
	static $drop_char_replace = array(' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  '',   ' ', ' ', ' ', ' ', '',  ' ', ' ', '',  ' ',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ' , ' ', ' ', ' ', ' ',  ' ', ' ');

	for($i = 0; $i < count($entry); $i++)
	{
		$entry[$i] = ' ' . strip_tags(strtolower($entry[$i])) . ' ';
	}

	if ( $mode == 'post' )
	{
		// Replace line endings by a space
		$entry = preg_replace('/[\n\r]/is', ' ', $entry); 
		// HTML entities like &nbsp;
		$entry = preg_replace('/\b&[a-z]+;\b/', ' ', $entry); 
		// Remove URL's
		$entry = preg_replace('/\b[a-z0-9]+:\/\/[a-z0-9\.\-]+(\/[a-z0-9\?\.%_\-\+=&\/]+)?/', ' ', $entry); 
		// Quickly remove BBcode.
		$entry = preg_replace('/\[img:[a-z0-9]{10,}\].*?\[\/img:[a-z0-9]{10,}\]/', ' ', $entry); 
		$entry = preg_replace('/\[\/?url(=.*?)?\]/', ' ', $entry);
		$entry = preg_replace('/\[\/?[a-z\*=\+\-]+(\:?[0-9a-z]+)?:[a-z0-9]{10,}(\:[a-z0-9]+)?=?.*?\]/', ' ', $entry);
	}
	else if ( $mode == 'search' ) 
	{
		$entry = str_replace(' +', ' and ', $entry);
		$entry = str_replace(' -', ' not ', $entry);
	}

	//
	// Filter out strange characters like ^, $, &, change "it's" to "its"
	//
	for($i = 0; $i < count($drop_char_match); $i++)
	{
		$entry =  str_replace($drop_char_match[$i], $drop_char_replace[$i], $entry);
	}

	if ( $mode == 'post' )
	{
		$entry = str_replace('*', ' ', $entry);

		// 'words' that consist of <3 or >20 characters are removed.
		//$entry = preg_replace('/[ ]([\S]{1,2}|[\S]{21,})[ ]/',' ', $entry);
	}

	if ( !empty($stopword_list) )
	{
		for ($j = 0; $j < count($stopword_list); $j++)
		{
			$stopword = trim($stopword_list[$j]);

			if ( $mode == 'post' || ( $stopword != 'not' && $stopword != 'and' && $stopword != 'or' ) )
			{
				$entry = str_replace(' ' . trim(strtolower($stopword)) . ' ', ' ', $entry);
			}
		}
	}

	if ( !empty($synonym_list) )
	{
		for ($j = 0; $j < count($synonym_list); $j++)
		{
			// explode() replaces split(), which was removed in PHP 7.
			list($replace_synonym, $match_synonym) = explode(' ', trim(strtolower($synonym_list[$j])));
			if ( $mode == 'post' || ( $match_synonym != 'not' && $match_synonym != 'and' && $match_synonym != 'or' ) )
			{
				$entry =  str_replace(' ' . trim(strtolower($match_synonym)) . ' ', ' ' . trim(strtolower($replace_synonym)) . ' ', $entry);
			}
		}
	}

	return $entry;
}

function split_words(&$entry, $mode = 'post')
{
	// Trim 1+ spaces to one space and split this trimmed string into words.
	return explode(' ', trim(preg_replace('#\s+#', ' ', $entry)));
}

function add_search_words($mode, $post_id, $post_text, $post_title = '', &$masterwords)
{
	global $db, $phpbb_root_path, $board_config, $lang, $common, $maxcommon;

	$search_raw_words = array();
	$search_raw_words['text'] = split_words($post_text);
	$search_raw_words['title'] = split_words($post_title);

	@set_time_limit(0);

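	$word = array();
	$word_insert_sql = array();
	// Per-field string of quoted words, used only to spot duplicates.
	$word_insert_sql2 = array();
	// foreach replaces each(), which was removed in PHP 8.
	foreach ( $search_raw_words as $word_in => $search_matches )
	{
		$word_insert_sql[$word_in] = array();
		$word_insert_sql2[$word_in] = '';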
		if ( !empty($search_matches) )
		{
			for ($i = 0; $i < count($search_matches); $i++)
			{ 
				$search_matches[$i] = trim($search_matches[$i]);
				$searchlen = strlen($search_matches[$i]);

				if( $search_matches[$i] != '' && ($searchlen > 2 && $searchlen < 21)) 
				{
					$word[] = $search_matches[$i];
					if ( !strstr($word_insert_sql2[$word_in], "'" . $search_matches[$i] . "'") )
					{
						$word_insert_sql[$word_in][] = $search_matches[$i];
						$word_insert_sql2[$word_in] .= ( $word_insert_sql2[$word_in] != "" ) ? ", '" . $search_matches[$i] . "'" : "'" . $search_matches[$i] . "'";
					}
				} 
			}
		}
	}

	if ( count($word) )
	{
		sort($word);

		$p = 0;
		// Start with an empty list so $word is still an array when no word is new.
		$word2 = array();

		for($i = 0; $i < count($word); $i++)
		{
			if (empty($masterwords[$word[$i]]))
			{
				$masterwords[$word[$i]] = -1;
				$word2[$p] = $word[$i];
				$p++;
			}
		}
		
		$word = $word2;

		$prev_word = '';
		$word_text_sql = '';
		$temp_word = array();
		for($i = 0; $i < count($word); $i++)
		{
			if ( $word[$i] != $prev_word )
			{
				$temp_word[] = $word[$i];
			}
			$prev_word = $word[$i];
		}
		$word = $temp_word;

		for ($i = 0; $i < count($word); $i++)
		{ 
			$new_match = true;
			$errortrap = false;

			if ( $new_match )
			{
				switch( SQL_LAYER )
				{
					case 'mysql':
					case 'mysql4':
					case 'mssql':
					case 'mssql-odbc':
					default:
						$sql = "INSERT INTO " . SEARCH_WORD_TABLE . " (word_text, word_common) 
							VALUES ('" . $word[$i] . "', 0)"; 
						if( !$db->sql_query($sql) )
						{
							//message_die(GENERAL_ERROR, 'Could not insert new word', '', __LINE__, __FILE__, $sql);
							$errortrap = true;
						}
						break;
				}
			}

			if ($errortrap == true)
			{
				$sql = "SELECT word_id FROM " . SEARCH_WORD_TABLE . " WHERE word_text = '" . $word[$i] . "';";
				
				$result = $db->sql_query($sql);
				if ( !$result )
				{
					message_die(GENERAL_ERROR, "Could not find posts.", "",__LINE__, __FILE__, $sql);
				}
				
				$wordresult = $db->sql_fetchrow($result);
				$masterwords[$word[$i]] = $wordresult["word_id"];
			}
			else
			{
				$masterwords[$word[$i]] = $db->sql_nextid();
				if ($masterwords[$word[$i]] == 0)
				{
					$sql = "SELECT word_id FROM " . SEARCH_WORD_TABLE . " WHERE word_text = '" . $word[$i] . "';";
					
					$result = $db->sql_query($sql);
					if ( !$result )
					{
						message_die(GENERAL_ERROR, "Could not find posts.", "",__LINE__, __FILE__, $sql);
					}
					
					$wordresult = $db->sql_fetchrow($result);
					$masterwords[$word[$i]] = $wordresult["word_id"];
				}
			}
		}
	}

	foreach ( $word_insert_sql as $word_in => $match_sql )
	{
		$title_match = ( $word_in == 'title' ) ? 1 : 0;

		if ( $match_sql != '' )
		{
			for($z = 0; $z < count($match_sql); $z++)
			{
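				// Initialise the counter the first time a word is seen.
				if ( !isset($common[$match_sql[$z]]) )
				{
					$common[$match_sql[$z]] = 0;
				}
				$common[$match_sql[$z]]++;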

				if ($common[$match_sql[$z]] > $maxcommon)
				{
					if ($masterwords[$match_sql[$z]] > 0)
					{
						remove_common($masterwords[$match_sql[$z]]);
						$masterwords[$match_sql[$z]] = -1;
					}
				}
				else
				{
					$sql = "INSERT INTO " . SEARCH_MATCH_TABLE . " VALUES (" . $post_id . "," . $masterwords[$match_sql[$z]] . "," . $title_match . ");";

					if ( !$db->sql_query($sql) )
					{
						message_die(GENERAL_ERROR, 'Could not insert new word', '', __LINE__, __FILE__, $sql);
					}
				}
			}
		}
	}

	return;
}

//
// Check if specified words are too common now
//
function remove_common($word)
{
	global $db;

			$sql = "UPDATE " . SEARCH_WORD_TABLE . "
				SET word_common = " . TRUE . " 
				WHERE word_id IN ('$word')";
			if ( !$db->sql_query($sql) )
			{
				message_die(GENERAL_ERROR, 'Could not delete word list entry', '', __LINE__, __FILE__, $sql);
			}

			$sql = "DELETE FROM " . SEARCH_MATCH_TABLE . "  
				WHERE word_id IN ('$word')";
			if ( !$db->sql_query($sql) )
			{
				message_die(GENERAL_ERROR, 'Could not delete word match entry', '', __LINE__, __FILE__, $sql);
			}
}

define('IN_PHPBB', 1);

//
// Include required files, get $phpEx and check permissions
//
$phpbb_root_path = "./../";
require($phpbb_root_path . 'extension.inc');
include($phpbb_root_path . 'common.'.$phpEx);
	
// Empty wordlist tables
$sql = "DELETE FROM " . SEARCH_WORD_TABLE;
$result = $db->sql_query($sql);
if ( !$result )
{
	message_die(GENERAL_ERROR, "Could not empty search_wordlist table.", "",__LINE__, __FILE__, $sql);
}
$sql = "DELETE FROM " . SEARCH_MATCH_TABLE;
$result = $db->sql_query($sql);
if ( !$result )
{
	message_die(GENERAL_ERROR, "Could not empty search_wordmatch table.", "",__LINE__, __FILE__, $sql);
}

$sql = "SELECT count(*) as pcount FROM " . POSTS_TEXT_TABLE;

$result = $db->sql_query($sql);
if ( !$result )
{
	message_die(GENERAL_ERROR, "Could not find posts.", "",__LINE__, __FILE__, $sql);
}

$postcount = $db->sql_fetchrow($result);
$batch = array();
$masterwords2 = array();
$a = 0;
$i = 0;
$maxcommon = $postcount["pcount"] * (1/25);

$stopword_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_stopwords.txt"); 
$synonym_array = @file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_synonyms.txt"); 

$sql = "SELECT * FROM " . POSTS_TEXT_TABLE;

$result = $db->sql_query($sql);
if ( !$result )
{
	message_die(GENERAL_ERROR, "Could not find posts.", "",__LINE__, __FILE__, $sql);
}

while ( $activepost = $db->sql_fetchrow($result) )
{

	if ($i % 100 == 0 && $i > 0)
	{

		clean_words('post', $batch['post_text'], $stopword_array, $synonym_array);
		clean_words('post', $batch['post_subject'], $stopword_array, $synonym_array);

		for($j = 0; $j < count($batch['post_id']); $j++)
		{
			add_search_words('single', $batch['post_id'][$j], stripslashes($batch['post_text'][$j]), stripslashes($batch['post_subject'][$j]), $masterwords2);
		}

		unset($batch);

		$batch = array();
		$a = 0;
		
		if ($i % 500 == 0 && $i > 0)
		{
			print $i . " records indexed\n";
		    
			$timeend = microtime();
			print number_format(((substr($timeend,0,9)) + (substr($timeend,-10)) - (substr($timestart,0,9)) - (substr($timestart,-10))),4) . " seconds elapsed. \n\n";
			flush();
		}
	}

	$batch['post_id'][$a] = $activepost['post_id'];
	$batch['post_text'][$a] = $activepost['post_text'];
	$batch['post_subject'][$a] = $activepost['post_subject'];

	$a++;
	$i++;
	
	
}

clean_words('post', $batch['post_text'], $stopword_array, $synonym_array);
clean_words('post', $batch['post_subject'], $stopword_array, $synonym_array);

for($j = 0; $j < count($batch['post_id']); $j++)
{
	add_search_words('single', $batch['post_id'][$j], stripslashes($batch['post_text'][$j]), stripslashes($batch['post_subject'][$j]), $masterwords2);
}

print $i . " records indexed. Done.\n\n";

$timeend = microtime();
print number_format(((substr($timeend,0,9)) + (substr($timeend,-10)) - (substr($timestart,0,9)) - (substr($timestart,-10))),4) . " seconds elapsed.";
?>
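
On the status output that never appeared: one possible cause, and this is only a guess, is output buffering holding back the print calls. A minimal sketch of a progress reporter that sidesteps this by writing to stderr is shown below; it also uses microtime(true) (PHP 5 and later) in place of the substr arithmetic on microtime strings. The function names progress_line and report_progress are hypothetical.

```php
<?php
// Format one status line; $start and $now are float timestamps from microtime(true).
function progress_line($indexed, $start, $now)
{
    return sprintf('%d records indexed, %.4f seconds elapsed', $indexed, $now - $start);
}

// Write a status line to stderr, which does not go through PHP's output buffers.
function report_progress($line)
{
    $stderr = fopen('php://stderr', 'w');
    fwrite($stderr, $line . "\n");
    fclose($stderr);
}

$start = microtime(true);
// ... index a batch of posts here ...
report_progress(progress_line(500, $start, microtime(true)));
```

When the script is run from the command line, the status lines then show up in the terminal immediately, and regular output can still be redirected to a file separately.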

chatasos
Registered User
Posts: 748
Joined: Wed May 15, 2002 1:16 pm
Location: Paralia

Post by chatasos » Sat Aug 27, 2005 11:20 am

Hi da_badtz_one,

As far as I can see, the script above uses the default phpBB search functions, like mine does (as most search rebuild scripts do), so processing times will be about the same.
Maybe you could try my mod and post some results too. Someone else who tried it processed 40K posts in 40 minutes, but his posts are much shorter than mine.


da_badtz_one
Registered User
Posts: 376
Joined: Thu Jan 29, 2004 8:25 pm

Post by da_badtz_one » Mon Aug 29, 2005 12:04 am

What results would you want to know about? I'm willing to spend some time to rebuild all the search results on my localhost with a back up copy of my site on it :)

chatasos
Registered User
Posts: 748
Joined: Wed May 15, 2002 1:16 pm
Location: Paralia

Post by chatasos » Mon Aug 29, 2005 12:57 am

da_badtz_one wrote: What results would you want to know about? I'm willing to spend some time to rebuild all the search results on my localhost with a back up copy of my site on it :)


If you take a look at http://www.phpbb.com/phpBB/viewtopic.php?t=318363 you'll see what others have posted... :wink:
Thanks very much for your time..


zemez_man
Registered User
Posts: 2
Joined: Wed Aug 17, 2005 12:56 am
Contact:

ohh this idea is great

Post by zemez_man » Wed Aug 31, 2005 9:08 am

ohh this idea is great ,,, im really going to use this..
