[2.0.21] Rebuild Search

Rating: Excellent! 58 (81%), Very Good 10 (14%), Good 3 (4%), Fair 0 (no votes), Poor 1 (1%). Total votes: 72

Snapdragon
Registered User
Posts: 85
Joined: Fri Apr 04, 2003 3:45 pm
Location: Edmonton, Alberta

Post by Snapdragon »

OK, can we design a compromise?

What if I had it run while the forum was up in order to catch up on the first million... what I need is a way for the script to be less CPU intensive so that it will run and keep the forums going.

Would it be possible to add a NICE value to the script?

Then it could run over the course of a month and slowly rebuild.
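
One way to get an effect similar to a NICE value from inside a script is to rest between batches, so the rebuild only takes up a fraction of the server's time. The sketch below is only an illustration of that idea, not code from the MOD: `process_batch`, `duty_cycle` and the other names are hypothetical.

```python
import time


def throttled_rebuild(batches, process_batch, duty_cycle=0.25,
                      clock=time.monotonic, sleep=time.sleep):
    """Process batches, resting after each one so that the work takes up
    roughly `duty_cycle` of the wall-clock time.

    `process_batch` is a hypothetical callback that indexes one batch of posts.
    Returns the total number of seconds spent resting.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    total_rest = 0.0
    for batch in batches:
        started = clock()
        process_batch(batch)
        worked = clock() - started
        # Rest long enough that worked / (worked + rest) == duty_cycle.
        rest = worked * (1 - duty_cycle) / duty_cycle
        if rest > 0:
            sleep(rest)
        total_rest += rest
    return total_rest
```

With `duty_cycle=0.25`, a batch that takes 2 seconds is followed by a 6-second pause, so the rebuild takes about four times longer but leaves the server free most of the time.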
chatasos
Registered User
Posts: 748
Joined: Wed May 15, 2002 1:16 pm
Location: Paralia

Post by chatasos »

Snapdragon wrote: OK, can we design a compromise?

What if I had it run while the forum was up in order to catch up on the first million... what I need is a way for the script to be less CPU intensive so that it will run and keep the forums going.

Would it be possible to add a NICE value to the script?

Then it could run over the course of a month and slowly rebuild.


MySQL has a "LOW_PRIORITY" keyword that you can add to the search queries in the PHP code, but I haven't tried it, so I don't have more details.
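
As a rough illustration of where the keyword goes: in MySQL, LOW_PRIORITY is written directly after INSERT, UPDATE, DELETE or REPLACE, and it only has an effect on storage engines that use table-level locking, such as MyISAM. The helper below is a hypothetical sketch (not part of phpBB or this MOD) that adds the modifier to a write statement; the table and column names in the usage lines are for illustration only.

```python
import re

# Write statements that accept the LOW_PRIORITY modifier in MySQL.
# The negative lookahead skips statements that already carry it.
_WRITE_VERB = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE)\b(?!\s+LOW_PRIORITY\b)",
    re.IGNORECASE,
)


def add_low_priority(sql: str) -> str:
    """Return `sql` with LOW_PRIORITY placed right after the leading verb.

    Statements that are not writes, or that already have the modifier,
    are returned unchanged.
    """
    return _WRITE_VERB.sub(lambda m: m.group(0) + " LOW_PRIORITY", sql, count=1)


print(add_low_priority("INSERT INTO phpbb_search_wordlist (word_text) VALUES ('kite')"))
print(add_low_priority("DELETE FROM phpbb_search_wordmatch WHERE post_id = 42"))
```

In the PHP code the same change would be made by hand in the query strings themselves.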

Report Posts 1.2.3c (MODDB) - Report Posts 2.1.5 (ALPHA)
Rebuild Search 2.4.0 (MODDB)
MOD Version Checker 1.2.0 (MODDB)
Mega Mail System 0.9.8 (ALPHA)
Pagination Select List & Input Box (MODDB)
foilzone.com
Registered User
Posts: 6
Joined: Tue Sep 07, 2004 12:26 am
Location: the Hague, the Netherlands

Post by foilzone.com »

Chatasos,

I've just started a rebuild of my search tables.

My board (www.foilzone.com/phpbb2) had these statistics before rebuilding the search tables:

Number of posts: 13811
Posts per day: 12.54
Number of topics: 2771
Topics per day: 2.52
Number of users: 801
Users per day: 0.73
Board started: Wed Nov 06, 2002 12:00 am
Avatar directory size: 749.92 KB
Database size: 22.74 MB
Gzip compression: OFF


Now, after a while of rebuilding the search tables with various settings for the time limit, the number of posts per session and so on, it keeps timing out.

If I set the post counter to 1 per session and the time limit to 60 seconds, it keeps running, more or less, but it only does 1 post per go.

I think I've found a bug too, by the way. The report on screen mentions "processing next 10 posts", which is the amount I selected for the previous session, the one that timed out. It apparently displays the values from the previous session.

Nevertheless, my database reports 13811 posts. On average, how big would the search tables grow for that many posts?

Now the MOD reports an expected database size of 8 GB! If that's correct, my server will explode!

The report screen:

Rebuild Search Progress
Processed post ids: 6729 - 6729

Timer expired at 61 secs. Processing next 10 post(s). Please wait...

Processing post details
Current Session: from 9 to 9 (out of total 8541), 0.11 % completed
Total: from 5280 to 5280 (out of total 13812), 38.23 % completed

Processing time details
Last 1 post(s) of current session: 00 days, 00 hours, 01 minutes, 01 seconds
From the beginning of current session: 00 days, 00 hours, 05 minutes, 52 seconds
Average per cycle of current session: 00 days, 00 hours, 00 minutes, 59 seconds
Estimated until finish of current session: 03 days, 20 hours, 41 minutes, 36 seconds

Database size details (current, estimated after finish)
Search Tables size: 870.28 MB, 8873.20 MB
Database size: 892.52 MB, 8895.44 MB

Active parameters
Starting post_id: 6720
Processed post(s) on last cycle: 1
Time limit: 60
Board status: Enabled

(*) All the estimated values are calculated approximately, based on the current completed percent, and may not represent the actual final values. As the completed percent increases, the estimated values will come closer to the actual ones.


Is this in any way normal? I would have to delete quite a lot of search words to keep the database size down to an acceptable level.

Acceptable would be around 1 GB maximum, but EIGHT is too much.

I see people with larger amounts of posts, but with smaller databases.

I read a short remark in the MOD's docs about search_stopwords.txt, but I can't seem to find that file. I assume it is somewhere in the phpBB2 folder or its subfolders.

How does this work, and what can I do to decrease the database size?

Any help would be greatly appreciated! My board will soon offer searchability but no usability, because the DB will be bigger than my hosting account.

Thanks in advance all!!
Have fun!
Mark


Happy kiting, whatever you fly!

foilzone.com -------------- foils.nl

All we do is foilkites!

Post by chatasos »

foilzone.com, can you provide me with a screenshot of the parameters you set before starting the processing?

Post by foilzone.com »

I can, but rest assured: I first used the default value of 50 posts, then decreased it until the number of posts to rebuild per cycle was 10.

Then it didn't seem to time out as frequently.

Time out set to 300, the default. Refresh 3.

Now I've edited the stopwords with your trick, and I can even do 300 posts in those 300 seconds with a 3-second refresh. No problems.

So with the default stopwords list, the search rebuild is heavily overloaded. It helps to add all the words that are "common" on your own board (mine is about kitesurfing, so the word "kite" weighed heavily, and so on).
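
To find which words are "common" on a particular board, one option is to count in how many posts each word appears. This is only a sketch of that idea with made-up sample posts; `stopword_candidates`, `min_share` and `min_length` are hypothetical names, and the word splitting is much simpler than whatever the board itself does.

```python
import re
from collections import Counter


def stopword_candidates(posts, min_share=0.5, min_length=3):
    """Return the words that occur in at least `min_share` of the posts.

    Each post counts a word once, however often it repeats in that post.
    `posts` is a list of plain-text post bodies.
    """
    doc_freq = Counter()
    for text in posts:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        doc_freq.update(w for w in words if len(w) >= min_length)
    threshold = min_share * len(posts)
    return sorted(w for w, n in doc_freq.items() if n >= threshold)


sample_posts = [
    "My new kite flies great",
    "Which kite for light wind?",
    "Kite lines tangled again",
    "Board repair tips",
]
print(stopword_candidates(sample_posts))  # ['kite']
```

Words that show up in most posts add many rows to the search tables while doing little to narrow a search, which is why they are natural candidates for the stopwords list.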

Seems to be running just fine now. :P :D

Rebuild Search Progress
Processed post ids: 4129 - 4150

Timer expired at 58 secs. Processing next 50 post(s). Please wait...

Processing post details
Current Session: from 3600 to 3618 (out of total 13820), 26.18 % completed
Total: from 3600 to 3618 (out of total 13821), 26.18 % completed

Processing time details
Last 19 post(s) of current session: 00 days, 00 hours, 00 minutes, 58 seconds
From the beginning of current session: 00 days, 01 hours, 20 minutes, 27 seconds
Average per cycle of current session: 00 days, 00 hours, 00 minutes, 45 seconds
Estimated until finish of current session: 00 days, 03 hours, 46 minutes, 51 seconds

Database size details (current, estimated after finish)
Search Tables size: 229.94 MB, 878.32 MB
Database size: 252.20 MB, 900.58 MB

Active parameters
Starting post_id: 6
Processed post(s) on last cycle: 19
Time limit: 60
Board status: Enabled
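
The estimates in this report are consistent with a plain linear extrapolation from the completed share (3618 of 13820 posts). The sketch below is inferred from the numbers shown, not taken from the MOD's code, and the function names are hypothetical.

```python
def estimate_final_size(current_mb, done, total):
    """Assume the size grows in proportion to the number of posts processed."""
    return current_mb * total / done


def estimate_seconds_left(elapsed_s, done, total):
    """Projected total run time minus the time already spent."""
    return elapsed_s * total / done - elapsed_s


def format_duration(seconds):
    """Format whole seconds the way the progress report does."""
    seconds = int(seconds)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return (f"{days:02d} days, {hours:02d} hours, "
            f"{minutes:02d} minutes, {secs:02d} seconds")


size = estimate_final_size(229.94, done=3618, total=13820)
left = estimate_seconds_left(1 * 3600 + 20 * 60 + 27, done=3618, total=13820)
print(f"{size:.2f} MB")       # 878.32 MB
print(format_duration(left))  # 00 days, 03 hours, 46 minutes, 51 seconds
```

The estimated database size then follows by adding the part of the database that is not search tables: 878.32 + (252.20 - 229.94) = 900.58 MB. Because the extrapolation divides by the completed share, it swings widely while that share is still small.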


Let's wait and see! :-)
Have fun!
Mark


Post by Snapdragon »

chatasos wrote:
Snapdragon wrote: OK, can we design a compromise?

What if I had it run while the forum was up in order to catch up on the first million... what I need is a way for the script to be less CPU intensive so that it will run and keep the forums going.

Would it be possible to add a NICE value to the script?

Then it could run over the course of a month and slowly rebuild.


MySQL has a "LOW_PRIORITY" keyword that you can add to the search queries in the PHP code, but I haven't tried it, so I don't have more details.


How would I go about integrating that into the search? I would have to edit the code in the file itself, would I not?

Post by chatasos »

Snapdragon wrote: How would I go about integrating that into the search? I would have to edit the code in the file itself, would I not?

You would have to edit phpBB's file "includes/functions_search.php", which contains the two functions this MOD uses (add_search_words and remove_search_post).

PS: If I find free time for a new version, I'll try to add some optimizations.

Post by Snapdragon »

Something else I meant to add, if you do another version:

If the forum is disabled already, it should not be re-enabled after the end of a search. It should only re-enable if the rebuild search MOD is used to disable it in the first place.
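
The behaviour I have in mind could look roughly like this. It is only a sketch: `BoardGuard` is a made-up name, `config` is a plain dict standing in for the board settings, and the `board_disable` key is used for illustration.

```python
class BoardGuard:
    """Disable the board for a rebuild and restore it only if we disabled it."""

    def __init__(self, config):
        self.config = config          # hypothetical stand-in for board settings
        self.disabled_by_us = False

    def disable(self):
        # Leave the board alone if an admin has already disabled it.
        if not self.config["board_disable"]:
            self.config["board_disable"] = True
            self.disabled_by_us = True

    def release(self):
        """Call on every exit path: normal finish and cancel alike."""
        if self.disabled_by_us:
            self.config["board_disable"] = False
            self.disabled_by_us = False


config = {"board_disable": True}   # disabled by hand before the rebuild
guard = BoardGuard(config)
guard.disable()
guard.release()
print(config["board_disable"])     # True: the board stays disabled
```

Since the rebuild runs over many page refreshes, the "disabled by us" flag would have to be stored somewhere between requests; how that would be done in the MOD is left open here.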

Post by chatasos »

Snapdragon wrote: Something else I meant to add, if you do another version:

If the forum is disabled already, it should not be re-enabled after the end of a search. It should only re-enable if the rebuild search MOD is used to disable it in the first place.


I believe that is already happening :wink:

Post by Snapdragon »

Nope. I disabled by hand and the mod turned the forum back on on me twice, hence why I posted about it.

I just upgraded from MySQL 3.23.58 to 4.1.15. I'm going to see if it makes a difference in the time the rebuild takes, since MySQL 4 takes advantage of memory caching and I have 4 GB to throw at it.

Post by chatasos »

Snapdragon wrote: Nope. I disabled by hand and the mod turned the forum back on on me twice, hence why I posted about it.


If you have disabled your board through the admin panel, then the MOD's "disable board" option should be disabled (greyed out), so you cannot change it from there.
Are you sure that some other admin didn't enable it while you were running the MOD?

belzecue2
Registered User
Posts: 13
Joined: Sun Nov 20, 2005 4:57 pm
Location: Australia

A replication fly in the msaccess ointment

Post by belzecue2 »

Looks to be a fantastic mod, however...

Your assumption that table keys are sequential integers starting at 1 resulted in the mod failing for my db.

I use an msaccess db. That in itself would not be a problem -- I bet this mod works fine with default msaccess phpbb installations.

The problem is, I applied replication to my db. That process alters all 'auto increment' fields to 'random' fields -- where ids get a random signed integer. That makes 'start=0' kinda meaningless :-)
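
For what it's worth, batching does not need sequential ids if each batch starts after the last id already handled, in sorted order, instead of counting up from 0. A rough sketch of that idea follows. SQLite is used only so the example runs on its own; the `posts` table and the function names are stand-ins, and Access SQL would use TOP rather than LIMIT.

```python
import sqlite3


def next_batch(conn, last_id=None, limit=50):
    """Fetch the next `limit` post ids in ascending order after `last_id`.

    Passing None starts from the smallest id, whatever its value or sign.
    """
    if last_id is None:
        rows = conn.execute(
            "SELECT post_id FROM posts ORDER BY post_id LIMIT ?", (limit,))
    else:
        rows = conn.execute(
            "SELECT post_id FROM posts WHERE post_id > ? "
            "ORDER BY post_id LIMIT ?", (last_id, limit))
    return [row[0] for row in rows]


def iter_all_ids(conn, limit=50):
    """Walk every post id exactly once, batch by batch."""
    last_id = None
    while True:
        batch = next_batch(conn, last_id, limit)
        if not batch:
            return
        yield from batch
        last_id = batch[-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY)")
random_ids = [-2017483648, -5, 42, 1999999999, 7]
conn.executemany("INSERT INTO posts VALUES (?)", [(i,) for i in random_ids])
print(list(iter_all_ids(conn, limit=2)))
```

The "resume from here" parameter then becomes the last processed id rather than a count, so negative or widely scattered ids make no difference.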

Re: A replication fly in the msaccess ointment

Post by chatasos »

belzecue2 wrote: Looks to be a fantastic mod, however...

Your assumption that table keys are sequential integers starting at 1 resulted in the mod failing for my db.

I use an msaccess db. That in itself would not be a problem -- I bet this mod works fine with default msaccess phpbb installations.

The problem is, I applied replication to my db. That process alters all 'auto increment' fields to 'random' fields -- where ids get a random signed integer. That makes 'start=0' kinda meaningless :-)


Hi belzecue2,

I don't think I quite understand your problem.
You said that you applied replication to your db and that all auto-increment values were changed to random values. If this is the default behaviour of msaccess replication, shouldn't all other phpbb tables (posts, topics, etc.) have problems too? Many phpbb tables use auto-increment values.

Post by Snapdragon »

chatasos wrote:
Snapdragon wrote: Nope. I disabled by hand and the mod turned the forum back on on me twice, hence why I posted about it.


If you have disabled your board through the admin panel, then the mod's "disable board" option should be disabled (greyed out), so you cannot change it from there.


Absolutely correct. It recognized the board was disabled, however, it still turned it back on when I cancelled the operation, twice.

chatasos wrote: Are you sure that some other admin didn't enable it while you were running the mod?


The one person who could have done that would have been asleep for many hours. :D No one messes with my operations.

Post by chatasos »

Snapdragon wrote: Absolutely correct. It recognized the board was disabled, however, it still turned it back on when I cancelled the operation, twice.

Yep, you're right here. :wink:
If you cancel the process, the board gets enabled. It seems I forgot to add something...

Please try the following version:
[url=http:///www.psclub.gr/chatasos/rebuild_search/rebuild_search_2.2.1b.zip]rebuild_search_2.2.1b.zip[/url]

To update from 2.2.1a, just overwrite the file admin/admin_rebuild_search.php.
