MOD Search_partial_display

This forum is now closed as part of retiring phpBB2.

asinshesq
Registered User
Posts: 6266
Joined: Sun Feb 22, 2004 9:34 pm
Location: NYC
Name: Alan

Post by asinshesq »

Merlin Sythove wrote: No, can't, 25 chars works fine here....


Did 25 chars work fine even when you used my new code lines? And even when you do a search that returns many pages of results?
Merlin Sythove
Registered User
Posts: 2339
Joined: Tue Mar 16, 2004 7:42 am

Post by Merlin Sythove »

Yes, it works fine, but like I said, my board is non-standard.
The number of pages makes no difference.
I did take out your <td> and <tr> stripping, though, because that will mess up the screen layout. You should only remove a complete <table(....)</table>, never individual rows or cells of a table that you may actually be in the middle of.
asinshesq

Post by asinshesq »

Well, I haven't tried this yet, but to change those code lines so that they remove table tags and anything in between them rather than td or tr tags, you could try this:

Code:

FIND
							// now remove any tags that begin with a td or tr (e.g. for quote or code blocks)
							$tail = preg_replace('/<td([^<>]*?)>/', '', $tail);
							$tail = preg_replace('/<tr([^<>]*?)>/', '', $tail);

REPLACE WITH
							// now remove any pair of table open and close tags in the tail and everything in between them
							$tail = preg_replace('/<table(.*?)>(.*?)<\/table>/', '', $tail);
asinshesq

Post by asinshesq »

Still not perfect, since that will leave behind any table in the tail (including its tds and trs) that began in the head.

I still think the best way to do this is to take out the trs and tds in the tail, but to make sure I don't mess up the page layout by deleting a tag that has no corresponding closing tag, I'm now trying to delete each matched pair of opening and closing tr tags (or opening and closing td tags) in the tail, along with everything between them. That is easy to do with these regexes:

Code:

							$tail = preg_replace('/<td(.*?)>(.*?)<\/td>/', '', $tail);
							$tail = preg_replace('/<tr(.*?)>(.*?)<\/tr>/', '', $tail);
That seems to work perfectly, except that once again, if I search and return 25 characters or 0 characters, it messes up the layout of the page. However, if I stick to 100 characters or more, it never causes any problem. Any idea why that happens?

For debugging purposes, I'd love to echo exactly what the head and tail look like before and after the final processing, but since the output goes to a browser, the html gets rendered and disappears. Is there an echo command that makes the browser display all the html rather than process it?
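For the echo question, PHP's built-in htmlspecialchars() converts <, > and & (and double quotes) into HTML entities, so the browser displays the markup as plain text instead of processing it; wrapping the result in <pre> keeps the line breaks. A minimal sketch, where debug_html() is a hypothetical helper name and the sample $head and $tail are made-up values:

```php
<?php
// Hypothetical debugging helper (not part of phpBB or the MOD): echoes a
// labelled chunk of HTML so the browser shows the tags instead of rendering them.
function debug_html($label, $html)
{
    // htmlspecialchars() turns <, > and & into entities; <pre> keeps line breaks.
    echo '<pre>' . htmlspecialchars($label . ': ' . $html) . "</pre>\n";
}

// Example values; in the MOD you would pass the real $head and $tail,
// once before and once after the final processing.
$head = 'Some text <b>bold';
$tail = '</b> more <table><tr><td>quote</td></tr></table>';
debug_html('head before', $head);
debug_html('tail before', $tail);
```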
Merlin Sythove

Post by Merlin Sythove »

Look at the source code of the page in your browser, and compare it with what it should be (the single post that is messing the rest up, of course). I'm sure you're removing rows or columns or table tags if the screen gets messed up.

I'm sure there are ways to reliably remove cells and rows from the rest of a table, checking all the way that you don't mess up the cell layout. I'm sure that will get quite complicated too...
asinshesq

Post by asinshesq »

Merlin Sythove wrote: Look at the source code of the page in your browser, and compare it with what it should be (the single post that is messing the rest up, of course). I'm sure you're removing rows or columns or table tags if the screen gets messed up.

I'm sure there are ways to reliably remove cells and rows from the rest of a table, checking all the way that you don't mess up the cell layout. I'm sure that will get quite complicated too...


My regex should only take out <td> and </td> tags (and <tr> and </tr> tags) in pairs, along with whatever comes between them, so I'm not really sure why that would mess anything up. Odder still, I have tested this quite a bit (using searches that bring back many, many pages of hits so that I can be sure I am really seeing it operate in a variety of circumstances), and it is totally reliable so long as I keep the minimum number of characters at 100 or more. So it seems it only messes up when there is very little in $head. Does that give you any ideas?

I did start to use view source last night to try to track this down, but it's a bit like finding a needle in a haystack...I should be able to track down the issue but it will take a while. Meanwhile, I'll be out of town for about 10 days so I won't be experimenting with this for a while.
asinshesq

Post by asinshesq »

OK, Merlin, uncle. Taking out trs and tds or any other solution we have talked about so far is simply too dicey given how complex the tables become in these posts. So I have gone back to the solution where we strip class="quote" and class="code" out of the tail (and I edited the prettied up version of your mod that I originally posted on page 1 of this thread to pick that change up).

It looks pretty good (a huge improvement over the partial post search behavior of an unmodded phpbb board). You do end up with some extra skipped lines at the end of any post that has quote or code blocks beyond the number of characters the user has asked to see, but if the user asks for a reasonable number of characters it looks fine.
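To illustrate why stripping the classes is safe for the layout, here is a minimal PHP sketch. It is not the MOD's actual code (that is in the first post); the sample $tail is hypothetical markup loosely modelled on the quote and code cells from bbcode.tpl. Only the class attributes are removed, so every <table>, <tr> and <td> keeps its partner and the page layout stays intact; the boxes just lose the styling that makes them visible, which may explain the extra skipped lines described above.

```php
<?php
// Hypothetical tail markup, loosely modelled on the quote and code boxes
// produced from bbcode.tpl (the real markup has more attributes and rows).
$tail = '<table><tr><td class="quote">Quoted text</td></tr></table>'
      . '<table><tr><td class="code">echo 1;</td></tr></table>';

// Strip only the class attributes that style the boxes; the tags themselves
// are untouched, so every opening tag still has its closing partner.
$tail = str_replace(array(' class="quote"', ' class="code"'), '', $tail);

echo htmlspecialchars($tail), "\n";
```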
Merlin Sythove

Post by Merlin Sythove »

OK, I've amended my own first post as well.

If you want to have a final attempt to get rid of empty quote and code blocks, here are some ideas (I may have a go myself some day).

As you can see from the bbcode.tpl file, they are in fact tables in their own right. The problem is that they may be nested, and that the final </table> that closes a quote at the end of $tail may belong to an opening <table that is in $head. Removing that closing </table> will get you into trouble with the screen layout.

The normal procedure with nested structures is to create a "stack". You put each opening html code you find onto this stack, and when you find the corresponding closing html code, you take the top code off the stack again. If the codes really match, the top of the stack will hold that opening html code at the moment you find its closing html code in your source.

If the stack is empty when you want to start cutting off the tail, you can be certain that there is no missing closing html code in the tail anywhere.
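In PHP, the stack described above could be sketched roughly like this. It is a minimal illustration, not code taken from includes/bbcode.php or the MOD: open_tags_in() is a hypothetical name, it tracks only table, tr and td (the tags the quote and code boxes are built from), and it assumes well-formed markup with no '>' inside attribute values.

```php
<?php
// Hypothetical helper: returns the stack of table/tr/td tags that are still
// open at the end of $html, innermost last. An empty array means every
// tracked tag opened in $html was also closed in $html.
function open_tags_in($html)
{
    $stack = array();
    // Match opening or closing table/tr/td tags, e.g. <table ...>, </td>.
    preg_match_all('#<(/?)(table|tr|td)\b[^>]*>#i', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            // Opening tag: push it onto the top of the stack.
            $stack[] = $name;
        } elseif (!empty($stack) && end($stack) === $name) {
            // Closing tag matching the top of the stack: pop it off.
            array_pop($stack);
        }
        // A closing tag with no match on the stack was opened before $html
        // started (or the markup is malformed); this sketch ignores it.
    }
    return $stack;
}

$head = 'Intro <table class="q"><tr><td>quoted';
print_r(open_tags_in($head));                       // lists table, tr, td: still open
var_dump(open_tags_in('<b>plain</b>') === array()); // bool(true): safe to cut here
```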

From this point onwards:
If there IS still html left to match in the tail, i.e. the stack is not empty, you continue cleaning with the "~" as we have done so far, until the stack IS empty.

Now that the stack is empty, you can cut off the rest of the tail. And possibly clean up any "~" that were inserted up to this point.
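The cutting step could then be sketched as below. This is a hypothetical illustration with several simplifying assumptions: safe_cut() is an invented name; it tracks only table, tr and td; it assumes $limit does not fall inside a tag; and instead of the MOD's "~" cleaning, it keeps only the tags (not the text) from the tail until the stack is empty, then cuts off the rest.

```php
<?php
// Hypothetical sketch: cut $text at $limit characters, then keep the tags
// (not the text) from the tail until every table/tr/td opened in the head
// has been closed. Everything after that point is dropped.
function safe_cut($text, $limit)
{
    $head = substr($text, 0, $limit);
    $tail = substr($text, $limit);
    $pattern = '#<(/?)(table|tr|td)\b[^>]*>#i';

    // Build the stack of tags left open by the head.
    $stack = array();
    preg_match_all($pattern, $head, $m, PREG_SET_ORDER);
    foreach ($m as $tag) {
        if ($tag[1] === '') {
            $stack[] = strtolower($tag[2]);
        } elseif (end($stack) === strtolower($tag[2])) {
            array_pop($stack);
        }
    }

    // Walk the tail tag by tag until the stack is empty, keeping only tags.
    $kept = '';
    preg_match_all($pattern, $tail, $m, PREG_SET_ORDER);
    foreach ($m as $tag) {
        if (empty($stack)) {
            break; // Nothing from the head is left open: cut the rest.
        }
        $kept .= $tag[0];
        if ($tag[1] === '') {
            $stack[] = strtolower($tag[2]);
        } elseif (end($stack) === strtolower($tag[2])) {
            array_pop($stack);
        }
    }
    return $head . $kept;
}

$post = 'Hi <table><tr><td>quoted text here</td></tr></table> and more';
echo safe_cut($post, 25), "\n"; // prints: Hi <table><tr><td>quoted </td></tr></table>
```

A quote that both starts and ends in the tail is pushed and popped in turn, so the walk only stops once the tags opened in $head have all been closed.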

Code that uses this type of stack, and subroutines for it, are already part of the includes/bbcode.php file, so further study of that file may help. The principle is simple: the stack is an array. A new opening html code is added to the array as its new last element. When a closing code is found, the last element in the array should be the matching opening code; remove it (unset). When the array count is zero, the stack is empty, and you're definitely not inside <html ... </html> of any sort.

Also, you can no longer rely on just any search result. You will have to put together a selection of carefully constructed source posts (for example in a hidden forum) that test all the possible ways of cutting a post after or within various types of nested quotes and code blocks, so you can see whether your code works properly.
asinshesq

Post by asinshesq »

Merlin Sythove wrote: ...They are in fact as you can see from the bbcode.tpl file tables in their own right. The problem is that they may be nested, and that the final </table> that closes a quote in the end of the $tail, may belong to an opening <table that is in $head. Removing that closing </table> will get you into trouble with the screen layout....

I gathered that early on, but I thought I would be OK since I set up a regex to look for any opening table tag in the tail (in the form of '<table' plus any other characters) and delete from that point to the very next '</table>'. I would have thought that, if the table tags were properly nested, following that method would cleanly delete whole tables located completely in the tail. But it didn't work. Similarly, when I was trying the approach of deleting tds or trs, I did it by looking for any opening tr (or td) tag (in the form of <tr (or <td) ) in the tail and then deleting through to the very next </tr> (or </td>). Again I thought that would cleanly delete entire cells located wholly in the tail, but again it didn't quite work. I'm not sure why, but after staring at the source code of the real html page for a while I decided the table layout is very complex and I didn't feel like figuring out where it went wrong. So I gave up on deleting the tables or cells.
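One likely culprit, offered as an illustration rather than something tested against the MOD, is the lazy (.*?) in both patterns: when a quote sits inside another quote, or a table sits inside a cell, "the very next </table>" (or </td>) is the inner one's, so the match ends early and the outer element's closing tags are left behind without their opening tags. The sample strings below are hypothetical, simplified markup.

```php
<?php
// Hypothetical tail: a quote table that contains a nested quote table.
$tail = '<table class="q1"><tr><td>outer <table class="q2"><tr><td>inner</td></tr></table> more</td></tr></table>after';

// Pattern from earlier in the thread. The lazy (.*?) stops at the FIRST
// </table>, which belongs to the inner table, not the outer one.
$result = preg_replace('/<table(.*?)>(.*?)<\/table>/', '', $tail);
// $result is now ' more</td></tr></table>after': orphaned closing tags.
echo htmlspecialchars($result), "\n";

// The td version fails the same way when a cell contains a nested table.
$cell = '<td>text <table><tr><td>q</td></tr></table></td>';
$cellResult = preg_replace('/<td(.*?)>(.*?)<\/td>/', '', $cell);
// $cellResult is now '</tr></table></td>'.
echo htmlspecialchars($cellResult), "\n";
```

Those orphaned closing tags then close whatever table of the page layout happens to enclose the post. Handling nesting correctly needs something that tracks depth, such as the stack approach described above.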
Merlin Sythove

Re: MOD Search_partial_display

Post by Merlin Sythove »

Mod is updated, new mod in first post.

Return to “[2.0.x] MOD Writers Discussion”