bbcode_uid: what is it good for

This is an archive of the phpBB 2.0.x support forum. Support for phpBB2 has now ended.
josh
I am too lazy to register

Post by josh »

I've been sorting through most of phpBB2 and came across this. I have a basic idea of what this particular field is doing, but not really. Anybody that has a good idea of what this truly is for, let me know. Thanks.

theFinn
Founder and ex-Contributor
Posts: 1767
Joined: Tue Jul 03, 2001 7:58 pm
Location: Edmonton, AB, Canada

Post by theFinn »

nathan can explain it better, but when BBCode is used in a post, a 'first pass' is run on it before it's put into the database. It's not converted to HTML, but it does get some identifying information added to it: the bbcode UID.

For example [quote] could become [quote:8ba223]. This is used for working with nested quote and code tags to make it easier/quicker to parse at display time.
James 'theFinn' Atkinson
Founder & ex-Contributor
http://www.thefinn.net

GrimmReeper
Registered User
Posts: 70
Joined: Wed Aug 29, 2001 8:49 am

Post by GrimmReeper »

ahhh...I was wondering what it was...so Nathan how bout a more detailed explanation?

josh
I am too lazy to register

Post by josh »

Yea, that's basically what I had come up with. Thanks.

nathan
Former Team Member
Posts: 126
Joined: Mon Aug 06, 2001 7:58 pm
Location: Victoria, BC, Canada

Post by nathan »

You want detail? Ok, you asked for it. :) This is what I posted waaaay back when we first came up with this stuff:

-----------

Here's v2.0 bbcode in a nutshell:

The problem is that parsing the nestable tags (code, quote, and list) is necessarily slow. There's no way around that, other than not actually bothering to find matching pairs of tags. But if we don't find matching pairs of tags, users can thrash the layout of an entire page just by having an unmatched opening tag in their message.

So, what I did was break down that task into 2 separate steps:
1) Recognizing the tags.
2) Actually replacing the tags with HTML code.

Step 1 is done before we put the post in the database, and step 2 is done at pageview time.

What step 1 does is to replace a matching tag pair with a pair of the same tags, but with the addition of a UID string. Example: [quote] ..... [/quote] becomes [quote:UID] ..... [/quote:UID]. The UID part is currently a 10-character random alphanumeric string. Its length can be tweaked for performance, and I haven't decided what the final value will be yet.
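As a rough illustration of step 1, here is a minimal sketch of pairing tags with a stack. It is not phpBB's actual first-pass code: the function name first_pass_sketch and its parameters are hypothetical, and it handles one plain tag name at a time, with no attributes.

```php
<?php
// Hypothetical helper, not phpBB's real implementation: add ":$uid" only to
// properly matched [tag]...[/tag] pairs and leave stray tags untouched.
function first_pass_sketch(string $text, string $tag, string $uid): string
{
    $open  = '[' . $tag . ']';
    $close = '[/' . $tag . ']';

    // Split into tag tokens and the plain text between them.
    $pattern = '#(' . preg_quote($open, '#') . '|' . preg_quote($close, '#') . ')#';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $stack = []; // indexes of opening tags still waiting for a close
    foreach ($parts as $i => $part) {
        if ($part === $open) {
            $stack[] = $i;
        } elseif ($part === $close && $stack) {
            // Pair this close with the most recent unmatched open.
            $j = array_pop($stack);
            $parts[$j] = '[' . $tag . ':' . $uid . ']';
            $parts[$i] = '[/' . $tag . ':' . $uid . ']';
        }
    }
    return implode('', $parts);
}
```

An unmatched opening tag never receives the UID, so a later pass that only looks for UID-bearing tags would simply ignore it.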

So, the second pass can just blindly run a bunch of str_replace() calls, looking for "[quote:UID]" to replace with the starting quote HTML, and "[/quote:UID]" to replace with the ending quote HTML. This process is quite fast.
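A minimal sketch of what such a second pass might look like, assuming only the quote tag and placeholder HTML. The function name second_pass_sketch and the blockquote markup are illustrative assumptions, not what phpBB emits.

```php
<?php
// Hypothetical second pass: blindly swap UID-bearing tags for HTML.
// The HTML strings are placeholders chosen for this example.
function second_pass_sketch(string $text, string $uid): string
{
    $search  = ['[quote:' . $uid . ']', '[/quote:' . $uid . ']'];
    $replace = ['<blockquote>', '</blockquote>'];

    // No pair matching is needed here: the first pass already guaranteed
    // that every tag carrying the UID has a partner.
    return str_replace($search, $replace, $text);
}
```

Tags without the UID, such as an unmatched [quote], pass through unchanged and show up as literal text.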

Also, the first pass can be very easily reversed for the editpost page - we just run str_replace() and basically strip the UIDs from the tags.
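The reversal could be sketched roughly like this. The function name strip_uid_sketch is hypothetical, and the sketch assumes the UID is appended to tags as ":UID" exactly as in the example above.

```php
<?php
// Hypothetical reversal for the edit page: drop every ":UID" suffix so the
// user sees the BBCode as originally typed.
function strip_uid_sketch(string $text, string $uid): string
{
    return str_replace(':' . $uid, '', $text);
}
```

Because the replacement is purely textual, it relies on the UID being random enough that ":UID" is very unlikely to occur in the message body by coincidence.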

So, now the HTML that's used to generate BBCode is no longer stored in the database. This means that HTML can be changed, or specific tags disabled, at any time by the admin without breaking the reversal process, and with the effects taking place immediately across the whole board, including existing posts.

No, it's not possible to match the nested tags with a regular expression, that's the whole heart of the problem. I'll try and explain why:

Essentially, we want to match something like this, right:
#[tag](.*)[/tag]#

Now, there are two ways the quantifier in the middle there can match - either greedy or non-greedy. Look at the following example code block, where I'll indicate the tags each method will match:
[tag] -- greedy start, non-greedy start
[tag]

[/tag] -- non-greedy end
[/tag] -- greedy end

So, in this case, the non-greedy match fails - it doesn't do the right thing at all. But, the greedy match works fine. Repeated applications of that regexp will do the job.

So, consider our next example. I'll do the same thing as before:
[tag] -- greedy start, non-greedy start

[/tag] -- non-greedy end
[tag]

[/tag] -- greedy end

In this case, the exact opposite is true. The greedy tag fails completely, and the non-greedy tag does the right thing. So, since these two cases could easily occur inside the same message, we can see that there's no way to accomplish this with a regular expression.
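The two cases above can be reproduced with a small sketch. The constant and function names (GREEDY, LAZY, first_match) and the sample strings are made up for illustration; the patterns escape the brackets and use the "s" modifier so "." also matches newlines.

```php
<?php
// Illustrative only: compare greedy and non-greedy matching on the two
// layouts described above.
const GREEDY = '#\[tag\](.*)\[/tag\]#s';
const LAZY   = '#\[tag\](.*?)\[/tag\]#s';

// Return the full text of the first match, or an empty string if none.
function first_match(string $pattern, string $text): string
{
    return preg_match($pattern, $text, $m) ? $m[0] : '';
}

$nested   = '[tag]A[tag]B[/tag]C[/tag]';   // one pair inside another
$siblings = '[tag]A[/tag]B[tag]C[/tag]';   // two pairs side by side

// Nested: greedy finds the outer pair, lazy pairs the outer open with the inner close.
echo first_match(GREEDY, $nested), "\n";   // [tag]A[tag]B[/tag]C[/tag]
echo first_match(LAZY, $nested), "\n";     // [tag]A[tag]B[/tag]

// Siblings: lazy finds the first pair, greedy swallows both pairs as one.
echo first_match(GREEDY, $siblings), "\n"; // [tag]A[/tag]B[tag]C[/tag]
echo first_match(LAZY, $siblings), "\n";   // [tag]A[/tag]
```

Neither quantifier gives the right pairing in both layouts.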
Nathan Codding
old-school phpBB developer

josh
I am too lazy to register

Post by josh »

Thanks nathan, that helped out a lot. But, with the regular expressions, I'm sure you've come up with all the ideas, but how's this.

Seeing as you're already putting this uid into the tag before you serialize, why not throw in a lexical address type thing in there too. Then your regular expression would look for that. There would be a slight decrease in performance during the post, but viewing (the thing that happens much more often) would be sped up (through the use of regular expressions). The syntax would be [tag:index:level:uid], where index is the index of that tag on that level, and level is the number of levels deep that tag is currently. One example follows:

Code: Select all

[tag:0:0:uid]
[tag:0:1:uid]

[/tag:0:1:uid]
[/tag:0:0:uid]

[tag:1:0:uid]
[tag:1:1:uid]

[/tag:1:1:uid]
[/tag:1:0:uid]
then instead of using a generic regular expression like you had:

Code: Select all

// Brackets are escaped so they match literally; the "s" modifier lets "." span newlines.
$text = preg_replace('#\[tag\](.*)\[/tag\]#s', '<TAG>\\1</TAG>', $text);
you could use a slightly fancier one that would still work (I think):

Code: Select all

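// Brackets are escaped to match literally; \1 and \2 make the closing tag carry the same index and level.
$text = preg_replace('#\[tag:([0-9]):([0-9]):uid\](.*)\[/tag:\1:\2:uid\]#s', '<TAG>\\3</TAG>', $text);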
I don't have a chance to test this right now, but I'm sure you get the idea. I think that'll work, but the performance gain/loss is still up in the air. The method you're using works very well, but I figured I'd let you hear mine.

hsim
Registered User
Posts: 1554
Joined: Tue Oct 23, 2001 9:39 pm

Post by hsim »

why would this speed up things? strtr/str_replace is imho even faster than posix regexps.
email me: hsim at gmx.li

josh
I am too lazy to register

Post by josh »

Now that I've posted all that, I finally *got* how it all worked. Nevermind.

//edit: heh, hsim, you beat me to it :)

nathan
Former Team Member
Posts: 126
Joined: Mon Aug 06, 2001 7:58 pm
Location: Victoria, BC, Canada

Post by nathan »

josh: I'm afraid I don't see what your point is. How exactly would that make things either easier or faster?

Anyway, it's mostly a moot point because the templated bbcode made things more complex in the second-pass, requiring regexp replacements for everything.
Nathan Codding
old-school phpBB developer

vHiker
Registered User
Posts: 333
Joined: Thu Feb 14, 2002 9:59 pm

Post by vHiker »

OK, sorry to dredge up an old topic here but I'm still stumped as to the need for this field. It seems as if one of the things going on here is to check to make sure each bbcode tag is closed when the user posts. However, I'm not sure why the bbcode_uid is necessary for that - it seems trivial to ensure there is a close tag for each open tag (and not let the user post if there isn't). I don't see how inserting a bbcode_uid field really helps speed things up at display time. Aren't you going to have to do a search and replace either way? It seems like it would be just as easy to replace [QU0TE] as it would be to replace [QU0TE:bbcode_uid]. No flaming please :wink:. I think I'm just missing something. There obviously needs to be a flag to indicate whether or not bbcode is disabled, but I'm at a loss to understand why the bbcode in the post itself needs surgery.

dreamr_3
Registered User
Posts: 36
Joined: Wed May 15, 2002 1:42 pm
Location: Indiana, USA

Post by dreamr_3 »

Wonder if this all wouldn't be easier if BBcode was replaced with support for XML messages... and then a decent XML object model was used to navigate and build the HTML code. Not saying this would be faster... just easier maybe. :)
Josh "Dreamer" Goebel
Director of Technology
Fires Edge Christian Camping
web: http://www.firesedge.org
e-mail: [email]dreamer@remove_spam.firesedge.org[/email]

jonnylamb
Registered User
Posts: 26
Joined: Sat Apr 03, 2004 5:49 pm

Post by jonnylamb »

Sorry if I sound stupid, but how does the bbcode_uid get set on the post? I looked at my phpbb_posts_text table in phpMyAdmin and found:

Code: Select all

[b:4e2360dc04]sdf[/b:4e2360dc04]
for the text. I was just wondering how the tags are recognised as a pair and how this bbcode_uid was set. I see why you do this and agree it is a very good idea. I have scanned the posting.php file and the bbencode_first_pass_pda function in bbcode.php but still don't understand how this works. If you could explain the creation of these tags I would much appreciate it.

Thanks

Jonny Lamb

Kinfule
Registered User
Posts: 706
Joined: Tue Mar 02, 2004 12:16 am
Location: Chile

Post by Kinfule »

This should be in the KB

Hater
Registered User
Posts: 570
Joined: Tue May 06, 2003 8:56 pm
Location: Wisconsin

Post by Hater »

Okay..

The easiest way that I can explain this, is that it's easier for the parser to look through the post and replace the bbcode_uid, than search for a multitude of different types of bbcodes.

There are 11 BBCodes installed in vanilla phpBB, and countless other additions. With some codes being able to contain other codes, it is all too easy to completely trash the display.
  • Without the bbcode_uid, you would loop through the contents of a post_text at least 11 times, 1 for each active BBCode.
  • With bbcode_uid, you only loop through once and look for the bbcode_uid, then check the text immediately before it, and sort accordingly.
It works like an LRN, a local routing number. Companies like Sprint, T-Mobile, AT&T use LRNs to "own" thousands of other numbers, so if they need to make a change to a range of numbers, they only need to do it to one.

Think of it like subsidizing your search, minimizing effort to maximize output through the use of a tag that is universal to that post. :)

Marshalrusty
Project Manager
Posts: 29253
Joined: Mon Nov 22, 2004 10:45 pm
Location: New York City
Name: Yuriy Rusko

Post by Marshalrusty »

Kinfule wrote: This should be in the KB

Wow, you brought back a year-old (started 4 years ago, mind) topic, to say that? Nice
Have comments/praise/complaints/suggestions? Please feel free to PM me.

Need private help? Hire me for all your phpBB and web development needs
