Fixing post_text BBCode via DB

Get help with installation and running phpBB 3.3.x here. Please do not post bug reports, feature requests, or extension related questions here.
Senshi_x
Registered User
Posts: 13
Joined: Mon Nov 18, 2024 9:13 am

Fixing post_text BBCode via DB

Post by Senshi_x »

Support Request Template
What version of phpBB are you using? phpBB 3.3.13
What is your board's URL? not publicly accessible right now
Who do you host your board with? on "bare metal" in data center
How did you install your board? I used the download package from phpBB.com
What is the most recent action performed on your board? Fresh Install
Is registration required to reproduce this issue? No
Do you have any MODs installed? No
Do you have any extensions installed? Yes
What extensions do you have installed? StopForumSpam
What styles do you currently have installed? Carbon
What language(s) is your board currently using? default BE
Which database type/version are you using? MariaDB
What is your level of experience? New to phpBB but not PHP
What actions did you take (updating your board; installing a MOD, style or extension; etc.) prior to this problem becoming noticeable? Migrated from vb3, then manually set/fixed permissions for groups so everything is nice and clean again.
Please describe your problem. After converting an old forum with 2 million posts, some BBCode did not get converted properly. I'll give an example.

This is the post_text. The problem is that the quote doesn't show as a quotebox in the forum. The cause is simply the double quote around the name. When manually editing the post via the frontend and removing the doubled quotes around the author name, the preview shows it would "parse" properly into a quotebox.

Code: Select all

<t>[quote=""Spaitro""]Thank you so much!<br/>
<br/>
Everything worked flawlessly once I installed the files in the game folder, and now it’s running perfectly. I’m incredibly grateful for all the support and solutions you provided—it made all the difference. Thanks again![/quote]<br/>
<br/>
Thank you for reporting back. All the best and have fun!</t>
As this issue affects a couple hundred thousand posts, we don't feel like doing this manually, though. Fixing the quotes in the DB is easily done with a simple regexp_replace (and takes just over a minute), but that sadly does not translate to the posts being rendered properly in the forum. Purging cache did not help either.
I found that you can trigger a reparse via the CLI (https://www.phpbb.com/support/docs/en/3 ... -reparser/), but that also doesn't help. Assuming this is because the text already is considered parsed, I then tried to first unparse the post_text in the DB, leading to this value, which looks like "clean" bbcode text to me:

Code: Select all

[quote="Spaitro"]Thank you so much!

Everything worked flawlessly once I installed the files in the game folder, and now it’s running perfectly. I’m incredibly grateful for all the support and solutions you provided—it made all the difference. Thanks again![/quote]

Thank you for reporting back. All the best and have fun!
I then ran the reparse CLI again for this postid, but it unexpectedly resulted in the exact same result as the first codeblock above (minus the extra double quotes, obviously), so still no luck.

So now I tried to just hit "preview"&"save" in the forum frontend to see what the post_text would look like then:

Code: Select all

<r><QUOTE author="Spaitro"><s>[quote="Spaitro"]</s>Thank you so much!<br/>
<br/>
Everything worked flawlessly once I installed the files in the game folder, and now it’s running perfectly. I’m incredibly grateful for all the support and solutions you provided—it made all the difference. Thanks again!<e>[/quote]</e></QUOTE>

Thank you for reporting back. All the best and have fun!</r>
Which is obviously quite a bit different result, and I'm unsure why there's such a discrepancy.

Now the question: Is there an easy way to get the desired result (bottom codeblock) after manipulating post_text via the DB? Or do I need to dig into the phpBB functions that generate/prepare the text and call that for every single DB row in PHP (which obviously will have atrocious performance)?
rxu
Extensions Development Team
Posts: 3982
Joined: Wed Oct 25, 2006 12:46 pm
Location: Siberia, Russian Federation

Re: Fixing post_text BBCode via DB

Post by rxu »

Senshi_x wrote: Fri Nov 22, 2024 3:28 pm that also doesn't help. Assuming this is because the text already is considered parsed
It shouldn't, actually reparser takes the text from the database, unparses it and then parses again.
But reparser takes several flags in account such as f.e. enable_bbcode, so if some posts have "Disable bbcode" flag set (hence enable_bbcode is 0 in the relevant phpbb_posts table record), BBCode won't be parsed in the post text.
User avatar
thecoalman
Community Team Member
Community Team Member
Posts: 6516
Joined: Wed Dec 22, 2004 3:52 am
Location: Pennsylvania, U.S.A.

Re: Fixing post_text BBCode via DB

Post by thecoalman »

Might not be that helpful but you can identify the plain text posts with the <t>...</t> tags.

There is topic here from another user with same issue without a resolution because it was a handful of posts and they just dealt with them manually. Opening the post for edit and then saving it will fix it but I guess that is not an option in your case.

viewtopic.php?t=2656969

If enable bbcode is set to 1 then you might be in "no where land" because they should never get to the state they are in unless the user specifically posted it like that if it was always a phpBB forum. I tried all kinds of different things on my test board like changing the checksum, adding <r> tags and nothing worked.
rxu wrote: Sat Sep 28, 2024 5:10 am Hence reparser treats this as a plain text with no BBCode (as it can't find phpBB 2.0 or 3.3 parsed BBCodes in the text). As far as the reparser has no options to enforce parsing plain text as BBCode to get them back to the text again, it seems there's no easy way to fix that.
Presumably the reasoning would be is you aren't screwing up posts that were purposely posted as plain text. Assuming that is the case it would appear your best option would be to use regex_replace to insert formatting used to store quotes, make sure enable bbcode is set to yes on those posts and last but not least run the reparser.
“Results! Why, man, I have gotten a lot of results! I have found several thousand things that won’t work.”

Attributed - Thomas Edison
Senshi_x
Registered User
Posts: 13
Joined: Mon Nov 18, 2024 9:13 am

Re: Fixing post_text BBCode via DB

Post by Senshi_x »

Code: Select all

|post_id  |topic_id|forum_id|poster_id|icon_id|poster_ip     |post_time    |post_reported|enable_bbcode|enable_smilies|enable_magic_url|enable_sig|post_username|post_subject                                       |post_text                                                                                                                                                                                                                                                                                                                                                     |post_checksum                   |post_attachment|bbcode_bitfield|bbcode_uid|post_postcount|post_edit_time|post_edit_reason|post_edit_user|post_edit_count|post_edit_locked|post_visibility|post_delete_time|post_delete_reason|post_delete_user|sfs_reported|
|---------|--------|--------|---------|-------|--------------|-------------|-------------|-------------|--------------|----------------|----------|-------------|---------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|---------------|---------------|----------|--------------|--------------|----------------|--------------|---------------|----------------|---------------|----------------|------------------|----------------|------------|
|2,226,246|154,833 |27      |21,503   |0      |162.158.134.15|1,730,217,615|0            |1            |1             |1               |0         |             |Re: Game Freeze/Crash When join and spown in server|<t>[quote="Spaitro"]Thank you so much!<br/>¶<br/>¶Everything worked flawlessly once I installed the files in the game folder, and now it’s running perfectly. I’m incredibly grateful for all the support and solutions you provided—it made all the difference. Thanks again![/quote]<br/>¶<br/>¶Thank you for reporting back. All the best and have fun!</t>|bd91a34c457db70050cfafc316983daa|0              |               |10x504x2  |1             |0             |                |0             |0              |0               |1              |0               |                  |0               |0           |
Thank you very much for the replies. This is the full DB row, so enable_bbcode is clearly enabled for this :( . Would have been an easy win.

The parser was called with

Code: Select all

bin/phpbbcli.php reparser:reparse post_text --ansi --range-min=2226245 --range-max=2226247
The hint regarding the <t> tag was a good idea, sadly we also have some posts with <r> that have equally broken quotes and behave exactly the same otherwise. They also get fixed when preview/saving them via frontend, but not via reparser CLI.

My biggest confusion is why the reparser CLI handles text differently than the frontend "Save/edit" post feature, given that the latter produces working results, whereas the CLI doesn't.


But you at least got me on the right track. Some "beautiful" regex SQL (MariaDB) fixes this singular issue. I'm sharing it because it might be helpful for others, too. Still, be careful, it's not overly smart, I just added a simple negative lookahead to prevent it from updating the same quotes it already targeted. Obviously you need to change it into an update statement as well, but always use SELECT first to check what you are doing. And it will only work properly for posts with a single quote in them.

Code: Select all

SELECT post_id, post_text, 
REGEXP_REPLACE(post_text, 
	'(?s)^<.>(.*)(?<!<s>)\\[quote=(.*?)\\](.*)\\[\\/quote\\].*<\\/.>$',
	'<r>\\1<QUOTE author=\\2><s>\[quote=\\2\]</s>\\3<e>\[\/quote\]<\/e><\/QUOTE>\\4</r>'
) AS replaced 
FROM phpbb.bb_posts WHERE `enable_bbcode` = 1 AND `post_text` LIKE '%[q%' AND (post_id = 2226232 OR post_id = 2226246)

I guess I'll just have to experiment some more to see if I can safely catch all those broken posts without setting other stuff on fire.
rxu
Extensions Development Team
Posts: 3982
Joined: Wed Oct 25, 2006 12:46 pm
Location: Siberia, Russian Federation

Re: Fixing post_text BBCode via DB

Post by rxu »

The problem is that bbcode_uid is not there, so the text is not considered as having BBCode, hence it's not parsed. With uid it'd look in common like [quote:10x504x2]Some quote[/quote:10x504x2].
In phpBB 3.3.14 there's a new force-bbcode-reparsing reparser option which should handle such cases.
Senshi_x
Registered User
Posts: 13
Joined: Mon Nov 18, 2024 9:13 am

Re: Fixing post_text BBCode via DB

Post by Senshi_x »

Oh wow, talk about being lucky that this option just got added. And you knew about, and you replied to tell me. :)

That option is a lifesaver, it "just works".

Still have to fix "malformed" (double double quotes) BBCode in the DB beforehand, but that is a lot easier and safer to do than trying to replicate the final parsed structure with regex alone.

In my case this simple SQL

Code: Select all

UPDATE phpbb.bb_posts SET
post_text = REGEXP_REPLACE(post_text, '\\[quote=""(.*?)""\\]','\[quote="\\1"\]')
was enough and super fast (took only a couple of minutes), followed by the reparse CLI command. Yes, that reparse command is incredibly slow (as expected) and takes ~10 hours on our 2.2 million posts.

There were some other minor BBcode errors that we could fix up directly in the DB, but having this quote issue fixed is a huge improvement of the quality. I love how many knowledgable people are active here ( and willing to share their knowledge). We made the right call to switch to phpBB. 8-)
rxu
Extensions Development Team
Posts: 3982
Joined: Wed Oct 25, 2006 12:46 pm
Location: Siberia, Russian Federation

Re: Fixing post_text BBCode via DB

Post by rxu »

Good that option helped, although it'd be possibly much faster to reparse just selected portions of post ids having those quote BBCodes (using --range-min / --range-max options).
Senshi_x
Registered User
Posts: 13
Joined: Mon Nov 18, 2024 9:13 am

Re: Fixing post_text BBCode via DB

Post by Senshi_x »

As the affected posts were spread out all over history of the forum, it'd have been more work doing it that way. Would have needed to find/select all possibly affected posts, then loop over them in a script to trigger the parser. Certainly easy to do, but still work.
No harm in letting the server work a bit overnight.

I did use the range options when testing the fixes beforehand on individual posts, though.

Return to “[3.3.x] Support Forum”