BBcode bitfield and other serialized data

Do not post support requests, bug reports or feature requests. Discuss phpBB here. Non-phpBB related discussion goes in General Discussion!
Suggested Hosts
User avatar
Toxyy
Registered User
Posts: 594
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek
Contact:

BBcode bitfield and other serialized data

Post by Toxyy » Sun Nov 18, 2018 8:35 pm

(yes I know bbcode bitfield is base64 not serialized)

I've been talking with a friend and he said he doesn't like that phpbb uses zeroth normalized form for the db. I'm still learning and I've never heard of that term before, but asking him more about it, he mentioned storing serialized data, which got me to thinking about design decisions that were made before I got into phpbb at all.

After looking into bbcode bitfield to find quotes, I got to thinking. Why is it stored as a base64 varchar(255) and not as a varbinary or tinyblob? It's used as a bitfield, why not use types made for that?

After trying to manually edit notifications before realizing my event was enough, is there any reason to have notification_data be be stored serialized?

I suppose I can understand why forum parents are, only somewhat.

I'd like to put him in his place if I can, and to understand the design decisions better. I'm mostly curious if the bitfield can be stored as varbinary/tinyblob instead of base64. Naively (not sure if it works like this), if it's to save on db space, at most 4 characters per 1 million posts, it would save around 2mb of space.
I am a web developer/administrator, specializing in forums. If you have work you need done or are too lazy to do, pm me!

My extensions:
[3.2][BETA] Anonymous Posts || [3.2][RC] Show User Activity

User avatar
kasimi
Extension Customisations
Extension Customisations
Posts: 3384
Joined: Sat Sep 10, 2011 7:12 pm
Location: Germany
Contact:

Re: BBcode bitfield and other serialized data

Post by kasimi » Sun Nov 18, 2018 9:01 pm

Although this doesn't answer the core of your question, the BBCode bitfield is obsolete and not used in phpBB 3.2.x anymore, except for rendering content generated in 3.1.x and earlier.

User avatar
david63
Jr. Extension Validator
Posts: 15070
Joined: Thu Dec 19, 2002 8:08 am
Location: Lancashire, UK
Name: David Wood
Contact:

Re: BBcode bitfield and other serialized data

Post by david63 » Sun Nov 18, 2018 10:18 pm

Toxyy wrote:
Sun Nov 18, 2018 8:35 pm
I'm mostly curious if the bitfield can be stored as varbinary/tinyblob instead of base64. Naively (not sure if it works like this), if it's to save on db space, at most 4 characters per 1 million posts, it would save around 2mb of space
Without checking it may be that those two field types are not available in all database types that phpBB supports.
David
Remember: You only know what you know and - you don't know what you don't know!
My CDB Contributions | How to install an extension
I will not be accepting translations for any of my extensions in Github - please post any translations in the appropriate topic.
No support requests via PM or email as they will be ignored

User avatar
Toxyy
Registered User
Posts: 594
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek
Contact:

Re: BBcode bitfield and other serialized data

Post by Toxyy » Mon Nov 19, 2018 7:44 pm

kasimi wrote:
Sun Nov 18, 2018 9:01 pm
Although this doesn't answer the core of your question, the BBCode bitfield is obsolete and not used in phpBB 3.2.x anymore, except for rendering content generated in 3.1.x and earlier.
Ohh that's good to know. Was it replaced by bbcode uid?
david63 wrote:
Sun Nov 18, 2018 10:18 pm
Without checking it may be that those two field types are not available in all database types that phpBB supports.
SQLite has blob, but mariadb/sql server/mysql(i) have varbinary and blob. Not sure about sphinx, but that's just for searching right?
I am a web developer/administrator, specializing in forums. If you have work you need done or are too lazy to do, pm me!

My extensions:
[3.2][BETA] Anonymous Posts || [3.2][RC] Show User Activity

User avatar
david63
Jr. Extension Validator
Posts: 15070
Joined: Thu Dec 19, 2002 8:08 am
Location: Lancashire, UK
Name: David Wood
Contact:

Re: BBcode bitfield and other serialized data

Post by david63 » Mon Nov 19, 2018 9:31 pm

Toxyy wrote:
Mon Nov 19, 2018 7:44 pm
SQLite has blob, but mariadb/sql server/mysql(i) have varbinary and blob.
I don't think that Postgres or Oracle support those, but I could be wrong.
David
Remember: You only know what you know and - you don't know what you don't know!
My CDB Contributions | How to install an extension
I will not be accepting translations for any of my extensions in Github - please post any translations in the appropriate topic.
No support requests via PM or email as they will be ignored

User avatar
Toxyy
Registered User
Posts: 594
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek
Contact:

Re: BBcode bitfield and other serialized data

Post by Toxyy » Mon Nov 19, 2018 10:01 pm

david63 wrote:
Mon Nov 19, 2018 9:31 pm
I don't think that Postgres or Oracle support those, but I could be wrong.
Oh I thought I included postgres.

http://www.sqlines.com/sql-server-to-postgresql

Looks like no varbinary, but BYTEA.

http://www.dpriver.com/blog/2010/07/28/ ... ql-server/

And oracle with blobs, or raw.

So they all have blob, except for postgres which has BYTEA, which should mostly match.

Are there any more bitfields stored in the db? I haven't seen any.
I am a web developer/administrator, specializing in forums. If you have work you need done or are too lazy to do, pm me!

My extensions:
[3.2][BETA] Anonymous Posts || [3.2][RC] Show User Activity

User avatar
3Di
Registered User
Posts: 13140
Joined: Mon Apr 04, 2005 11:09 pm
Location: Milan (IT) Frankfurt (DE)
Name: Marco
Contact:

Re: BBcode bitfield and other serialized data

Post by 3Di » Tue Nov 20, 2018 6:30 am

Toxyy wrote:
Mon Nov 19, 2018 7:44 pm
Was it replaced by bbcode uid?
https://wiki.phpbb.com/Parsing_text
https://wiki.phpbb.com/Generate_text_for_display
https://wiki.phpbb.com/Generate_text_for_storage
https://www.phpbb.com/support/docs/en/3 ... d-bbcodes/

Toxyy wrote:
Mon Nov 19, 2018 7:44 pm
Not sure about sphinx, but that's just for searching right?
:?: http://sphinxsearch.com/docs/
:game_die: The new Dice Roller extension for phpBB 3.2 is out! :game_die:

Please PM me only to request paid works. Thx.
Want to compensate me for my interest? Donate
Extensions, Scripts, MOD porting, Update/Upgrades
My development's activity º PhpStorm's proud user

User avatar
AmigoJack
Registered User
Posts: 5390
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: BBcode bitfield and other serialized data

Post by AmigoJack » Tue Nov 20, 2018 7:56 am

Guys, don't mix anything with everything:
  • Binary data type columns would be most appropriate, yes.
  • You have to look up if the DBMSes supported what you demand at the time phpBB3.0.0 was developed (not even released), which was around 2007. It's irrelevant what today's versions of each DBMS supports.
  • A UID is not a bitfield - it is a unique identifier, which helps identifying BBCodes in a text. It doesn't store any information.
  • A bitfield is not a UID - it is a line of bits (one bit = yes/no). One byte can have 8 bit. phpBB3's limit of BBCodes was 1151, which means in theory the longest bitfield would consume 144 byte.
  • Storing raw binary in a database must always be considered a feature, not something that was planned to begin with. Encoding binary data into Base64 means the outcome is ASCII safe and thus can easily be stored in text columns.
  • Sphinx is no DBMS to begin with.
  • ASCII is 7bit, not 8. Everything beyond the first 128 bytes are undefined in terms of ASCII - that's why numerous text encodings evolved over the years.
Does it now make sense?
Last edited by AmigoJack on Tue Nov 20, 2018 10:28 am, edited 1 time in total.
The worst thing about censorship is ███████████
Affin wrote:
Tue Nov 20, 2018 9:51 am
The problem is probably not my English but you do not want to understand correctly.
...
We will not come anybody anyway, nevertheless, it's best to shit this.

User avatar
Toxyy
Registered User
Posts: 594
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek
Contact:

Re: BBcode bitfield and other serialized data

Post by Toxyy » Tue Nov 20, 2018 9:18 am

Thanks for the docs 3Di! I found decode_message from those

By replace bitfield, I meant is the info previously recorded by the bitfield being referenced by whatever the uid is pointing to. Which seems so as its used in regex I haven't tried to understand in functions_content.php's decode_message.

I didn't consider that blob or varbinary or bytea weren't supported in all of the dbms's in 2007 :oops:

Though, how is storing it as binary a feature? I now understand that it wasn't considered over a decade ago, though I'm not aware of what you mean by ascii safe. If you mean storing values larger than 8 bits (which is what Google tells me ascii can hold) then I'd agree, but the bitfield is also 8 bits. Or is it actually 7 bits that ascii can hold?

I never heard about sphinx until I met an event related to it, so that makes sense with what you've shown me.

Much more clear though, thanks AmigoJack

On another note, as in the op, I sort of understand storing forum parents as serialized data to generate the navbits. But why is notification data serialized? It seems to be regenerated even on post edits which I discovered the hard way with the help of rxu, and it's pulled regularly like topic or post data in forums and topics. My bad guess is that it saves db space, and it's not like anyone edits notification data (except me now lol).

As for 0th normal form, I didn't even know about that until the other day and as such I can't even guess what form phpbb uses...
I am a web developer/administrator, specializing in forums. If you have work you need done or are too lazy to do, pm me!

My extensions:
[3.2][BETA] Anonymous Posts || [3.2][RC] Show User Activity

User avatar
AmigoJack
Registered User
Posts: 5390
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: BBcode bitfield and other serialized data

Post by AmigoJack » Tue Nov 20, 2018 10:42 am

Toxyy wrote:
Tue Nov 20, 2018 9:18 am
If you mean storing values larger than 8 bits (which is what Google tells me ascii can hold) then I'd agree, but the bitfield is also 8 bits. Or is it actually 7 bits that ascii can hold?
Don't ask Google what a dictionary can tell you: I edited my post to also add the fact that ASCII is 7bit. A DBMS in its main purpose stores information, such as text and values, with the intent to randomly access and quickly sort/find it. It was not invented with the purpose of storing, sorting and finding raw data, and up to today this still is only a minor case, never the main case (i.e. storing attachments still requires an ID, the filename, the time... and only for one column the raw payload).
Toxyy wrote:
Tue Nov 20, 2018 9:18 am
why is notification data serialized
I'm not up to date with that behaviour, but assume it's for the same case as for storing forum parents: the performance is much better doing it this way than forcing it to the 3NF - there are cases when you do this, and not having the need to ever search for these fields adds up to such a decision. It's like following the DRY princicple with reasonable exceptions, as in:

Code: Select all

i++;
j++;
...instead of doing:

Code: Select all

void inc( a[] int ) {
  foreach( &aI in a ) aI++;
}

inc( [i, j] );
The worst thing about censorship is ███████████
Affin wrote:
Tue Nov 20, 2018 9:51 am
The problem is probably not my English but you do not want to understand correctly.
...
We will not come anybody anyway, nevertheless, it's best to shit this.

User avatar
Ger
Recognised Extension Developer
Posts: 1792
Joined: Wed Jan 02, 2008 7:35 pm
Location: 192.168.1.100
Contact:

Re: BBcode bitfield and other serialized data

Post by Ger » Tue Nov 20, 2018 11:22 am

Toxyy wrote:
Tue Nov 20, 2018 9:18 am
But why is notification data serialized?
Fun fact: phpBB extensions are not allowed to use that: viewtopic.php?f=461&t=2411901
My extensions:
Simple CMS, Feed post bot, Avatar Resize, Modbreak, Magic OGP, Live topic update, Modern Quote, Quoted Where (GDPR) and Autoresponder.
Newest: FAQ manager for 3.2

Like my work? Buy me a coffee to keep it coming. :ugeek:
-Available for custom work-

User avatar
JoshyPHP
Code Contributor
Posts: 972
Joined: Mon Jul 11, 2011 12:28 am

Re: BBcode bitfield and other serialized data

Post by JoshyPHP » Tue Nov 20, 2018 7:24 pm

Toxyy wrote:
Sun Nov 18, 2018 8:35 pm
is there any reason to have notification_data be be stored serialized?
What's the alternative? There are ~18 types of notifications in a default installation. Each type of notification needs a different set of data, e.g. user_id of the user who triggered the notification, post_id, topic_id, etc... If you don't want to store any data in serialized form then you'll probably need around ~18 tables to store them.
I wrote the thing that does BBCodes in 3.2.

User avatar
kasimi
Extension Customisations
Extension Customisations
Posts: 3384
Joined: Sat Sep 10, 2011 7:12 pm
Location: Germany
Contact:

Re: BBcode bitfield and other serialized data

Post by kasimi » Wed Nov 21, 2018 10:39 am

The UID has been in the code since the very first SVN commit in February 2001. The bitfield was introduced in April 2003.
Toxyy wrote:
Tue Nov 20, 2018 9:18 am
By replace bitfield, I meant is the info previously recorded by the bitfield being referenced by whatever the uid is pointing to. Which seems so as its used in regex I haven't tried to understand in functions_content.php's decode_message.
While they both serve the same purpose which is performance improvement, bitfield and UID are not related in any way. The bitfield tells you which BBCodes are present in the message, it's used to skip unnecessary calls to str_replace() and preg_replace() when replacing BBCodes with HTML. The UID is used for identifying valid BBCode tags in the message: [code]hello[code][/code] is stored as [code:uid]hello[code][/code:uid] which is easier to compile to HTML than the original form which would have to be parsed again and again on each display.

Some more details here maybe: viewtopic.php?f=461&t=2394696&p=14566771
Ger wrote:
Tue Nov 20, 2018 11:22 am
Fun fact: phpBB extensions are not allowed to use that: viewtopic.php?f=461&t=2411901
That's not entirely true. phpBB extensions are not allowed to use un|serialize() with user controlled data. Extensions are free to serialize any type of data they want using json_en|decode(). But you already knew that. :P

User avatar
Ger
Recognised Extension Developer
Posts: 1792
Joined: Wed Jan 02, 2008 7:35 pm
Location: 192.168.1.100
Contact:

Re: BBcode bitfield and other serialized data

Post by Ger » Wed Nov 21, 2018 2:50 pm

kasimi wrote:
Wed Nov 21, 2018 10:39 am
That's not entirely true. phpBB extensions are not allowed to use un|serialize() with user controlled data. Extensions are free to serialize any type of data they want using json_en|decode(). But you already knew that. :P
Well actually the policy is still not very clear to me, which is why I simply don't use it. Since the discussion proved pointless (team just standing ground and not really open to debate) I chose not to put any energy into it.

For instance: you yourself pointed out that 1 example I referred to was ancient and therefore somehow no security issue. That reasoning is of course flawed. Also: notifications were completely rewritten in 3.2 AFAIK and not even available in 3.0. So why is that serialised instead of JSON encoded? Same goes for the reparser.

Don't get me wrong: I'm perfectly happy using JSON, my main point of frustration has been discussed in the other thread. But I hope you understand the inconsistencies.
My extensions:
Simple CMS, Feed post bot, Avatar Resize, Modbreak, Magic OGP, Live topic update, Modern Quote, Quoted Where (GDPR) and Autoresponder.
Newest: FAQ manager for 3.2

Like my work? Buy me a coffee to keep it coming. :ugeek:
-Available for custom work-

User avatar
Toxyy
Registered User
Posts: 594
Joined: Mon Oct 24, 2016 3:22 pm
Location: Namek
Contact:

Re: BBcode bitfield and other serialized data

Post by Toxyy » Thu Nov 22, 2018 7:13 am

JoshyPHP wrote:
Tue Nov 20, 2018 7:24 pm
What's the alternative? There are ~18 types of notifications in a default installation. Each type of notification needs a different set of data, e.g. user_id of the user who triggered the notification, post_id, topic_id, etc... If you don't want to store any data in serialized form then you'll probably need around ~18 tables to store them.
I didn't want to answer until I did some research. Here it goes!

I've taken this data from a forum and removed all sensitive string data (notification type id then its corresponding notification_data... skipped ones are for those that I didn't have in the database):

Code: Select all

1
a:4:{s:9:"poster_id";i:509;s:11:"topic_title";s:14:"";s:13:"post_username";s:0:"";s:10:"forum_name";s:19:"";}
2
a:4:{s:9:"poster_id";s:4:"1048";s:11:"topic_title";s:28:"";s:13:"post_username";s:0:"";s:10:"forum_name";s:19:"";}
3
a:6:{s:9:"poster_id";i:86;s:11:"topic_title";s:43:"";s:12:"post_subject";s:47:"";s:13:"post_username";s:0:"";s:8:"forum_id";i:4;s:10:"forum_name";s:15:"";}
4
a:7:{s:9:"poster_id";i:739;s:11:"topic_title";s:29:"";s:12:"post_subject";s:33:"";s:13:"post_username";s:0:"";s:8:"forum_id";i:11;s:10:"forum_name";s:19:"";s:10:"responders";a:3:{i:0;a:2:{s:9:"poster_id";i:261;s:8:"username";s:0:"";}i:1;a:2:{s:9:"poster_id";i:192;s:8:"username";s:0:"";}i:2;a:2:{s:9:"poster_id";i:191;s:8:"username";s:0:"";}}}
5
a:7:{s:9:"poster_id";i:120;s:11:"topic_title";s:17:"";s:12:"post_subject";s:21:"";s:13:"post_username";s:0:"";s:8:"forum_id";i:6;s:10:"forum_name";s:17:"";s:10:"responders";a:1:{i:0;a:2:{s:9:"poster_id";i:668;s:8:"username";s:0:"";}}}
6
a:6:{s:12:"post_subject";s:65:"";s:9:"poster_id";s:4:"";s:11:"topic_title";s:61:"";s:13:"post_username";s:0:"";s:8:"forum_id";s:2:"11";s:10:"forum_name";s:19:"";}
7
a:7:{s:17:"disapprove_reason";s:58:"";s:12:"post_subject";s:87:"";s:9:"poster_id";s:4:"";s:11:"topic_title";s:41:"";s:13:"post_username";s:0:"";s:8:"forum_id";s:2:"11";s:10:"forum_name";s:19:"";}
8
9
10
11
12
a:7:{s:17:"disapprove_reason";s:58:"";s:12:"post_subject";s:87:"";s:9:"poster_id";s:4:"";s:11:"topic_title";s:41:"";s:13:"post_username";s:0:"";s:8:"forum_id";s:2:"11";s:10:"forum_name";s:19:"";}
13
a:5:{s:17:"disapprove_reason";s:58:"";s:9:"poster_id";s:3:"";s:11:"topic_title";s:20:"";s:13:"post_username";s:0:"";s:10:"forum_name";s:19:"";}
14
15
a:2:{s:12:"from_user_id";s:3:"";s:15:"message_subject";s:27:"";}
16
17
a:3:{s:9:"closer_id";s:2:"";s:12:"from_user_id";s:3:"";s:15:"message_subject";s:13:"";}
18
a:7:{s:9:"closer_id";s:2:"";s:9:"poster_id";s:3:"";s:11:"topic_title";s:41:"";s:12:"post_subject";s:45:"";s:13:"post_username";s:0:"";s:8:"forum_id";s:2:"";s:10:"forum_name";s:19:"";}
19
20
a:6:{s:9:"poster_id";i:49;s:11:"topic_title";s:13:"";s:12:"post_subject";s:17:"";s:13:"post_username";s:0:"";s:8:"forum_id";i:11;s:10:"forum_name";s:15:"";}
21
22
23
a:5:{s:12:"requester_id";i:2;s:7:"user_id";i:832;s:7:"post_id";i:25490;s:8:"topic_id";i:478;s:12:"post_subject";s:25:"";}
30
a:6:{s:9:"poster_id";i:256;s:11:"topic_title";s:67:"";s:12:"post_subject";s:71:"";s:13:"post_username";s:0:"";s:8:"forum_id";i:11;s:10:"forum_name";s:19:"";}
31
a:4:{s:9:"poster_id";i:974;s:11:"topic_title";s:20:"";s:13:"post_username";s:0:"";s:10:"forum_name";s:19:"";}
Unfortunately I was not able to get all notification IDs and I'm unaware of any internal code that would easily show me these without some serious searching.

It would appear to me that most of the fields are shared by the majority of these types:
poster_id, topic_title, post_username, forum_name, post_subject, and forum_id

Semi unique ones (shared by 3 in that list):
disapprove_reason

shared by 2 in the list:
message_subject, closer_id, from_user_id

completely unique: (postlove extension)
requester_id, user_id, post_id, topic_id

ID 30/31 are for senky's akismet extension.

The only other data there is for responders, which holds more nested serialized data, which has:
username

So no, I wouldn't say this would require 18 tables, just 11 additional columns to the notifications table, which has 8 columns now. Compared to the posts table this is a bit smaller. One of those columns would be for responders which would still hold notification data, unless user ids could be stored in some sort of string format with a delimiter, though I'm not aware of the best practices for doing so in that case. Perhaps keeping that as serialized would be best, or json encoding, I don't know.
kasimi wrote:
Wed Nov 21, 2018 10:39 am
The UID has been in the code since the very first SVN commit in February 2001. The bitfield was introduced in April 2003.

While they both serve the same purpose which is performance improvement...
...Some more details here maybe: viewtopic.php?f=461&t=2394696&p=14566771
WOW! That's an old commit. Thanks kasimi for the insight :)
I am a web developer/administrator, specializing in forums. If you have work you need done or are too lazy to do, pm me!

My extensions:
[3.2][BETA] Anonymous Posts || [3.2][RC] Show User Activity

Post Reply

Return to “phpBB Discussion”

Who is online

Users browsing this forum: JimA and 22 guests