TLDNR: On a busy site (more than 5 million page requests per day), the database server gets overloaded
A HA clustering DB server should be used.
Most MySQL Clustering systems depend on InnoDB tables, but for the actual clustering system to work, all tables must have a primary key.
Otherwise you will get errors like this when trying to start the replication:
[Repl] Plugin group_replication reported: 'Table phpbb_acl_groups does not have any PRIMARY KEY. This is not compatible with Group Replication.'
Last edited by Mick on Tue Mar 05, 2024 9:59 am, edited 1 time in total.
Reason:Solved.
Paul wrote: Sun Mar 03, 2024 10:19 am
If you believe you found a bug the right location would be to post in the bug tracker at http://tracker.phpbb.com/
Thanks.
It has been created during the original discussion:
Starting to think I really need to add the PK myself to the tables lacking it.
Despreately need a read replica for mysql load balancing, but without PK in all tables will just run to more problems.
Quoting AI:
In standard MySQL replication, while it is not strictly required for tables to have a primary key for replication to function, it is highly recommended. Here’s why:
Importance of Primary Keys in Replication
Performance:
Update/Delete Operations: Without a primary key or a unique key, MySQL must use a full table scan to locate the rows that need to be updated or deleted. This can significantly slow down the replication process and increase the load on the replica server.
Replication Lag: The absence of primary keys can cause higher replication lag because finding rows to update or delete becomes more time-consuming.
Data Consistency:
Ensuring Uniqueness: Primary keys ensure that each row can be uniquely identified, reducing the risk of data anomalies during replication.
Conflict Resolution: In multi-threaded replication, primary keys help in efficiently resolving conflicts and maintaining data integrity.
AI now suggests only adding unique constraint to the tables not having a PK.
nstead of adding an AUTO_INCREMENT PRIMARY KEY, you could only add the unique constraint (UNIQUE KEY (user_id, group_id)). This would ensure data integrity without modifying the core structure too much, thus reducing potential conflicts with future updates.
Any thoughts from esteemed community members on that?
Main motivation here is still to be able to use e.g. InnoDB Cluster, which requires uniqueness of the entries to the tables, in an order to have proper replication.
Turns out a UNIQUE KEY for each table, not having a PK, is enough for replication to work.
This of course requires, that no duplicate rows exist, for key combinations selected for creating the unique key.
But in my installation, following tables seems to have duplicate rows where they should not exist:
phpbb_acl_groups
SELECT group_id, forum_id, auth_option_id, COUNT(*) FROM phpbb_acl_groups GROUP BY group_id, forum_id, auth_option_id HAVING COUNT(*) > 1;
- > several
phpbb_acl_users
SELECT user_id, forum_id, auth_option_id, COUNT(*) FROM phpbb_acl_users GROUP BY user_id, forum_id, auth_option_id HAVING COUNT(*) > 1;
- > several
phpbb_poll_votes
SELECT topic_id, poll_option_id, vote_user_id, COUNT(*) AS duplicate_count FROM phpbb_poll_votes GROUP BY topic_id, poll_option_id, vote_user_id HAVING COUNT(*) > 1;
- > several
phpbb_topics_watch
SELECT topic_id, user_id, COUNT(*) AS duplicate_count FROM phpbb_topics_watch GROUP BY topic_id, user_id HAVING COUNT(*) > 1;
- > several
Rest of the tables without PK, have key combinations that can be used as unique key or they have a unique key.
Could this be some historical bug in the board?
(board has been running for several years)
As for reference, this is the whole list of tables without PK:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'pbpBBdb'
AND TABLE_NAME NOT IN (
SELECT DISTINCT TABLE_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'pbpBBdb'
AND CONSTRAINT_NAME = 'PRIMARY'
);