How to handle this query in a better way??

Discussion forum for Extension Writers regarding Extension Development.
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

How to handle this query in a better way??

Post by JLA »

Starting with a table that contains over 700 million rows (MySQL, InnoDB)

Structure of TESTTABLE1
Item_number int(10) auto_increment
field1 smallint(8)
field2 varchar(250)
field3 mediumint(8)
field4 int(11)
field5 mediumint(8)
field6 mediumint(8)
field7 tinyint(3)
field8 tinyint(1)
field9 tinyint(3)
field10 int(10)
field11 varchar(25)
field12 int(10)
field13 mediumint(8)
field14 varchar(25)
field15 int(11)
field16 int(10)
field17 tinyint(1)

Here are the indexes

**KEYNAME** - **TYPE** - **FIELD/s**
PRIMARY PRIMARY item_number
field1 INDEX field1
field16 INDEX field16
field7 INDEX field7
field9 INDEX field9
field12 INDEX field12
combo1 INDEX field1, field9, field12
combo2 INDEX field1, field13
combo3 INDEX field1, field4

So, working with these two queries.
Note: for the queries below, field1 = 1201 matches a total of 22,777,891 rows in this table.

SELECT item_number
FROM TESTTABLE1 USE INDEX (combo1)
WHERE field1 = 1201
ORDER BY field9 DESC,field12 DESC
LIMIT 10000050,50

Running an EXPLAIN on this query, we see
**ID** - **SELECT TYPE** - **TABLE** - **TYPE** - **POSSIBLE KEYS** - **KEY** - **KEY_LEN** - **REF** - **ROWS** - **EXTRA**
1, SIMPLE, TESTTABLE1, ref, combo1, combo1, 2, const, 20159010, Using where; Using index

Running this query we are seeing a very long execution time of between 8 and 11 seconds for a return of 50 rows.

Is there a better way to do this that achieves the same result?

After this query we take these results and run the following query:
SELECT * FROM TESTTABLE1
WHERE item_number in (our results inserted here)
ORDER BY field9 DESC, field12 DESC;

This gives us the full 50 rows from our previous query using the primary key index (no issues here)
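For reference, the two steps above can also be collapsed into a single round trip with a deferred join: the derived table scans the covering index for primary keys only, and the outer query fetches the full rows by primary key. This is only a sketch of the same pattern (untested against this table), and it still walks past the first 10,000,050 index entries, so it does not fix the underlying OFFSET cost:

```sql
-- Deferred join: get primary keys from the covering index first,
-- then fetch the full rows by primary key in the same statement.
SELECT t.*
FROM TESTTABLE1 t
JOIN (
    SELECT item_number
    FROM TESTTABLE1 USE INDEX (combo1)
    WHERE field1 = 1201
    ORDER BY field9 DESC, field12 DESC
    LIMIT 10000050, 50
) AS page_keys USING (item_number)
ORDER BY t.field9 DESC, t.field12 DESC;
```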
Brf
Support Team Member
Posts: 52219
Joined: Tue May 10, 2005 7:47 pm
Location: {postrow.POSTER_FROM}

Re: How to handle this query in a better way??

Post by Brf »

So, if I am reading it correctly... You are reading 22-million rows, sorting them descending, then reading sequentially through 10-million of them before returning 50 rows?
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

Re: How to handle this query in a better way??

Post by JLA »

Brf wrote:
Wed Nov 04, 2020 7:33 pm
So, if I am reading it correctly... You are reading 22-million rows, sorting them descending, then reading sequentially through 10-million of them before returning 50 rows?
Yes, we have a multi-column index (combo1) on columns (field1, field9, field12)

Our first SELECT asks for the item_numbers (int(10)) of the rows that have field1 (smallint(8)) = 1201, ordered by field9 (tinyint(3)) DESC, then field12 (int(10)) DESC, using the combo1 index. We know that the 50 rows we want reside about 10 million rows from the end (pretty much in the middle) of the 22 million rows where field1 = 1201. Our sort, I assume, is fast due to our combo1 index, right? So is reading sequentially through 10 million rows to get our 50 rows where the time cost is?
Brf
Support Team Member
Posts: 52219
Joined: Tue May 10, 2005 7:47 pm
Location: {postrow.POSTER_FROM}

Re: How to handle this query in a better way??

Post by Brf »

I don't know if the sort is fast or slow. Try returning the first 50, rather than the middle, and see if there is a difference.
Paul
Infrastructure Team Leader
Posts: 27084
Joined: Sat Dec 04, 2004 3:44 pm
Location: The netherlands.
Name: Paul Sohier

Re: How to handle this query in a better way??

Post by Paul »

I suggest you ask this at a place dedicated to MySQL, as they probably have far more knowledge of query optimization than we have.
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

Re: How to handle this query in a better way??

Post by JLA »

Brf wrote:
Wed Nov 04, 2020 8:20 pm
I don't know if the sort is fast or slow. Try returning the first 50, rather than the middle, and see if there is a difference.
Yes, returning the first or last 50 is instant. It's only the rows that reside in the middle; the closer you get to the middle, the longer the query takes.
axe70
Registered User
Posts: 279
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy

Re: How to handle this query in a better way??

Post by axe70 »

I really like to work with things hands-on (since I'm not so smart). Can you provide an SQL structure dump, maybe saved via phpMyAdmin, that could be loaded in seconds as a test, so I'd have your same 2 or 3 tables/scenario in a test database?

A dump containing data to load would also be nice (maybe 10 or 20 records, with data modified for privacy if necessary). I assume more than one person will try to test out the thing as presented, and I would probably be one of them!
Brf
Support Team Member
Posts: 52219
Joined: Tue May 10, 2005 7:47 pm
Location: {postrow.POSTER_FROM}

Re: How to handle this query in a better way??

Post by Brf »

How is testing 10 or 20 records going to help when the query is reading 10 million rows? We have already identified the cause of the problem.
axe70
Registered User
Posts: 279
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy

Re: How to handle this query in a better way??

Post by axe70 »

Because I can easily fill it automatically with random values, maybe 10,000 rows with a quick script.
And I would like not to lose too much time, just 5 minutes on this.
And because maybe I think it has not been well answered, since I received a question about this privately?
Maybe, maybe not.

And sorry, I understood that one answer was redirecting this to "some MySQL site".
I can understand that, if you would prefer to see these kinds of "general questions" in another forum.
Brf
Support Team Member
Posts: 52219
Joined: Tue May 10, 2005 7:47 pm
Location: {postrow.POSTER_FROM}

Re: How to handle this query in a better way??

Post by Brf »

There is a difference between 10 thousand and 10 million. Try reading 10 million rows sequentially.
axe70
Registered User
Posts: 279
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy

Re: How to handle this query in a better way??

Post by axe70 »

Brf wrote:
Thu Nov 05, 2020 5:34 pm
There is a difference between 10 thousand and 10 million. Try reading 10 million rows sequentially.
No thanks :D I prefer to let the MySQL engine do that!
And check out the execution time; there is a ratio you can consider, isn't there?
JoshyPHP
Code Contributor
Posts: 1194
Joined: Mon Jul 11, 2011 12:28 am

Re: How to handle this query in a better way??

Post by JoshyPHP »

JLA wrote:
Wed Nov 04, 2020 7:14 pm
Running this query we are seeing a very long execution time of between 8 and 11 seconds for a return of 50 rows.
EXPLAIN shows a key_len of 2 bytes, which is the size of field1. That means it only uses the first part of the index. If it used all three columns, it would use 2 + 1 + 4 = 7 bytes.

Try ANALYZE-ing the table, maybe replace USE INDEX with FORCE INDEX too.
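As a sketch of those two suggestions (the same query as in the OP, not tested against this table):

```sql
-- Refresh the index statistics the optimizer relies on
ANALYZE TABLE TESTTABLE1;

-- FORCE INDEX makes a table scan very costly in the optimizer's eyes,
-- so it will use combo1 whenever it can
SELECT item_number
FROM TESTTABLE1 FORCE INDEX (combo1)
WHERE field1 = 1201
ORDER BY field9 DESC, field12 DESC
LIMIT 10000050, 50;
```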

As a last resort, I would experiment with an indexed column solely dedicated to this type of query. Maybe a BIGINT column whose value is equal to field1 * POW(2, 33) + field9 * POW(2, 32) + field12. You'd have to query rows whose values are BETWEEN 1201 * POW(2, 33) AND 1202 * POW(2, 33) - 1. MySQL has not always been very good with BIGINTs and ranged queries though, so I don't know how well that would work.

Ideally you'd replace the pagination so you don't have to order and then skip thousands of records, but you'd need to know the field9 and field12 values for the previous page, and I don't know if that's possible in your case.
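A keyset ("seek") version of that pagination might look like the sketch below, where @last_field9, @last_field12 and @last_item_number are hypothetical placeholders for the values from the final row of the previous page. item_number is added as a tie-breaker since (field9, field12) may not be unique; an index extended with item_number, e.g. (field1, field9, field12, item_number), would serve this best. Untested here:

```sql
-- Seek past the previous page's last row instead of OFFSET-scanning
SELECT item_number
FROM TESTTABLE1
WHERE field1 = 1201
  AND (field9 < @last_field9
       OR (field9 = @last_field9 AND field12 < @last_field12)
       OR (field9 = @last_field9 AND field12 = @last_field12
           AND item_number < @last_item_number))
ORDER BY field9 DESC, field12 DESC, item_number DESC
LIMIT 50;
```

Note this only supports "next page" style navigation; it cannot jump straight to the middle of 22 million rows.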
I wrote the thing that does BBCodes in 3.2+.
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

Re: How to handle this query in a better way??

Post by JLA »

axe70 wrote:
Thu Nov 05, 2020 5:02 pm
I really like to work with things hands-on (since I'm not so smart). Can you provide an SQL structure dump, maybe saved via phpMyAdmin, that could be loaded in seconds as a test, so I'd have your same 2 or 3 tables/scenario in a test database?

A dump containing data to load would also be nice (maybe 10 or 20 records, with data modified for privacy if necessary). I assume more than one person will try to test out the thing as presented, and I would probably be one of them!

Code: Select all

CREATE TABLE `testtable1` (
  `item_number` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `field1` smallint(8) unsigned NOT NULL DEFAULT '0',
  `field2` varchar(250) NOT NULL,
  `field3` mediumint(8) NOT NULL DEFAULT '0',
  `field4` int(11) NOT NULL DEFAULT '0',
  `field5` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `field6` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `field7` tinyint(3) NOT NULL DEFAULT '0',
  `field8` tinyint(1) NOT NULL DEFAULT '0',
  `field9` tinyint(3) NOT NULL DEFAULT '0',
  `field10` int(10) unsigned NOT NULL DEFAULT '0',
  `field11` varchar(25) NOT NULL,
  `field12` int(10) unsigned NOT NULL DEFAULT '0',
  `field13` mediumint(8) NOT NULL DEFAULT '0',
  `field14` varchar(25) NOT NULL,
  `field15` int(11) NOT NULL DEFAULT '0',
  `field16` int(10) unsigned NOT NULL DEFAULT '0',
  `field17` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`item_number`),
  KEY `field1` (`field1`),
  KEY `field16` (`field16`),
  KEY `field7` (`field7`),
  KEY `field9` (`field9`),
  KEY `field12` (`field12`),
  KEY `combo1` (`field1`,`field9`,`field12`),
  KEY `combo2` (`field1`,`field13`),
  KEY `combo3` (`field1`,`field4`)
) ENGINE=InnoDB AUTO_INCREMENT=754588921 DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED$$

Here is a test line of data

Code: Select all

INSERT INTO testtable1 (item_number, field1, field2, field3, field4, field5, field6, field7, field8, field9, field10, field11, field12, field13, field14, field15, field16, field17) VALUES
(552578116, 1201, 'This is text text for field 2.  This is only a test so we can do a test of field 2.  This has only been a test.', 5567864, 1604609050, 954188, 0, 0, 0, 0, 576694909, 'Test 1234', 5567864, 5567864, 'Test 1234', 1604609050, 0, 1);
JLA
Registered User
Posts: 589
Joined: Tue Nov 16, 2004 5:23 pm
Location: USA
Name: JLA FORUMS

Re: How to handle this query in a better way??

Post by JLA »

JoshyPHP wrote:
Thu Nov 05, 2020 6:07 pm
JLA wrote:
Wed Nov 04, 2020 7:14 pm
Running this query we are seeing a very long execution time of between 8 and 11 seconds for a return of 50 rows.
EXPLAIN shows a key_len of 2 bytes, which is the size of field1. That means it only uses the first part of the index. If it used all three columns, it would use 2 + 1 + 4 = 7 bytes.

Try ANALYZE-ing the table, maybe replace USE INDEX with FORCE INDEX too.

As a last resort, I would experiment with an indexed column solely dedicated to this type of query. Maybe a BIGINT column whose value is equal to field1 * POW(2, 33) + field9 * POW(2, 32) + field12. You'd have to query rows whose values are BETWEEN 1201 * POW(2, 33) AND 1202 * POW(2, 33) - 1. MySQL has not always been very good with BIGINTs and ranged queries though, so I don't know how well that would work.

Ideally you'd replace the pagination so you don't have to order and then skip thousands of records, but you'd need to know the field9 and field12 values for the previous page, and I don't know if that's possible in your case.
Using EXPLAIN without a FORCE INDEX gives this
**ID** - **SELECT TYPE** - **TABLE** - **TYPE** - **POSSIBLE KEYS** - **KEY** - **KEY_LEN** - **REF** - **ROWS** - **EXTRA**
1, SIMPLE, testtable1, ref, field1,combo1,combo2,combo3, combo1, 2, const, 23221940, Using where; Using index

We get the same thing when we include our FORCE INDEX in the query I put in the OP.

It's not possible to know the field9 and field12 values for the previous page, as the data is in a constant state of change. We have a caching mechanism in place for the results of this query, but it is only valid for 60 seconds due to the constantly changing state of this data.
axe70
Registered User
Posts: 279
Joined: Sun Nov 17, 2002 10:55 am
Location: Italy

Re: How to handle this query in a better way??

Post by axe70 »

I'm sorry that JLA has not provided a ready-to-run example of tables to play with.
So, since I'm not smart enough, and have no time to follow up, I passed this along, out of curiosity, to a computer engineer I know very well.
If there is an answer to this, and he finds the time to look into it, I will return here with his answer (if he has one to offer, to improve things). Maybe he will register and answer himself, since I linked this post to him. Who knows!?
Last edited by axe70 on Fri Nov 06, 2020 1:29 pm, edited 2 times in total.