OK, the query cache is a pretty smart idea, right? When the same query is executed many times by many threads, and the data in the relevant tables isn't touched so the query cache isn't invalidated, then the query cache is a pretty good idea, right? And even when not used, it really shouldn't be in the way, right? It's in memory after all, so a check for a query in the cache should be quick, right?
Maybe, but far from always, and the fact is that the query cache is becoming less and less useful. Let me tell you why. We have more and more memory in our machines, and disk is used less for storage and more for persistence. Even though the databases we process are getting bigger and bigger, the data we actually access isn't growing that much. In other words, we keep storing more and more historical data, for a number of reasons.
While the amount of RAM we have access to keeps growing, we do not get faster CPUs at the same rate. What we do get is MORE CPUs, or rather more cores. In other words, we must do more and more multithreading to keep all those cores busy and let them access the RAM we have. My reasoning is this: we have more and more RAM, and one of the major uses for RAM is to cache data from disk, not least in databases. This speeds up access to data that lives on disk enormously. But when we have so much RAM that the data we use actually fits there, in RAM, then what limits performance? The answer is: the way we access that RAM.
This is what I wanted to test: how does possibly inefficient multithreading affect performance, once the bottleneck that disk access represents is out of the way? This is my little experiment: 100 million rows in a single table. This is part of my tests of using MySQL as a key-value store, so there are no secondary indexes and no foreign keys, and as I want to test this with MySQL Cluster as well, I have also removed any VARCHAR fields. As it stands, the table has one BIGINT, one INT and one DOUBLE column, and the BIGINT is the PRIMARY KEY. This is real data though; I have only removed some fields.
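From that description, the table would look roughly like this. This is my own reconstruction for illustration; the post only specifies the column types and the primary key, so the table and column names here are invented:

```sql
-- Hypothetical sketch: one BIGINT (the primary key), one INT, one DOUBLE;
-- no secondary indexes, no foreign keys, no VARCHAR columns.
CREATE TABLE kvstore (
  id    BIGINT NOT NULL,
  val_i INT,
  val_d DOUBLE,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
```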
My first test was with a cold database, just started. No one else was using it, and I had an InnoDB buffer pool of 8 GB, half the RAM in the machine. Then I ran a benchmark that reads 1 million rows from this table, using the BIGINT column for lookup, with the 1 million IDs to look up spread over 30 threads (I have 8 cores in this machine). With a cold database, it took 869 seconds to read all this data. The next round, with the data in memory, was a lot faster, but not as fast as you might think: just about half the time, 432 seconds. Now, during that last round, SHOW PROCESSLIST showed a lot of threads "Waiting for query cache lock", so I decided to try without the query cache. Faster or slower, what do you think? The answer is faster. A LOT faster: some 30 seconds, as compared to 432 seconds! This calls for further investigation, but my initial conclusion is this: if you use MySQL in a way where your data is in memory, then the query cache isn't just useless, it is severely limiting performance! I'll have some more data on this soon; this is getting fun!
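The fan-out in the benchmark above can be sketched like this. This is a minimal illustration, not the actual benchmark code: `look_up` stands in for the real "SELECT ... WHERE id = ?" round trip to MySQL, and the counts simply confirm each thread got its share of the 1 million IDs:

```python
import threading

NUM_IDS = 1_000_000   # total primary-key lookups
NUM_THREADS = 30      # worker threads, as in the benchmark

def chunk(ids, n):
    """Split the ID list into n roughly equal slices, one per thread."""
    size, rem = divmod(len(ids), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        out.append(ids[start:end])
        start = end
    return out

def run_benchmark(look_up):
    """Run look_up(id) for every ID, spread over NUM_THREADS threads.

    Returns the total number of lookups performed.
    """
    ids = list(range(NUM_IDS))
    chunks = chunk(ids, NUM_THREADS)
    done = [0] * NUM_THREADS

    def worker(idx, my_ids):
        for row_id in my_ids:
            look_up(row_id)        # one point-select per ID
        done[idx] = len(my_ids)

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(done)
```

In a real run, the query cache can be switched off beforehand with SET GLOBAL query_cache_size = 0, so the two configurations can be compared with the same driver code.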
/Karlsson