
Webinar Thursday, December 7, 2017: Percona XtraDB Cluster (PXC) 101


Join Percona’s Software Engineer (PXC Lead), Krunal Bauskar, as he presents Percona XtraDB Cluster 101 on Thursday, December 7, 2017, at 7:00 am PST / 10:00 am EST (UTC-8).

Tags: Percona XtraDB Cluster, MySQL, High Availability, Clustering

Experience Level: Beginner

Percona XtraDB Cluster (PXC) is a multi-master solution that offers virtually synchronous replication among cluster nodes. It is based on the Codership Galera replication library. In this session, we will explore some key features of Percona XtraDB Cluster that make it enterprise-ready, including some recently added 5.7-exclusive features.

This webinar is introductory and will cover the following topics:

  • ProxySQL load balancer
  • Multi-master replication
  • Synchronous replication
  • Data at rest encryption
  • Improved SST Security through simplified configuration
  • Easy-to-set-up encrypted node-to-node communication
  • ProxySQL-assisted Percona XtraDB Cluster maintenance mode
  • Automatic node provisioning
  • Percona XtraDB Cluster “strict-mode”

Register for the webinar now.

Krunal Bauskar, C/C++ Engineer

Krunal joined Percona in September 2015. Before joining Percona, he worked on the InnoDB team at MySQL/Oracle, where he authored most of the temporary table revamp work along with many other features. Earlier, he was associated with Yahoo! Labs, researching big data problems, and with a database startup that is now part of Teradata. His main interest is data management at any scale, which he has been practicing for more than a decade. He loves to spend time with his family or get involved in social work, unless he is out on a nearby exploration drive. He is based in Pune, India.


Insert benchmark: in-memory, low-concurrency, fast server

This post explains the insert benchmark with an in-memory and low-concurrency workload for MyRocks, InnoDB and TokuDB. The goal is to understand throughput and efficiency for different storage engines and different versions of MySQL.

tl;dr - for an in-memory, low-concurrency workload
  • MyRocks and InnoDB 5.6 have similar insert rates but InnoDB 5.6 has a better query rate.
  • MyRocks index scans are slower than InnoDB 5.6: 1.23X slower for the PK and 2.15X slower for the secondary indexes. More CPU overhead is the cause.
  • MyRocks is more space efficient. InnoDB uses ~1.6X more space than MyRocks.
  • MyRocks is more write efficient. InnoDB writes ~3X more to storage per insert on the load and ~15X more on the read-write test.
  • MyRocks uses more CPU than InnoDB 5.6: 16% more on the load, 37% more on read-write.
  • InnoDB 8.0 is 10% to 20% slower than InnoDB 5.6 and uses ~20% more CPU than InnoDB 5.6.

Configuration

The insert benchmark is described here. The test server has 48 HW threads, fast SSD and 256gb of RAM. The database block cache (buffer pool) was large enough to cache the database. The test was run with 1 table and 1 query client. For the read-write tests there is a writer client that runs concurrent with the query client. The insert benchmark loaded the table with 100M rows, then did a full scan of each index (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second with one client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second.
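For orientation, each test table has an auto-increment PK and three secondary indexes that are maintained during the load. A rough sketch of the schema follows; the column and index names are illustrative, not necessarily the exact ones used by the benchmark client:

CREATE TABLE pi1 (
  transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  dateandtime DATETIME,
  cashregisterid INT NOT NULL,
  customerid INT NOT NULL,
  productid INT NOT NULL,
  price FLOAT NOT NULL,
  data VARCHAR(4000),
  INDEX marketsegment (price, customerid),
  INDEX registersegment (cashregisterid, price, customerid),
  INDEX pdc (price, dateandtime, customerid)
) ENGINE=InnoDB;

The ENGINE clause is swapped for the storage engine under test.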

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used. 
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.
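The durability settings above map to a handful of my.cnf entries; a hedged sketch for the InnoDB configurations (the linked my.cnf files are authoritative):

log_bin = binlog                       # binlog enabled
sync_binlog = 0                        # binlog not synced on commit
innodb_flush_log_at_trx_commit = 2     # InnoDB log not synced on commit
performance_schema = 1                 # on for upstream InnoDB and TokuDB builds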

Results

All of the data for the tests is here. I adjusted the iostat bytes-written metrics for MyRocks because it currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.
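The efficiency metrics below come from standard OS monitoring sampled while the tests run; roughly this kind of collection, though the exact flags and intervals are my assumption:

iostat -kx 10 > iostat.out &    # device-level read/write rates and KB written
vmstat 10 > vmstat.out &        # CPU utilization behind the CPU-per-operation numbers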

Load

The graphs below have rates for each configuration relative to the rate for InnoDB 5.6.35. The graphs show the insert rate, the CPU overhead per insert and the storage write rate per insert.
  • MyRocks has the best insert rate. It gets 1.06X more than InnoDB 5.6.35.
  • MyRocks has better write efficiency. InnoDB writes ~3X more to storage per insert.
  • MyRocks uses 1.16X more CPU/insert than InnoDB 5.6.35. I assume the extra CPU use is from background tasks (compaction).
  • There is a regression from 5.6 to 8.0 for InnoDB as InnoDB 8.0.3 gets 0.78X the insert rate and uses 1.21X the CPU compared to InnoDB 5.6.35. I assume this is from new code above the storage engine.

Size

This is the database size at the end of the load. The values are absolute and rounded to the nearest GB. MyRocks is more space efficient than InnoDB. Uncompressed InnoDB uses 1.6X more space than uncompressed MyRocks.

Scan

This graph shows the number of seconds to scan the PK index and all of the secondary indexes. The number for the PK scan is from the second scan of it.
  • MyRocks is 1.23X slower on the PK scan and 2.15X slower on the secondary scans compared to InnoDB 5.6.35. MyRocks uses more CPU for these scans based on vmstat data for q5 (the second PK scan) and q4 (one of the secondary scans) - the Mcpu/o column is the CPU overhead per row fetched.
  • Scans are faster for InnoDB starting in 5.7. InnoDB 5.6 does the PK scan 1.4X slower and the secondary scans 1.2X slower.
  • I don't know why the secondary scan for InnoDB from FB MySQL is slower than from upstream. There was a perf bug we added, and recently fixed, for concurrent secondary scans.
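As a rough illustration of the scan step, a full scan of one index can be forced from the mysql client as below; the table and index names are illustrative and not necessarily what the benchmark client issues:

SELECT COUNT(*) FROM pi1 FORCE INDEX (PRIMARY);          -- PK scan
SELECT COUNT(*) FROM pi1 FORCE INDEX (marketsegment);    -- one of the secondary indexes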


Read-write, 1000 inserts/second

This section has results for the read-write test when the writer is limited to 1000 inserts/second. The first graph has the QPS for short range queries. The second graph has the KB written to storage per insert. Both use values relative to the value for InnoDB 5.6.35.
  • InnoDB 5.6.35 has the best QPS.
  • MyRocks gets 0.79X the QPS compared to InnoDB 5.6.35. The problem is CPU overhead which will be obvious in the next section (read-write, 100 writes/second).
  • MyRocks is more write efficient. InnoDB 5.7 and 8.0 write ~15X more to storage per insert than MyRocks. The rate for InnoDB 5.6.35 is an outlier and I think the cause is furious flushing.
  • InnoDB 8.0.3 gets 0.87X the QPS compared to InnoDB 5.6.35. The regression here is smaller than the regression for the load.



Read-write, 100 inserts/second

This section has results for the read-write test when the writer is limited to 100 inserts/second. The first graph has the QPS for short range queries. The second graph has the CPU overhead per query. Both use values relative to the value for InnoDB 5.6.35. Results are similar to the previous section.
  • InnoDB 5.6.35 has the best QPS.
  • MyRocks gets 0.75X the QPS compared to InnoDB 5.6.35. MyRocks uses 1.37X more CPU/query.
  • InnoDB 8.0.3 gets 0.85X the QPS compared to InnoDB 5.6.35. It uses 1.19X more CPU/query than InnoDB 5.6.35. I assume the problem is new code above the storage engine.


Back to Sharing stuff I learned


I have not been regular with blog posts; I've been focused on everything else and got lazy. Well, that is over.

At Shots Studios, the teen social network consisting of nearly 2M lines of code is no more. Shots is now a one-stop shop for select Creators. We are a production studio, ad/talent agency, and talent-management media company focused on creating timeless content: a 21st-century answer to getting great content from great creators in front of their audience.

Your internal monologue after reading this is probably: how does this have anything to do with MySQL, HA, scale, or coding? Either way, it is a good segue to explain how.

Shots the app did really well, yet not well enough to compete with Snapchat and Instagram. We did gain a lot of insight, mainly into what are called Influencers. A large percentage of the time spent growing the Shots platform went into handling their spiky scale. When Influencers posted, they would promote their selfie on other platforms, sending waves of teens to their data all at once. Honestly, this was an amazing challenge to scale on a tight budget: going from cold to performant in milliseconds, with a sudden 600% increase in load/concurrency. The short answer to scaling this was to keep the data in memory. From this, we understood that an influencer's reach and ability to move users is more effective than display ads. Period.

We did a huge analysis of our user base, and from that analysis we decided to keep all "Influencers" in memory, along with the sticky users: the percentage of DAU that comes back frequently. Next, to make sure we did not saturate a network interface by keeping their data in memory on a single box, we replicated this subset of users among redundant pairs. Finally, we kept a higher-than-normal number of frontends in reserve to handle the sudden burst without the startup delta of dynamically scaling pools.

Now we use a subset of the tech we developed to mine and analyze data about Creators. Creators were influencers, but they now create, perform, direct, and edit content, and thus are called Creators. For instance, we use a custom, performant event-tracking system to monitor the social engagement of all Creators. If you have heard of a site called Socialblade, I basically duplicated it at a much higher precision than their data.

With this, we are able to tell which of a Creator's content strikes a chord with users, and then we produce more of that performant content. For instance, https://shots.com/superheroes. With this insight, analysis, and data collection, and by maximizing the reach channels on platforms like YouTube and Instagram on a shoestring budget, we are making data-rich, informed decisions.

Insert benchmark: in-memory, high-concurrency, fast server

This post explains the insert benchmark with an in-memory and high-concurrency workload for MyRocks, InnoDB and TokuDB. The goal is to understand throughput and efficiency for different storage engines and different versions of MySQL. The previous test used a low-concurrency, in-memory workload with 1 query client and 1 table. This test uses 16 concurrent clients and 16 tables.

tl;dr - for an in-memory, high-concurrency workload
  • InnoDB 5.7 has the best insert rate. The rate for MyRocks is better than InnoDB 5.6 but we have work to do to close the gap with modern InnoDB.
  • InnoDB 5.6 has the best query rates. MyRocks gets ~0.80X the QPS compared to it.
  • MyRocks index scans are slower than InnoDB 5.6: 1.54X slower for the PK and 2.35X slower for the secondary indexes. More CPU overhead is the cause.
  • MyRocks is more space efficient. InnoDB uses ~1.6X more space than MyRocks.
  • MyRocks is more write efficient. InnoDB writes ~3X more to storage per insert on the load and ~10X more on the read-write test.
  • MyRocks uses 1.24X more CPU than InnoDB 5.6 on queries but less CPU on loads.
  • InnoDB 8.0 is 2.4X faster than InnoDB 5.6 on loads but gets 0.89X the QPS of InnoDB 5.6 on queries because it uses 1.11X more CPU.

Configuration

The insert benchmark is described here. The test server has 48 HW threads, fast SSD and 256gb of RAM. The database block cache (buffer pool) was large enough to cache the database. The test was run with 16 tables and 16 query clients. For the read-write tests there is an additional writer client for each query client. The insert benchmark loaded the tables with 250M rows (15,625,000 rows/table), then did a full scan of each index (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second per writer client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second per writer client. The scan and read-write tests use a client per table. With 16 tables there are 16 concurrent clients.

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used. 
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.

Results

All of the data for the tests is here. I adjusted the iostat bytes-written metrics for MyRocks because it currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.

Load

The load is concurrent and there are 16 clients, each loading a separate table. The graphs below have rates for each configuration relative to the rate for InnoDB 5.6.35. The graphs show the insert rate, the CPU overhead per insert and the storage write rate per insert.
  • InnoDB 5.7 has the best insert rate and is 2.47X better than InnoDB 5.6. MyRocks is 1.51X better than InnoDB 5.6.
  • MyRocks has the best CPU efficiency. 
  • MyRocks has better write efficiency. InnoDB writes ~3X more to storage per insert.
  • InnoDB from FB MySQL does much better than upstream 5.6. I assume several changes account for the improvement.

Size

This is the database size at the end of the load. The values are absolute and rounded to the nearest GB. MyRocks is more space efficient than InnoDB. Uncompressed InnoDB uses 1.6X more space than uncompressed MyRocks.

Scan

This graph shows the number of seconds to scan the PK index and all of the secondary indexes. The number for the PK scan is from the second scan of it. The x-axis truncates the result for FbInno5635 on the secondary index scan. That result was lousy due to a bug that has been fixed. The scans are concurrent and there are 16 clients each scanning indexes from separate tables.
  • MyRocks is 1.54X slower on the PK scan and 2.35X slower on the secondary scans compared to InnoDB 5.6.35. MyRocks uses more CPU for these scans based on vmstat data for q5 (the second PK scan) and q4 (one of the secondary scans) - the Mcpu/o column is the CPU overhead per row fetched.
  • Scans are faster for InnoDB starting in 5.7


Read-write, 1000 inserts/second

This section has results for the read-write test when the writer is limited to 1000 inserts/second. The first graph has the QPS for short range queries. The second graph has the KB written to storage per insert. Both use values relative to the value for InnoDB 5.6.35.
  • InnoDB 5.6.35 has the best QPS
  • MyRocks gets 0.77X the QPS compared to InnoDB 5.6.35. The problem is CPU overhead which will be obvious in the next section (read-write, 100 writes/second).
  • MyRocks is more write efficient. InnoDB writes ~10X more to storage per insert.
  • InnoDB 8.0.3 gets 0.90X the QPS compared to InnoDB 5.6.35. New CPU overhead is the cause.



Read-write, 100 inserts/second

This section has results for the read-write test when the writer is limited to 100 inserts/second. The first graph has the QPS for short range queries. The second graph has the CPU overhead per query. Both use values relative to the value for InnoDB 5.6.35. Results are similar to the previous section.
  • InnoDB 5.6.35 has the best QPS.
  • MyRocks gets 0.78X the QPS compared to InnoDB 5.6.35. MyRocks uses 1.24X more CPU/query.
  • InnoDB 8.0.3 gets 0.89X the QPS compared to InnoDB 5.6.35. It uses 1.11X more CPU/query than InnoDB 5.6.35. I assume the problem is new code above the storage engine. 



MySQL Enterprise Monitor 3.3.7 has been released


We are pleased to announce that MySQL Enterprise Monitor 3.3.7 is now available for download on the My Oracle Support (MOS) web site. This is a maintenance release that includes a few new features and fixes a number of bugs. You can find more information on the contents of this release in the change log.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

Important: MySQL Enterprise Monitor (MEM) 4.0 offers many significant improvements over MEM 3.3 and 3.4, and we highly recommend that you consider upgrading. More information on MEM 4.0 is available here:

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team


Insert benchmark: in-memory, high-concurrency, fast server - part 2

This is similar to the previous insert benchmark result for in-memory and high-concurrency except it uses 1 table rather than 16 to determine how a storage engine behaves with more contention. The results for 16 vs 1 table are more interesting on the IO-bound test where there are more stalls in the 1-table results.

One example of performance lost from contention is the per-index mutex for InnoDB which is locked during pessimistic changes to the B-Tree. I know this has been improved over the years but the problem has not been eliminated.

Configuration

Start by reading my previous post. The test still uses 500M rows but there is only one table here, whereas the previous test used 16 tables. The load test still uses 16 concurrent clients. The read-write test still uses 16 read clients and 16 write clients. But the scan test uses 1 client here versus 16 clients on the previous test, and the scan test takes longer to finish.

Results

All of the data for the tests is here. I adjusted the iostat bytes-written metrics for MyRocks because it currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.

For most of the results below I compare rates for this test with rates for the 16-table test and skip the graphs that show HW efficiency metrics.

Load

This is interesting:
  • Some engines get more inserts/second with 16 tables - 1.12X more for MyRocks, 1.20X more for InnoDB 5.7, 1.17X more for InnoDB 8.0 and 3.26X more for TokuDB
  • InnoDB 5.6 gets more inserts/second with 1 table - 1.04X more for FB MySQL and 1.14X more for upstream



Scan

Scan results for 1 table are similar to scan results for 16 tables. The MyRocks scans are ~2X slower than InnoDB and InnoDB scans got faster with 5.7.


Read-write with 1000 inserts/second

The QPS for 1 table is similar to the QPS for 16 tables. I didn't mention this on the previous test but the 16 concurrent writers should sustain ~16,000 inserts/second. If they don't then the engine has a performance problem. For this test using 1 table, the October 16 build of MyRocks didn't sustain the target write rate. The average rate for it was 15677 while other engines get 15842 or better and the data is in the ips.av column here. Note that the max that my ibench client code will sustain is ~15845/second rather than 16,000 and I have yet to fix that. Regardless I will look at this the next time I run the test to understand whether MyRocks has a problem.


Read-write with 100 inserts/second

The QPS for 1 table is similar to the QPS for 16 tables.

MySQL 8.0: Query Optimizer Takes Data Buffering into Account


In earlier versions of MySQL, the query optimizer did not distinguish between data that was cached in the database buffer and data that had to be read from disk. The main reason was that the optimizer had no information about whether a table would have to be (partially) read from disk or already was present in the buffer pool.…

cat in a loop!

This is strictly not a MySQL post, but I was trying to combine the output of a query from different shards (databases). I used the cat command below:

cat abc.*.out | grep blah > abc.xx.out

And it never came back until I killed it with Ctrl+C.

I checked the files again and abc.xx.out was there with a huge size of 62 GB. If I sum up the sizes of the original files that I was trying to combine, it shouldn't even cross 100 MB.

So it was obvious that cat had gone into a recursive loop, where it started to stream in from the very output file being generated: a never-ending process until the file system goes full.

This may be a well-known fact, but I was taken by surprise that cat can get into a recursive loop, and thought I would share it here.

This is just a copy from my console to show what I went through. my.sql is the SQL file that I ran across multiple shards to produce output files in the my.sql_uat_xx.out format. uat_xx are the shards, aka databases.




-bash-3.2$ ls -ltr log | tail
-rw------- 1 pvenu ldap-users     2082140 Dec  7 05:48 my.sql_uat_07.out
-rw------- 1 pvenu ldap-users     2070730 Dec  7 05:48 my.sql_uat_08.out
-rw------- 1 pvenu ldap-users     2071257 Dec  7 05:48 my.sql_uat_09.out
-rw------- 1 pvenu ldap-users     2061772 Dec  7 05:48 my.sql_uat_10.out
-rw------- 1 pvenu ldap-users     2039862 Dec  7 05:48 my.sql_uat_11.out
-rw------- 1 pvenu ldap-users     2065411 Dec  7 05:48 my.sql_uat_12.out
-rw------- 1 pvenu ldap-users     2064539 Dec  7 05:48 my.sql_uat_13.out
-rw------- 1 pvenu ldap-users     2073953 Dec  7 05:48 my.sql_uat_14.out
-rw------- 1 pvenu ldap-users     2060721 Dec  7 05:48 my.sql_uat_15.out


-bash-3.2$ cat log/my.sql_uat_*.out | grep -v field_name > log/my.sql_uat_xx.out




{Ctrl + C} after long wait 

-bash-3.2$
-bash-3.2$
-bash-3.2$ ls -ltr log | tail
-rw------- 1 pvenu ldap-users     2082140 Dec  7 05:48 my.sql_uat_07.out
-rw------- 1 pvenu ldap-users     2070730 Dec  7 05:48 my.sql_uat_08.out
-rw------- 1 pvenu ldap-users     2071257 Dec  7 05:48 my.sql_uat_09.out
-rw------- 1 pvenu ldap-users     2061772 Dec  7 05:48 my.sql_uat_10.out
-rw------- 1 pvenu ldap-users     2039862 Dec  7 05:48 my.sql_uat_11.out
-rw------- 1 pvenu ldap-users     2065411 Dec  7 05:48 my.sql_uat_12.out
-rw------- 1 pvenu ldap-users     2064539 Dec  7 05:48 my.sql_uat_13.out
-rw------- 1 pvenu ldap-users     2073953 Dec  7 05:48 my.sql_uat_14.out
-rw------- 1 pvenu ldap-users     2060721 Dec  7 05:48 my.sql_uat_15.out
-rw------- 1 pvenu ldap-users 62260314112 Dec  7 05:54 my.sql_uat_xx.out


This is solved if I change the output file name so that it does not match the input pattern, or write the output file to a different directory.
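For instance, the same pipeline works fine if the output lands outside the directory matched by the glob (here /tmp, but any non-matching path works):

-bash-3.2$ cat log/my.sql_uat_*.out | grep -v field_name > /tmp/my.sql_uat_xx.out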

PS.
You can also write this in alternate ways:

cat abc.*.out | grep blah

but better still is:

grep blah abc.*.out

Insert benchmark: IO-bound, high-concurrency, fast server

This post explains the insert benchmark with an IO-bound and high-concurrency workload for MyRocks, InnoDB and TokuDB. The goal is to understand throughput and efficiency for different storage engines and different versions of MySQL. The previous tests used an in-memory workload.

tl;dr - for an IO-bound, high-concurrency workload
  • MyRocks gets ~3X more inserts/s vs InnoDB 5.6 and ~2X more vs InnoDB 5.7/8.0
  • MyRocks has the best QPS. MyRocks gets ~10X and ~4X more than InnoDB on read-write tests because it does less IO per read and per write.
  • MyRocks PK scans are ~2X slower than InnoDB but secondary index scans are almost as fast as InnoDB. Alas, scans get faster in InnoDB 5.7. Readahead helps MyRocks.
  • MyRocks is more space efficient. InnoDB is almost 2X larger than uncompressed MyRocks and almost 4X larger than compressed MyRocks.
  • MyRocks is more write efficient. InnoDB writes ~5X more to storage per insert on the load and ~15X more on the read-write test.

Configuration

The insert benchmark is described here. The test server has 48 HW threads, fast SSD and 50gb of RAM. The database block cache (buffer pool) was set to 10gb for MyRocks and TokuDB and to 35gb for InnoDB. The database was much larger than 50gb. The test was run with 16 tables and 16 query clients. For the read-write tests there is an additional writer client for each query client. The insert benchmark loaded the tables with 2b rows (125M rows/table), then did a full scan of each index (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second per writer client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second per writer client. The scan and read-write tests use a client per table. With 16 tables there are 16 concurrent clients.

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Tests were repeated with and without compression. The configuration without compression is called MySQL.none in the rest of this post. The configuration with compression is called MySQL.zstd and used zstandard for the max level, no compression for L0/L1/L2 and lz4 for the other levels.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Tests were done without compression and then with zlib compression.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.

Results

All of the data for the tests is here. I adjusted the iostat bytes-written metrics for MyRocks because it currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.

Load

The load is concurrent and there are 16 clients, each loading a separate table. The graphs below have rates for each configuration relative to the rate for InnoDB 5.6.35. The graphs show the insert rate, the CPU overhead per insert, the storage write rate per insert and the storage read rate per insert.
  • MyRocks is ~3X faster than InnoDB 5.6 and more than 2X faster than InnoDB 5.7/8.0
  • MyRocks has the best CPU efficiency. InnoDB uses ~3X more CPU/insert.
  • InnoDB from FB MySQL does much better than upstream 5.6.

MyRocks and TokuDB have better write efficiency. InnoDB 5.6 writes ~5X more to storage per insert than uncompressed MyRocks and ~7X more than compressed MyRocks. I don't understand why the write rate is larger for InnoDB 5.7/8.0 than for 5.6.
MyRocks and TokuDB read less from storage per insert. There are two possible reasons for this. First, non-unique secondary index maintenance is read free. Second, the index might be smaller with them and remain in cache. I am not certain which of these explains it.

Size

This is the database size at the end of the load. The values are absolute. MyRocks is more space efficient than InnoDB. InnoDB uses almost 2X more space than uncompressed MyRocks and almost 4X more space than compressed MyRocks.


Scan

This graph shows the scan time for the PK index and all of the secondary indexes. The value is relative to the time for InnoDB 5.6.35. The absolute value in seconds is here in the scan section. The number for the PK scan is from the second scan of it. The graph excludes FbInno5635 for readability; that version of FbInno5635 has a perf bug for concurrent scans (since fixed). The scans are concurrent and there are 16 clients each scanning indexes from separate tables.

The graph has two extra configurations: MyRocks.none.ra, MyRocks.zstd.ra. Both of these enabled filesystem readahead in RocksDB. For this workload InnoDB suffers from b-tree fragmentation for the secondary indexes but probably not for the PK. That might explain the MyRocks perf results for PK vs secondary scans.
  • MyRocks PK scans are ~2X slower than InnoDB 5.6
  • MyRocks secondary index scans are slightly faster than InnoDB 5.6
  • Readahead is a big deal for MyRocks index scans
  • Scans are faster for InnoDB starting in 5.7


Read-write, 1000 inserts/second

This section has results for the read-write test when the writer is limited to 1000 inserts/second, there are 16 tables and a reader/writer pair per table. The first graph has the QPS for short range queries. The second graph has the CPU/query. Both use values relative to InnoDB 5.6.35. All storage engines sustained the target insert rate of 16,000 rows/second.
  • MyRocks gets more than 6X the QPS compared to InnoDB
  • MyRocks uses less CPU and gets more QPS than InnoDB because it is more read and write efficient
This graph shows iostat read operations per query. The values are relative to InnoDB 5.6.35. The rate for InnoDB is ~10X the rate for uncompressed MyRocks and ~100X the rate for compressed MyRocks. I think the MyRocks indexes are larger than cache so I don't understand why this difference is so large. But I am happy about it.

This graph shows KB written to storage per insert. The values are relative to InnoDB 5.6.35. InnoDB writes ~15X more to storage per insert compared to MyRocks.


Read-write, 100 inserts/second

This section has results for the read-write test when the writer is limited to 100 inserts/second, there are 16 tables and a reader/writer pair per table. The graph has the QPS for short range queries with values relative to the value for InnoDB 5.6.35. MyRocks gets ~4X more QPS than InnoDB here while it got 10X or more in the previous section. The reason is that the write rate is lower on this test, so InnoDB uses less write IO and has more capacity for reads.



Percona Server for MySQL 5.5.58-38.10 is Now Available


Percona announces the release of Percona Server for MySQL 5.5.58-38.10 on December 7, 2017. Based on MySQL 5.5.58, including all the bug fixes in it, Percona Server for MySQL 5.5.58-38.10 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.58-38.10 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:
  • Percona Server packages are now available for Ubuntu 17.10 (Artful).
Bugs Fixed:
  • If an I/O syscall returned an error during the server shutdown with Thread Pool enabled, a mutex could be left locked. Bug fixed #1702330 (Daniel Black).
  • The dynamic row format feature that supports BLOB/VARCHAR in MEMORY tables requires all key columns to come before any BLOB columns. This requirement, however, was not enforced, allowing the creation of MEMORY tables with unsupported column configurations, which then crashed or lost data in usage. Bug fixed #1731483.

Other bugs fixed: #1729241.

Find the release notes for Percona Server for MySQL 5.5.58-38.10 in our online documentation. Report bugs on the launchpad bug tracker.

How Scary is Enabling Semi-Sync Replication?

Semi-sync replication is a plugin available for MySQL which allows you to create more durable replication topologies. For instance, you can ensure that in the event of a master crash at least one of your replicas has all transactions currently written to the master, so that when you promote, you know you're not missing any data.

That's a huge simplification.

What's the downside? Write speed. If a transaction on your master has to wait until a replica acknowledges it has that transaction, then there is going to be some delay. Not only that, but the network latency between the two points matters a lot. If you want greater durability, the cost is performance.

It's important to note that the master doesn't wait until the replica actually applies the transaction. Slave lag in the SQL thread shouldn't cause additional delays; only lag in the IO thread will. The replica only has to receive the transaction and write it to its local relay log before sending an ack.
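For context, here is the usual way semi-sync is switched on, with the plugin loaded on each side; the timeout shown is just the default used later in this post:

master>  INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
master>  SET GLOBAL rpl_semi_sync_master_enabled = 1;
master>  SET GLOBAL rpl_semi_sync_master_timeout = 10000;  -- milliseconds

replica> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
replica> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
replica> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;      -- pick up the new setting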

The question I immediately had was: what happens when you enable semi-sync replication and then have some kind of network issue? Will all transactions be blocked or delayed? The short answer is no. There are configurable timeouts, and some failsafe behavior I can demonstrate below:

Test 1: Short network interruption

I set up a script to do simple inserts once a second on the master and display the time each took. We can use this to see some of the actual effects failures in semi-sync will have. I'm using the default rpl_semi_sync_master_timeout, which is 10 seconds.
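The script itself isn't shown here; a minimal sketch of what it could look like, assuming a test database and a probe table that I made up for illustration:

#!/bin/bash
# Insert one row per second and print how long each statement took.
while true; do
  start=$(date +%s.%N)
  mysql test -e "INSERT INTO semisync_probe (ts) VALUES (NOW())"
  end=$(date +%s.%N)
  echo "$(echo "$end - $start" | bc)s"
  sleep 1
done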


replica$ sudo ifdown eth1 && sudo ifup eth1

4.121264ms
2.351532ms
1.413117832s ***
2.165793ms
3.381428ms
3.519269ms

The list of times shows that there was a delay > 1 second committing a transaction and then things resumed as normal.  This is exactly what we would expect to happen.

Test 2: Long network interruption


What happens when the problem lasts longer than rpl_semi_sync_master_timeout?  Let's see:

sudo ifdown eth1 && sleep 30 && sudo ifup eth1

3.702486ms
2.737649ms
3.571109ms
10.006403552s ***
14.490646ms
2.409117ms
2.657442ms
3.109171ms
4.0376ms
We can see the delay here, which is longer. It hits the maximum configured timeout and then continues. Subsequent transactions do not wait! What happened? Once a transaction hits the timeout, the server falls back to asynchronous mode. This ensures that extended network problems do not cause the master to come to a grinding halt. Status variables confirm this:

mysql> show global status like 'rpl%';
+---------------------------------------------+-----------+
| Variable_name                               | Value     |
+---------------------------------------------+-----------+
| Rpl_semi_sync_master_clients                | 0         |
| Rpl_semi_sync_master_net_avg_wait_time      | 260203    |
| Rpl_semi_sync_master_net_wait_time          | 44234631  |
| Rpl_semi_sync_master_net_waits              | 170       |
| Rpl_semi_sync_master_no_times               | 3         |
| Rpl_semi_sync_master_no_tx                  | 41        |
| Rpl_semi_sync_master_status                 | OFF       |
| Rpl_semi_sync_master_timefunc_failures      | 0         |
| Rpl_semi_sync_master_tx_avg_wait_time       | 9141      |
| Rpl_semi_sync_master_tx_wait_time           | 1508274   |
| Rpl_semi_sync_master_tx_waits               | 165       |
| Rpl_semi_sync_master_wait_pos_backtraverse  | 0         |
| Rpl_semi_sync_master_wait_sessions          | 0         |
| Rpl_semi_sync_master_yes_tx                 | 166       |
| Rpl_semi_sync_slave_status                  | OFF       |
+---------------------------------------------+-----------+

The value of Rpl_semi_sync_master_status is now OFF, indicating that it's replicating in async mode. The counter Rpl_semi_sync_master_no_tx records the number of transactions on the master which were not sent to the replica for confirmation. Graphing these values would show you when the server is in semi-sync vs. async mode.
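A simple way to collect them for graphing is to poll the counters on an interval, for example:

while true; do
  mysql -e "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_%'" | grep -E 'status|no_tx|yes_tx'
  sleep 10
done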

When the network interruption ends and the server establishes connection with the replica then it resumes semi-sync replication:

mysql> show global status like 'rpl%';
+---------------------------------------------+-----------+
| Variable_name                               | Value     |
+---------------------------------------------+-----------+
| Rpl_semi_sync_master_clients                | 1         |
| Rpl_semi_sync_master_net_avg_wait_time      | 236667    |
| Rpl_semi_sync_master_net_wait_time          | 44256850  |
| Rpl_semi_sync_master_net_waits              | 187       |
| Rpl_semi_sync_master_no_times               | 3         |
| Rpl_semi_sync_master_no_tx                  | 47        |
| Rpl_semi_sync_master_status                 | ON        |
| Rpl_semi_sync_master_timefunc_failures      | 0         |
| Rpl_semi_sync_master_tx_avg_wait_time       | 8433      |
| Rpl_semi_sync_master_tx_wait_time           | 1517941   |
| Rpl_semi_sync_master_tx_waits               | 180       |
| Rpl_semi_sync_master_wait_pos_backtraverse  | 0         |
| Rpl_semi_sync_master_wait_sessions          | 0         |
| Rpl_semi_sync_master_yes_tx                 | 181       |
| Rpl_semi_sync_slave_status                  | OFF       |
+---------------------------------------------+-----------+

Conclusion


Semi-sync replication is designed for failure. Configurable timeouts let you choose the maximum impact you're willing to tolerate on your master's writes. This will allow me to deploy it to a production server with a very low timeout, such as 50ms, to begin. Then I can monitor the convenient status variables to isolate when problems occur and diagnose the causes until we're comfortable with the production impact.

Note: all tests were performed with 5.6.37-82.2-log.

Hands-On Look at ZFS with MySQL


This post is a hands-on look at ZFS with MySQL.

In my previous post, I highlighted the similarities between MySQL and ZFS. Before going any further, I’d like you to be able to play and experiment with ZFS. This post shows you how to configure ZFS with MySQL in a minimalistic way on either Ubuntu 16.04 or Centos 7.

Installation

In order to be able to use ZFS, you need some available storage space. For storage – since the goal here is just to have a hands-on experience – we’ll use a simple file as a storage device. Although simplistic, I have now been using a similar setup on my laptop for nearly three years (just can’t get rid of it, it is too useful). For simplicity, I suggest you use a small Centos7 or Ubuntu 16.04 VM with one core, 8GB of disk and 1GB of RAM.

First, you need to install ZFS as it is not installed by default. On Ubuntu 16.04, you simply need to run:

root@Ubuntu1604:~# apt-get install zfs-dkms zfsutils-linux

On RedHat or Centos 7.4, the procedure is a bit more complex. First, we need to install the EPEL ZFS repository:

[root@Centos7 ~]# yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
[root@Centos7 ~]# gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
[root@Centos7 ~]# gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

Apparently, there were issues with ZFS kmod kernel modules on RedHat/Centos. I never had any issues with Ubuntu (and who knows how often the kernel is updated). Anyway, it is recommended that you enable kABI-tracking kmods. Edit the file /etc/yum.repos.d/zfs.repo, disable the ZFS repo and enable the zfs-kmod repo. The beginning of the file should look like:

[zfs]
name=ZFS on Linux for EL7 - dkms
baseurl=http://download.zfsonlinux.org/epel/7.4/$basearch/
enabled=0
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
[zfs-kmod]
name=ZFS on Linux for EL7 - kmod
baseurl=http://download.zfsonlinux.org/epel/7.4/kmod/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
...

Now, we can proceed and install ZFS:

[root@Centos7 ~]# yum install zfs

After the installation, I have ZFS version 0.6.5.6 on Ubuntu and version 0.7.3.0 on Centos7. The version difference doesn’t matter for what will follow.

Setup

So, we need a container for the data. You can use any of the following options for storage:

  • A free disk device
  • A free partition
  • An empty LVM logical volume
  • A file

The easiest solution is to use a file, and so that’s what I’ll use here. A file is not the fastest and most efficient storage, but it is fine for our hands-on. In production, please use real devices. A more realistic server configuration will be discussed in a future post. The following steps are identical on Ubuntu and Centos. The first step is to create the storage file. I’ll use a file of 1 GB in /mnt. Adjust the size and path to whatever suits the resources you have:

[root@Centos7 ~]# dd if=/dev/zero of=/mnt/zfs.img bs=1024 count=1048576

The result is a 1GB file in /mnt:

[root@Centos7 ~]# ls -lh /mnt
total 1,0G
-rw-r--r--.  1 root root 1,0G 16 nov 16:50 zfs.img

Now, we will create our ZFS pool, mysqldata, using the file we just created:

[root@Centos7 ~]# modprobe zfs
[root@Centos7 ~]# zpool create mysqldata /mnt/zfs.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
config:
        NAME            STATE     READ WRITE CKSUM
        mysqldata       ONLINE       0     0     0
          /mnt/zfs.img  ONLINE       0     0     0
errors: No known data errors
[root@Centos7 ~]# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
mysqldata  79,5K   880M    24K  /mysqldata

If you have a result similar to the above, congratulations, you have a ZFS pool. If you put files in /mysqldata, they are in ZFS.

MySQL installation

Now, let’s install MySQL and play around a bit. We’ll begin by installing the Percona repository:

root@Ubuntu1604:~# cd /tmp
root@Ubuntu1604:/tmp# wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
root@Ubuntu1604:/tmp# dpkg -i percona-release_*.deb
root@Ubuntu1604:/tmp# apt-get update
[root@Centos7 ~]# yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm

Next, we install Percona Server for MySQL 5.7:

root@Ubuntu1604:~# apt-get install percona-server-server-5.7
root@Ubuntu1604:~# systemctl start mysql
[root@Centos7 ~]# yum install Percona-Server-server-57
[root@Centos7 ~]# systemctl start mysql

The installation command pulls all the dependencies and sets up the MySQL root password. On Ubuntu, the install script asks for the password, but on Centos7 a random password is set. To retrieve the random password:

[root@Centos7 ~]# grep password /var/log/mysqld.log
2017-11-21T18:37:52.435067Z 1 [Note] A temporary password is generated for root@localhost: XayhVloV+9g+

The following step is to reset the root password:

[root@Centos7 ~]# mysql -p -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mysql57OnZfs_';"
Enter password:

Since 5.7.15, the password validation plugin by default requires a length greater than 8, mixed cases, at least one digit and at least one special character. On either Linux distribution, I suggest you set the credentials in the /root/.my.cnf file like this:

# cat /root/.my.cnf
[client]
user=root
password=Mysql57OnZfs_

MySQL configuration for ZFS

Now that we have both ZFS and MySQL, we need some configuration to make them play together. From here, the steps are the same on Ubuntu and Centos. First, we stop MySQL:

# systemctl stop mysql

Then, we’ll configure ZFS. We will create three ZFS filesystems in our pool:

  • mysql will be the top-level filesystem for the MySQL-related data. This filesystem will not directly have data in it, but data will be stored in the other filesystems that we create. The utility of the mysql filesystem will become obvious when we talk about snapshots. Something to keep in mind for the next steps: the properties of a filesystem are by default inherited from the upper level.
  • mysql/data will be the actual datadir. The files in the datadir are mostly accessed through random IO operations, so we’ll set the ZFS recordsize to match the InnoDB page size.
  • mysql/log will be where the log files will be stored. By log files, I primarily mean the InnoDB log files. But the binary log file, the slow query log and the error log will all be stored in that directory. The log files are accessed through sequential IO operations. We’ll thus use a bigger ZFS recordsize in order to maximize the compression efficiency.

Let’s begin with the top-level MySQL container. I could have used directly mysqldata, but that would somewhat limit us. The following steps create the filesystem and set some properties:

# zfs create mysqldata/mysql
# zfs set compression=gzip mysqldata/mysql
# zfs set recordsize=128k mysqldata/mysql
# zfs set atime=off mysqldata/mysql

I just set compression to ‘gzip’ (the equivalent of gzip level 6), recordsize to 128KB and atime (the file’s access time) to off. Once we are done with the mysql filesystem, we can proceed with the data and log filesystems:

# zfs create mysqldata/mysql/log
# zfs create mysqldata/mysql/data
# zfs set recordsize=16k mysqldata/mysql/data
# zfs set primarycache=metadata mysqldata/mysql/data
# zfs get compression,recordsize,atime mysqldata/mysql/data
NAME                  PROPERTY     VALUE     SOURCE
mysqldata/mysql/data  compression  gzip      inherited from mysqldata/mysql
mysqldata/mysql/data  recordsize   16K       local
mysqldata/mysql/data  atime        off       inherited from mysqldata/mysql

Of course, there are other properties that could be set, but let’s keep things simple. Now that the filesystems are ready, let’s move the files to ZFS (make sure you stopped MySQL):

# mv /var/lib/mysql/ib_logfile* /mysqldata/mysql/log/
# mv /var/lib/mysql/* /mysqldata/mysql/data/

and then set the real mount points:

# zfs set mountpoint=/var/lib/mysql mysqldata/mysql/data
# zfs set mountpoint=/var/lib/mysql-log mysqldata/mysql/log
# chown mysql.mysql /var/lib/mysql /var/lib/mysql-log

Now we have:

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
mysqldata             1,66M   878M  25,5K  /mysqldata
mysqldata/mysql       1,54M   878M    25K  /mysqldata/mysql
mysqldata/mysql/data   890K   878M   890K  /var/lib/mysql
mysqldata/mysql/log    662K   878M   662K  /var/lib/mysql-log

We must adjust the MySQL configuration accordingly. Here’s what I put in my /etc/my.cnf file (/etc/mysql/my.cnf on Ubuntu):

[mysqld]
datadir=/var/lib/mysql
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_checksum_algorithm = none
slow_query_log = /var/lib/mysql-log/slow.log
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
relay_log=/var/lib/mysql-log/relay-bin
expire_logs_days=7
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
pid-file=/var/run/mysqld/mysqld.pid

On Centos 7, selinux prevented MySQL from accessing files in /var/lib/mysql-log. I had to perform the following steps:

[root@Centos7 ~]# yum install policycoreutils-python
[root@Centos7 ~]# semanage fcontext -a -t mysqld_db_t "/var/lib/mysql-log(/.*)?"
[root@Centos7 ~]# chcon -Rv --type=mysqld_db_t /var/lib/mysql-log/

I could have just disabled selinux since it is a test server, but if I don’t get my hands dirty on selinux once in a while with semanage and chcon I will not remember how to do it. Selinux is an important security tool on Linux (but that’s another story).

At this point, feel free to start using your test MySQL database on ZFS.

Monitoring ZFS

To monitor ZFS, you can use the zpool command like this:

[root@Centos7 ~]# zpool iostat 3
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
mysqldata   19,6M   988M      0      0      0    290
mysqldata   19,3M   989M      0     44      0  1,66M
mysqldata   23,4M   985M      0     49      0  1,33M
mysqldata   23,4M   985M      0     40      0   694K
mysqldata   26,7M   981M      0     39      0   561K
mysqldata   26,7M   981M      0     37      0   776K
mysqldata   23,8M   984M      0     43      0   634K

This shows the ZFS activity while I was loading some data. Also, the following command gives you an estimate of the compression ratio:

[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql
NAME             PROPERTY       VALUE  SOURCE
mysqldata/mysql  compressratio  4.10x  -
mysqldata/mysql  used           116M   -
mysqldata/mysql  logicalused    469M   -
[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql/data
NAME                  PROPERTY       VALUE  SOURCE
mysqldata/mysql/data  compressratio  4.03x  -
mysqldata/mysql/data  used           67,9M  -
mysqldata/mysql/data  logicalused    268M   -
[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql/log
NAME                 PROPERTY       VALUE  SOURCE
mysqldata/mysql/log  compressratio  4.21x  -
mysqldata/mysql/log  used           47,8M  -
mysqldata/mysql/log  logicalused    201M   -

In my case, the dataset compresses very well (4x). Another way to see how files are compressed is to use ls and du. ls returns the actual uncompressed size of the file, while du returns the compressed size. Here’s an example:

[root@Centos7 mysql]# ls -lah ibdata1
-rw-rw---- 1 mysql mysql 90M nov 24 16:09 ibdata1
[root@Centos7 mysql]# du -hs ibdata1
14M     ibdata1

I really invite you to further experiment and get a feeling of how ZFS and MySQL behave together.

Snapshots and backups

A great feature of ZFS that works really well with MySQL is snapshots. A snapshot is a consistent view of the filesystem at a given point in time. Normally, it is best to perform a snapshot while a flush tables with read lock is held. That allows you to record the master position, and also to flush MyISAM tables. It is quite easy to do that. Here’s how I create a snapshot with MySQL:

[root@Centos7 ~]# mysql -e 'flush tables with read lock;show master status;\! zfs snapshot -r mysqldata/mysql@my_first_snapshot'
+---------------+-----------+--------------+------------------+-------------------+
| File          | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+-----------+--------------+------------------+-------------------+
| binlog.000002 | 110295083 |              |                  |                   |
+---------------+-----------+--------------+------------------+-------------------+
[root@Centos7 ~]# zfs list -t snapshot
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
mysqldata/mysql@my_first_snapshot          0B      -    24K  -
mysqldata/mysql/data@my_first_snapshot     0B      -  67,9M  -
mysqldata/mysql/log@my_first_snapshot      0B      -  47,8M  -

The command took about 1s. The only time such commands would take longer is when there are MyISAM tables with a lot of pending updates to the indices, or when there are long-running transactions. You probably wonder why the “USED” column reports 0B. That’s simply because there were no changes to the filesystem since the snapshot was created. It is a measure of the amount of data that hasn’t been freed because the snapshot still requires it. Said otherwise, it is a measure of how far the snapshot has diverged from its parent. You can access the snapshot through a clone or through ZFS as a filesystem. To access the snapshot through ZFS, you have to set the snapdir parameter to “visible”, and then you can see the files. Here’s how:

[root@Centos7 ~]# zfs set snapdir=visible mysqldata/mysql/data
[root@Centos7 ~]# zfs set snapdir=visible mysqldata/mysql/log
[root@Centos7 ~]# ls /var/lib/mysql-log/.zfs/snapshot/my_first_snapshot/
binlog.000001  binlog.000002  binlog.index  error.log  ib_logfile0  ib_logfile1

The files in the snapshot directory are read-only. If you want to be able to write to the files, you first need to clone the snapshots:

[root@Centos7 ~]# zfs create mysqldata/mysqlslave
[root@Centos7 ~]# zfs clone mysqldata/mysql/data@my_first_snapshot mysqldata/mysqlslave/data
[root@Centos7 ~]# zfs clone mysqldata/mysql/log@my_first_snapshot mysqldata/mysqlslave/log
[root@Centos7 ~]# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
mysqldata                   116M   764M    26K  /mysqldata
mysqldata/mysql             116M   764M    24K  /mysqldata/mysql
mysqldata/mysql/data       67,9M   764M  67,9M  /var/lib/mysql
mysqldata/mysql/log        47,8M   764M  47,8M  /var/lib/mysql-log
mysqldata/mysqlslave         28K   764M    26K  /mysqldata/mysqlslave
mysqldata/mysqlslave/data     1K   764M  67,9M  /mysqldata/mysqlslave/data
mysqldata/mysqlslave/log      1K   764M  47,8M  /mysqldata/mysqlslave/log

At this point, it is up to you to use the clones to spin up a local slave. As with snapshots, a clone only grows in size when actual data is written to it. ZFS records that haven’t changed since the snapshot was taken are shared. That’s a huge space saving. For a customer, I once wrote a script to automatically create five MySQL slaves for their developers. The developers would do tests, and often replication broke. Rerunning the script would recreate fresh slaves in a matter of a few minutes. My ZFS snapshot script and the script I wrote to create the clone-based slaves are available here: https://github.com/y-trudeau/Yves-zfs-tools
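As an illustration, starting a second mysqld instance on the clones could look roughly like this; the mount points, port and server_id are placeholders I picked, not values from the setup above:

[root@Centos7 ~]# zfs set mountpoint=/var/lib/mysql-slave mysqldata/mysqlslave/data
[root@Centos7 ~]# zfs set mountpoint=/var/lib/mysql-slave-log mysqldata/mysqlslave/log
[root@Centos7 ~]# chown -R mysql.mysql /var/lib/mysql-slave /var/lib/mysql-slave-log
[root@Centos7 ~]# mysqld --user=mysql --datadir=/var/lib/mysql-slave \
    --innodb_log_group_home_dir=/var/lib/mysql-slave-log \
    --socket=/var/lib/mysql-slave/mysql.sock --port=3307 --server_id=12346 &

From there, the usual CHANGE MASTER TO, using the binlog file and position recorded when the snapshot was taken, turns the clone into a replica.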

Optional features

In the previous post, I talked about a SLOG device for the ZIL and the L2ARC, a disk extension of the ARC cache. If you promise to never use the following trick in production, here’s how to speed MySQL on ZFS drastically:

[root@Centos7 ~]# dd if=/dev/zero of=/dev/shm/zil_slog.img bs=1024 count=131072
131072+0 enregistrements lus
131072+0 enregistrements écrits
134217728 octets (134 MB) copiés, 0,373809 s, 359 MB/s
[root@Centos7 ~]# zpool add mysqldata log /dev/shm/zil_slog.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
config:
        NAME                     STATE     READ WRITE CKSUM
        mysqldata                ONLINE       0     0     0
          /mnt/zfs.img           ONLINE       0     0     0
        logs
          /dev/shm/zil_slog.img  ONLINE       0     0     0
errors: No known data errors

The data in the SLOG is critical for ZFS recovery. I performed some tests with virtual machines, and if you crash the server and lose the SLOG you may lose all the data stored in the ZFS pool. Normally, the SLOG is on a mirror in order to lower the risk of losing it. The SLOG can be added and removed online.
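For reference, on real hardware a mirrored SLOG is added with the same command, just listing two devices behind the mirror keyword; the device names here are placeholders:

[root@Centos7 ~]# zpool add mysqldata log mirror /dev/sdb /dev/sdc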

I know I asked you to promise to never use an shm file as SLOG in production. Actually, there are exceptions. I would not hesitate to temporarily use such a trick to speed up a lagging slave. Another situation where such a trick could be used is with Percona XtraDB Cluster. With a cluster, there are multiple copies of the dataset. Even if one node crashed and lost its ZFS filesystems, it could easily be reconfigured and reprovisioned from the surviving nodes.

The other optional feature I want to cover is a cache device. The cache device is what is used for the L2ARC. The content of the L2ARC is compressed as the original data is compressed. To add a cache device (again an shm file), do:

[root@Centos7 ~]# dd if=/dev/zero of=/dev/shm/l2arc.img bs=1024 count=131072
131072+0 enregistrements lus
131072+0 enregistrements écrits
134217728 octets (134 MB) copiés, 0,272323 s, 493 MB/s
[root@Centos7 ~]# zpool add mysqldata cache /dev/shm/l2arc.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
config:
    NAME                     STATE     READ WRITE CKSUM
    mysqldata                ONLINE       0     0     0
      /mnt/zfs.img           ONLINE       0     0     0
    logs
      /dev/shm/zil_slog.img  ONLINE       0     0     0
    cache
      /dev/shm/l2arc.img     ONLINE       0     0     0
errors: No known data errors

To monitor the L2ARC (and also the ARC), look at the file /proc/spl/kstat/zfs/arcstats. As the ZFS filesystems are configured right now, very little will go to the L2ARC. This can be frustrating. The reason is that the L2ARC is filled by the elements evicted from the ARC. If you recall, we have set primarycache=metadata for the filesystem containing the actual data. Hence, in order to get some data to our L2ARC, I suggest the following steps:

[root@Centos7 ~]# zfs set primarycache=all mysqldata/mysql/data
[root@Centos7 ~]# echo 67108864 > /sys/module/zfs/parameters/zfs_arc_max
[root@Centos7 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@Centos7 ~]# grep '^size' /proc/spl/kstat/zfs/arcstats
size                            4    65097584

The echo to drop_caches forces a re-initialization of the ARC. Now, InnoDB data starts to be cached in the L2ARC. The way data is sent to the L2ARC has many tunables, which I won’t discuss here. I chose 64MB for the ARC size mainly because I am using a low-memory VM. A size of 64MB is aggressively small and will slow down ZFS if the metadata doesn’t fit in the ARC. Normally you should use a larger value. The actual good size depends on many parameters like the filesystem size, the number of files and the presence of an L2ARC. You can monitor the ARC and L2ARC using the arcstat tool that comes with ZFS on Linux (when you use Centos 7). With Ubuntu, download the tool from here.

Removal

So the ZFS party is over? We need to clean up the mess! Let’s begin:

[root@Centos7 ~]# systemctl stop mysql
[root@Centos7 ~]# zpool remove mysqldata /dev/shm/l2arc.img
[root@Centos7 ~]# zpool remove mysqldata /dev/shm/zil_slog.img
[root@Centos7 ~]# rm -f /dev/shm/*.img
[root@Centos7 ~]# zpool destroy mysqldata
[root@Centos7 ~]# rm -f /mnt/zfs.img
[root@Centos7 ~]# yum erase spl kmod-spl libzpool2 libzfs2 kmod-zfs zfs

The last step is different on Ubuntu:

root@Ubuntu1604:~# apt-get remove spl-dkms zfs-dkms libzpool2linux libzfs2linux spl zfsutils-linux zfs-zed

Conclusion

With this guide, I hope I provided a positive first experience in using ZFS with MySQL. The configuration is simple, and not optimized for performance. However, we’ll look at more realistic configurations in future posts.

Insert benchmark: IO-bound, high-concurrency, fast server, part 2

This is similar to the previous insert benchmark result for IO-bound and high-concurrency except it uses 1 table rather than 16 to determine how a storage engine behaves with more contention.

tl;dr
  • Inserts are much faster for MyRocks
  • The InnoDB PK uses 2X more space for the 1 table test than the 16 table test. I filed bug 88827.
  • MyRocks secondary index scans have a similar performance to InnoDB
  • MyRocks PK scans are ~2X slower than InnoDB 5.6 on the 16 table test but ~3X faster on the 1 table test. This might also be bug 88827.

Configuration

Start by reading my previous post. The test still uses 2B rows but there is only one table here, whereas the previous test used 16 tables. The load still uses 16 concurrent clients. The read-write test still uses 16 read clients and 16 write clients. But the scan test uses 1 client here versus 16 clients on the previous test, and the scan test takes longer to finish.

While I have results for InnoDB from FB MySQL I exclude them from the graphs to improve readability.

Results

All of the data for the 1-table tests is here and for the 16-table tests is here. I adjusted the iostat bytes-written metrics for MyRocks because it currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.

For most of the results below I compare rates for this test with rates for the 16-table test and skip the graphs that show HW efficiency metrics.

Size

This graph shows the database size when the load ends for the 16 table and 1 table tests. For MyRocks and TokuDB the database size is similar for both tests. The InnoDB result is odd because the size is almost 1.25X larger for the 1 table test. From SHOW TABLE STATUS the data_length was about 2X larger for the 1 table test. From iostat output the PK scan for the 1 table test reads ~205gb while the 16 table test reads ~125gb. So the PK uses almost 2X more space than it should when there are concurrent inserters to the same table. The inserts are multi-row and the PK is auto-inc so the inserts grow the b-tree to the right. I filed bug 88827 for this.
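As a quick check outside of SHOW TABLE STATUS, the clustered PK size (data_length) and total secondary index size (index_length) can also be read from information_schema; the schema and table names below are hypothetical stand-ins for the benchmark table:

-- Sizes in bytes; 'test' and 'pi1' are placeholder names for this sketch
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = 'test' AND table_name = 'pi1';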


Load

This graph shows the insert rate for the 16 and 1 table tests. Some engines get more inserts/second with 1 table, others get more with 16 tables:
  • More with 16 tables: ~1.1X more for MyRocks, ~2X more for TokuDB
  • More with 1 table: 1.3X more for InnoDB 5.6, 1.4X more for InnoDB 5.7 and 8.0



Scan

These graphs show the scan times relative to the scan time for InnoDB 5.6.35. A value > 1 means the engine is slower than InnoDB. The first graph is from the 16 table test and the second is from the 1 table test. In both cases the MyRocks secondary index scan is about as fast as InnoDB. But for the PK scan, MyRocks is more than 2X slower than InnoDB in the 16 table test and 2X to 3X faster than InnoDB in the 1 table test.

What might explain the difference in PK scan times? MyRocks was ~2X slower than InnoDB in the 16 table test and ~3X faster than InnoDB in the 1 table test. That is a change of 6X. The output from vmstat and iostat can help for the 16 table and 1 table tests. This is in the q5 section which is the second scan of the PK and I will compare myrocks.jun16.none with inno5635.

  • First, the InnoDB PK uses ~2X more space for the 1 table test, so there is 2X more data to scan. But with hand waving that should explain only 2X of the 6X change.
  • On the 16 table test InnoDB 5.6 gets ~3X more MB/s of storage reads compared to MyRocks: 2353.7 vs ~828.1. But on the 1 table test InnoDB 5.6 gets less storage read MB/s than MyRocks: 67.4 vs 94.8.
  • One reason for getting less read MB/s from storage is using more CPU and that appears true in this case. The Mcpu/o column has the CPU overhead per row read. For the 16 table test it is 1.278 for InnoDB 5.6 vs 1.100 for MyRocks. On the 1 table test it is 3.547 for InnoDB 5.6 vs 1.807 for MyRocks. So InnoDB is only using ~1.2X more CPU than MyRocks on the 16 table test but ~2X more CPU on the 1 table test.



Verify MySQL Backups with TwinDB Backup Tool


If you don’t verify backups you may safely assume you don’t have them. It often happens that MySQL backups are invalid or broken due to a software bug or some hidden corruption. If you are lucky enough, hours and days will be needed to resurrect a database from a bad backup copy. If you […]

The post Verify MySQL Backups with TwinDB Backup Tool appeared first on TwinDB.

Replication: Been Down So Long it Looks Like Up to Me (with Apologies to the Family of Richard Farina)


[Image: semi-sync-00]

You as a MySQL administrator/developer are really excited about invisible indexes in Release Candidate Version 8.0 of MySQL. You know that your users place indexes on all columns, even if they are not covered by their queries. These indexes take up so much space! You want to mark some indexes as ‘invisible’ for a test to see if users complain or scream that their queries are slow. Invisible indexes, obviously, mark indexes as ‘invisible’ to the optimizer. If, during your experiment, the users don’t complain about performance, you know these are unused indexes, and it will be safe to drop them, saving valuable space.
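As a sketch of that experiment (the table and index names here are made up), toggling index visibility in 8.0 is a single statement:

-- Hide a suspected-unused index from the optimizer; names are hypothetical
ALTER TABLE orders ALTER INDEX idx_notes INVISIBLE;

-- If users start complaining, make it visible again instantly
ALTER TABLE orders ALTER INDEX idx_notes VISIBLE;

-- Once you are confident it is unused, drop it and reclaim the space
-- DROP INDEX idx_notes ON orders;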

Suddenly you get a call from your boss who wants you to do a PowerPoint presentation on how you debug problems, if any, with the in-production MySQL replication system. She doesn’t want to see Linux commands but wants to see a visual representation of how the system is running, since the team needs to immediately answer the question ‘are we up and running?’, 24/7. This is the standard ‘Show and Tell’ at the end of each scrum, to let everyone know what you and the rest of the team have accomplished and what you’ve been doing.

You spring to action with the MySQL Enterprise Monitor, knowing you can get back to the invisible index experiment later on. You set up a duplicate of your master slave (replica) system for your end-of-scrum demo, and connect the MySQL Enterprise Monitor to both servers.

First, you want to show that everything is working, the happy path with the MySQL Enterprise Monitor:

[Screenshot: semi-sync-01]

You have one master with replication going to one replica.  The ‘Legend’ in the bottom right shows ‘Semi-Sync Replication Fetch [is] OK’.  With Semi-Sync replication, the replica lets the master know it has received transaction events after the events are made durable on the replica.
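Outside of the Monitor, a quick sanity check that semi-sync is actually in effect is to look at the plugin status variables on each server (a sketch, assuming the rpl_semi_sync plugins are installed):

-- On the master: should report ON while at least one semi-sync replica is connected
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';

-- On the replica: should report ON
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_slave_status';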

To conduct the experiment for your presentation, you graciously kill off (using a MySQL command) the replication I/O thread, the thread on the replica that fetches transactions from the master. Now, you show your team the unhappy path, the danger zone:

[Screenshot: semi-sync-001a]
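Behind the scenes, the command used to break the fetch could be as simple as this sketch (assuming classic master/replica replication):

-- On the replica: stop only the I/O thread that fetches events from the master
STOP SLAVE IO_THREAD;

-- Slave_IO_Running now shows 'No' while Slave_SQL_Running stays 'Yes'
SHOW SLAVE STATUS\G

-- When the demo is over, resume fetching
START SLAVE IO_THREAD;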

Note that the MySQL Enterprise Monitor shows a red arrow from the master to the replica, indicating that Semi-Sync replication fetch is down.

You show them that the MySQL Enterprise Monitor gives you another clue that something is amiss with replication with an alert from its alerting system. The first alert in yellow shows the ‘Replication I/O Thread [is] Not Running’:

[Screenshot: semi-sync-02]

Next, you show that the ‘Query Analyzer’ within the MySQL Enterprise Monitor also shows a graph to indicate that replication has fallen too far behind:

[Screenshot: semi-sync-04]

For the grand finale of your presentation on ‘how to debug replication issues’, you take the replica offline. Naturally, this is the last chapter to your presentation:

[Screenshot: semi-sync-10]

The MySQL Enterprise Monitor’s topology legend comes to the rescue again, showing that the replica is down.

After your grand finale, your boss quizzes everyone to see if the team can keep the topology diagrams, as shown in the MySQL Enterprise Monitor, in focus on one of the many giant monitors spread around the command center. She asks if a notification can be sent to the group in case replication goes down. You nod your head affirmatively, explaining how the alert system works. Your boss is pleased, the team is fine. The ‘Show and Tell’ presentation of the MySQL Enterprise Monitor is over.

You are thrilled since you can go back to playing around with invisible indexes in the recent Release Candidate Version 8.0 of MySQL.

Please note that MySQL Enterprise Monitor is one part of the MySQL Enterprise Edition: https://www.mysql.com/products/enterprise/ . The MySQL Enterprise Edition, along with the MySQL Monitor, also can be found in the cloud as it is part of Oracle MySQL Cloud Service (MySQLCS) product: https://cloud.oracle.com/mysql . Same product, both on-premise and in the cloud.

The little drawing at the top of the page of the unhappy computer with an ice pack comes from here:  https://www.healthcare-informatics.com/article/be-prepared-lessons-extended-outage-hospital-s-ehr-system

 “The statements and opinions expressed here are my own and do not necessarily represent those of the Oracle Corporation.”

-Kathy Forte, Oracle MySQL Solutions Architect

 



Data Modelling: A Counter Table

A counter table is a table that keeps counts of particular items or for certain keys. This can range from page view counts on your blog to tracking usage against a limit a user is allowed for a particular item or service.

Usually, a counter table would be better kept in something like Memcached or Redis as frequent increment updates would be better suited to those in-memory systems.

MySQL, and InnoDB in particular, has many stringent mechanisms to make sure that your data has been reliably written to disk. The overhead of going through those mechanisms alone can make a counter table unsuitable, even before considering the time it takes to update the actual table.

However, sometimes you need durability guarantees in failure scenarios that in-memory systems cannot provide: when they crash, the data kept in memory is gone.

In those cases, may I recommend what I consider an 'aggregate counter table'. The idea is to replace the stream of increment updates with periodic counts over the original base table you want counts for.

In short, instead of:

INSERT INTO base_table;
UPDATE counter_table set value=value+1 where key=key1;


You would do

INSERT INTO base_table;
On interval (like 1 to 5 seconds):
- INSERT INTO counter_table 
- SELECT key1, count(1), max(primarykey) FROM base_table 
- WHERE last_count_position
- GROUP BY key1
- ON DUPLICATE KEY UPDATE value=value+recent_count


In order to aggregate the base_table correctly, you need to keep some record of the last time or position you read from it. I recommend tracking either the primary key, assuming it's an integer, or a last_updated timestamp column.

Below is an example of a counter table that keeps the last id of the primary key it counted from the base table:


CREATE TABLE counter_table (
  key_id int(10) unsigned NOT NULL,
  counts int(10) unsigned DEFAULT '0',
  lastprimary_id int(10) unsigned DEFAULT '0',
  PRIMARY KEY (key_id),
  KEY idx_camp (lastprimary_id)
) ENGINE=InnoDB;


In order to run your 'refresh' query, you would first need to query the counter_table like this:
SELECT max(lastprimary_id) from counter_table;

Then populate the counter table by including in your above INSERT INTO SELECT statement a:
WHERE base_table.primarykey > lastprimary_id


This should be very fast and will prevent the flood of small increment UPDATE queries that can become a serious bottleneck to your performance in the long run.
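Putting the pieces together, a refresh run might look like the sketch below. The base_table columns (id as an auto-increment primary key and key_id) are assumptions for this example; adapt them to your schema.

-- 1. Find the last primary key id that has already been counted
SELECT COALESCE(MAX(lastprimary_id), 0) INTO @last_id FROM counter_table;

-- 2. Aggregate only the new rows and fold them into the counters
INSERT INTO counter_table (key_id, counts, lastprimary_id)
SELECT key_id, COUNT(*), MAX(id)
FROM base_table
WHERE id > @last_id
GROUP BY key_id
ON DUPLICATE KEY UPDATE
  counts = counts + VALUES(counts),
  lastprimary_id = VALUES(lastprimary_id);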


Downsides

This method doesn't account for rows in the base table being UPDATEd or DELETEd; it only counts newly inserted rows. If that is a requirement, you can fall back to UPDATE statements of the form:
UPDATE counter_table SET value=value-1
with the understanding that this will happen infrequently.

You also now need to maintain a procedure and monitor that it runs on the intervals you need. Fortunately, MySQL has scheduled Events to help with that, as shown below.
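As a sketch, assuming the refresh statements above have been wrapped in a stored procedure named refresh_counter_table_proc (a name made up for this example), the schedule could look like this:

-- The event scheduler must be running for events to fire
SET GLOBAL event_scheduler = ON;

-- Run the hypothetical refresh procedure every 5 seconds
CREATE EVENT refresh_counter_table
  ON SCHEDULE EVERY 5 SECOND
  DO CALL refresh_counter_table_proc();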

MySQL Connector/NET 6.10.5 GA has been released


Dear MySQL users,

MySQL Connector/Net 6.10.5 is the second GA release and the first to
introduce .NET Core 2.0 and Entity Framework Core 2.0 support.

To download MySQL Connector/Net 6.10.5 GA, see the “Generally Available
(GA) Releases” tab at http://dev.mysql.com/downloads/connector/net/


Changes in MySQL Connector/Net 6.10.5 (2017-12-08, General
Availability)

   Functionality Added or Changed

     * Support for .NET Core 2.0 and .NET Standard 2.0 has been
       added (.NET Core 1.1 support continues). With .NET Core 2.0, most
       of the common ADO.NET classes are available for use, such as:

          + System.Data.DataTable, System.Data.DataColumn, and
            System.Data.DataRow

          + System.Data.DataSet

          + System.Data.Common.DataAdapter

     * Support for Entity Framework Core 2.0 has been added
       (Entity Framework 1.1 support continues). Currently, the MySQL
       Connector/Net implementation excludes the following 2.0 features:

          + Modeling: table splitting, owned types, model-level
            query filters, database scalar function mapping,
            self-contained type configuration for code first.

          + High performance: DbContext pooling and explicitly
            compiled queries.

          + Change tracking: attach can track a graph of new and
            existing entities.

          + Query: improved LINQ translation, group-join
            improvements, string interpolation in FromSql and
            ExecuteSqlCommand, new EF.Functions.Like().

          + Database management: pluralization hook for
            DbContext scaffolding.

          + Others: only one provider per model, consolidated
            logging and diagnostics.


Nuget packages are available at:

https://www.nuget.org/packages/MySql.Data/6.10.5
https://www.nuget.org/packages/MySql.Web/6.10.5
https://www.nuget.org/packages/MySql.Data.Entity/6.10.5
https://www.nuget.org/packages/MySql.Data.EntityFrameworkCore/6.10.5
https://www.nuget.org/packages/MySql.Data.EntityFrameworkCore.Design/6.10.5

Enjoy and thanks for the support!

On behalf of the MySQL Release Team,
Nawaz Nazeer Ahamed

Percona Server for MySQL 5.6.38-83.0 is Now Available


Percona announces the release of Percona Server for MySQL 5.6.38-83.0 on December 8, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.6.38, and including all the bug fixes in it, Percona Server for MySQL 5.6.38-83.0 is the current GA release in the Percona Server for MySQL 5.6 series. Percona Server for MySQL is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.38-83.0 milestone on Launchpad.

New Features:
Bugs Fixed:
  • The dynamic row format feature supporting BLOB/VARCHAR columns in MEMORY tables requires all the key columns to come before any BLOB columns. This requirement, however, was not enforced, allowing the creation of MEMORY tables with unsupported column configurations, which then crashed or lost data in usage. Bug fixed #1731483.
  • If an I/O syscall returned an error during the server shutdown with Thread Pool enabled, a mutex could be left locked. Bug fixed #1702330 (Daniel Black).
  • After fixing bug #1668602, bug #1539504, and bug #1313901, CREATE/DROP TEMPORARY TABLE statements were forbidden incorrectly in transactional contexts, including function and trigger calls, even when they required no binary logging at all. Bug fixed #1711781.
  • Running ANALYZE TABLE while a long-running query is accessing the same table in parallel could lead to a situation where new queries on the same table are blocked in a Waiting for table flush state. Fixed by stopping ANALYZE TABLE flushing affected InnoDB and TokuDB tables from the table definition cache. Bug fixed #1704195 (upstream #87065).
  • MyRocks additions to the 5.6 mysqldump output have been removed.
  • CREATE TABLE ... LIKE ... did not use source row_format on target TokuDB table. Bug fixed #76.
  • TokuDB would encode already encoded database name for a directory name. Bug fixed #74.

Other bugs fixed: #1670902 (upstream #85352), #1729241, #83, #80, and #75.

Release notes for Percona Server for MySQL 5.6.38-83.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

MySQL Enterprise Monitor 3.4.5 has been released


We are pleased to announce that MySQL Enterprise Monitor 3.4.5 is now available for download on the My Oracle Support (MOS) web site. This is a maintenance release that includes a few new features and fixes a number of bugs. You can find more information on the contents of this release in the change log.

You will find binaries for the new release on My Oracle Support. Choose the "Patches & Updates" tab, and then choose the "Product or Family (Advanced Search)" side tab in the "Patch Search" portlet.

Important: MySQL Enterprise Monitor (MEM) 4.0 offers many significant improvements over MEM 3.4 and we highly recommend that you consider upgrading. More information on MEM 4.0 is available here:

Please open a bug or a ticket on My Oracle Support to report problems, request features, or give us general feedback about how this release meets your needs.

If you are not a MySQL Enterprise customer and want to try the Monitor and Query Analyzer using our 30-day free customer trial, go to http://www.mysql.com/trials, or contact Sales at http://www.mysql.com/about/contact.

Thanks and Happy Monitoring!

- The MySQL Enterprise Tools Development Team

Useful URLs

This Week in Data with Colin Charles 18: Percona Live Call For Papers and a MongoDB 3.6 Overview


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

I highly recommend submitting to the CFP for Percona Live Santa Clara 2018 even though it doesn't close until December 22, 2017. By the 3rd week of December, i.e. before the CfP closes, it is very likely that we will announce some of the schedule. So get in early! Keep in mind the broad topics; there are some ideas here.

Also: we are looking for sponsors for Percona Live – you can email me for more information.

Releases

Link List

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, and I can only imagine it's going to be bigger and better for 2017, since it is now two days long!

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 
