
Shinguz: MySQL replication with filtering is dangerous


From time to time we see in customer engagements that MySQL Master/Slave replication is set up with schema- or table-level replication filtering. This can be done either on the Master or on the Slave. If filtering is done on the Master (by the binlog_{do|ignore}_db settings), the binary log becomes incomplete and cannot be used for a proper Point-in-Time-Recovery. Therefore FromDual recommends AGAINST this approach.

The replication filtering rules vary depending on the binary log format (ROW and STATEMENT). See also: How Servers Evaluate Replication Filtering Rules.

For reasons of data consistency between Master and Slave, FromDual recommends using only the binary log format ROW. This is also stated in the MySQL documentation: "All changes can be replicated. This is the safest form of replication." Binary log filtering is especially dangerous in combination with the binary log format MIXED, a format FromDual strongly discourages users from using.

The binary log format ROW affects only DML statements (UPDATE, INSERT, DELETE, etc.) but NOT DDL statements (CREATE, ALTER, DROP, etc.) and NOT DCL statements (CREATE USER, GRANT, REVOKE, DROP USER, etc.). So how are those statements replicated? They are replicated in STATEMENT binary log format even though binlog_format is set to ROW. As a consequence, the binary log filtering rules of STATEMENT-based replication, and not the ones of ROW-based replication, apply when running one of those DDL or DCL statements.
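You can see this directly in the binary log. A minimal sketch, assuming binlog_format = ROW (the binary log file name below is a placeholder for whatever SHOW MASTER STATUS reports on your system):

mysql> SHOW MASTER STATUS;  -- note the current binary log file name
mysql> SHOW BINLOG EVENTS IN 'binlog.000001';

Even with binlog_format = ROW, a statement like CREATE USER appears verbatim in the Info column as a Query (statement) event, while DML on the same server shows up as row events.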

This can easily cause problems. If you are lucky, they will cause the replication to break sooner or later, which you can detect and fix - but they may also cause inconsistencies between Master and Slave which may remain undetected for a long time.

Let us show what happens in 2 similar scenarios:

Scenario A: Filtering on mysql schema

On the Slave we set the replication filter as follows:

replicate_ignore_db = mysql

and verify it:

mysql> SHOW SLAVE STATUS\G
...
          Replicate_Ignore_DB: mysql
...

The intention of this filter setting is to not replicate user creations or modifications from Master to the Slave.

We verify on the Master that binlog_format is set to the desired value:

mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

Now we do the following on the Master:

mysql> use mysql
mysql> CREATE USER 'inmysql'@'%';
mysql> use test
mysql> CREATE USER 'intest'@'%';

and verify the result on the Master:

mysql> SELECT user, host FROM mysql.user;
+-------------+-----------+
| user        | host      |
+-------------+-----------+
| inmysql     | %         |
| intest      | %         |
| mysql.sys   | localhost |
| root        | localhost |
+-------------+-----------+

and on the Slave:

mysql> SELECT user, host FROM mysql.user;
+-------------+-----------+
| user        | host      |
+-------------+-----------+
| intest      | %         |
| mysql.sys   | localhost |
| root        | localhost |
+-------------+-----------+

We see that the user intest was replicated but the user inmysql was not: for these statement-based events, replicate_ignore_db filters on the default database in use (mysql vs. test), not on the objects the statement modifies. So we clearly have an unwanted data inconsistency between Master and Slave.

If we want to drop the inmysql user some time later on the Master:

mysql> use test;
mysql> DROP USER 'inmysql'@'%';

we get the following error message on the Slave and are left wondering why this query reached the Slave at all:

mysql> SHOW SLAVE STATUS\G
...
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation DROP USER failed for 'inmysql'@'%'' on query. Default database: 'test'. Query: 'DROP USER 'inmysql'@'%''
...

A similar problem happens when we connect to NO database on the Master, as follows, and change a user's password:

shell> mysql -uroot
mysql> SELECT DATABASE();
+------------+
| database() |
+------------+
| NULL       |
+------------+
mysql> ALTER USER 'innone'@'%' IDENTIFIED BY 'secret';

This works perfectly on the Master. But what happens on the Slave:

mysql> SHOW SLAVE STATUS\G
...
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'innone'@'%'' on query. Default database: ''. Query: 'ALTER USER 'innone'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*14E65567ABDB5135D0CFD9A70B3032C179A49EE7''
...

The Slave is telling us, in a complicated way, that the user innone does not exist on the Slave... Since no default database was selected on the Master, the statement was not filtered and was replicated.

Scenario B: Filtering on tmp or similar schema

Another scenario we have seen recently is a customer filtering out tables with temporary data located in the tmp schema. Similar scenarios are cache, session or log tables. He ran the following on the Master:

mysql> use tmp;
mysql> TRUNCATE TABLE tmp.test;

As he had learned in FromDual trainings, he emptied the table with the TRUNCATE TABLE command instead of a DELETE FROM tmp.test command, which is much less efficient than TRUNCATE TABLE. What he did not consider is that TRUNCATE TABLE is a DDL command and not a DML command, and thus the STATEMENT-based replication filtering rules apply. His filtering rules on the Slave were as follows:

mysql> SHOW SLAVE STATUS\G
...
          Replicate_Ignore_DB: tmp
...

When we do the check on the Master we get an empty set as expected:

mysql> SELECT * FROM tmp.test;
Empty set (0.00 sec)

When we add new data on the Master:

mysql> INSERT INTO tmp.test VALUES (NULL, 'new data', CURRENT_TIMESTAMP());
mysql> SELECT * FROM tmp.test;
+----+-----------+---------------------+
| id | data      | ts                  |
+----+-----------+---------------------+
|  1 | new data  | 2017-01-11 18:00:11 |
+----+-----------+---------------------+

we get a different result set on the Slave:

mysql> SELECT * FROM tmp.test;
+----+-----------+---------------------+
| id | data      | ts                  |
+----+-----------+---------------------+
|  1 | old data  | 2017-01-11 17:58:55 |
+----+-----------+---------------------+

and in addition the replication stops working with the following error:

mysql> SHOW SLAVE STATUS\G
...
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table tmp.test; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log laptop4_qa57master_binlog.000042, end_log_pos 1572
...

See also our earlier bug report on a similar topic: Option "replicate_do_db" does not cause "create table" to replicate ('row' log)

Conclusion

Binary log filtering is extremely dangerous when you care about data consistency, and thus FromDual recommends avoiding binary log filtering by all means. If you really have to do binary log filtering, you should know exactly what you are doing, carefully test your set-up, check your application and your maintenance jobs, and also review your future code changes regularly. Otherwise you risk data inconsistencies in your MySQL Master/Slave replication.
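If you want to audit an existing set-up for such filters, a quick sketch of the checks (field names as of MySQL 5.7):

mysql> SHOW MASTER STATUS;  -- non-empty Binlog_Do_DB/Binlog_Ignore_DB means the binary log is filtered
mysql> SHOW SLAVE STATUS\G  -- check the Replicate_Do_DB/Replicate_Ignore_DB/Replicate_*_Table fields

A filtered binary log on the Master is also unusable for a proper Point-in-Time-Recovery, as discussed above.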


CVE-2016-6225: Percona Xtrabackup Encryption IV Not Being Set Properly

If you are using Percona XtraBackup with xbcrypt to create encrypted backups, and are using versions older than 2.3.6 or 2.4.5, we advise that you upgrade Percona XtraBackup.

Note: this does not affect encryption of encrypted InnoDB tables.

CVE-2016-6225

Percona XtraBackup versions older than 2.3.6 or 2.4.5 did not properly set the Initialization Vector (IV) for encryption. This could allow someone to carry out a chosen-plaintext attack and recover decrypted content from the encrypted backup files without needing the password.

Compatibility

Percona XtraBackup carries backward compatibility to allow for the decryption of older backup files. However, encrypted backup files produced by the versions that have the fix will not be compatible with older versions of Percona XtraBackup.

Applicability

Access to the encrypted files must already be present for exploitation to occur. So long as you adequately protect the encrypted files, we don’t expect this issue to adversely affect users.

Credits

Percona would like to thank and give credit to Ken Takara for discovering this issue and working it through to PoC exploitation.

More Information

Release Notes

MySQL Day – Sessions review #3


On February 3rd, just before Fosdem and the MySQL & Friends Devroom, MySQL’s Community Team is organizing the pre-Fosdem MySQL Day.

Today’s highlighted sessions are those of Øystein Grøvlen:

  • MySQL 8.0: Common Table Expressions (CTEs)
  • Using Optimizer Hints to Improve MySQL Query Performance

Øystein is Senior Principal Software Engineer in the MySQL group at Oracle, where he works on the MySQL Query Optimizer.  

Dr. Grøvlen has a PhD in Computer Science from the Norwegian University of Science and Technology.  Before joining the MySQL team, he was a contributor on the Apache Derby project and Sun’s Architectural Lead on Java DB.  Prior to that, he worked for 10 years on development of Clustra, a highly available DBMS. Øystein lives in Trondheim, Norway.

Øystein is a regular speaker at events like Oracle Open World, Percona Live, Fosdem, …

So, let’s check the content of the two sessions he will deliver during pre-Fosdem MySQL Day.

The first session is at 11:00 AM and is about Common Table Expressions (sometimes referred to as WITH queries).

This is a new feature that will be available in MySQL 8.0. In their simplest form, CTEs are a way of creating a view/temporary table for usage in a single query. This can help improve the readability of SQL code. However, they have many more use cases. In particular, when using the RECURSIVE form of CTEs, it is possible to perform advanced tasks with few lines of code. This session covers CTEs as supported in MySQL 8.0, and will present several examples on how you can benefit from using CTEs.
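As a small taste, here is a sketch (MySQL 8.0 syntax, not taken from the session material) of a recursive CTE that generates the numbers 1 to 10 in a few lines:

mysql> WITH RECURSIVE seq (n) AS (
    ->   SELECT 1
    ->   UNION ALL
    ->   SELECT n + 1 FROM seq WHERE n < 10
    -> )
    -> SELECT n FROM seq;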

Øystein’s second session is at 3:00 PM and is about the MySQL Optimizer.

Sometimes you will experience that the MySQL Optimizer picks a non-optimal execution plan for your query. For example, this may happen when the optimizer assumes a uniform distribution of column values while your actual data is skewed. Or when the optimizer's cost model is based on assumptions about the performance of hardware components that are inaccurate for your system. Optimizer hints may in such cases be used to influence the optimizer to choose a better plan. This session will cover the different types of hints available in MySQL, and through several practical examples, it will be shown how using hints may improve query performance. The session will also cover the new optimizer hints that have been introduced in MySQL 5.7 and 8.0.
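For a flavour of the syntax (a sketch with hypothetical table and column names): optimizer hints go into a /*+ ... */ comment right after the SELECT keyword:

mysql> SELECT /*+ MAX_EXECUTION_TIME(1000) */ *
    -> FROM orders WHERE customer_id = 42;   -- cap execution time at 1000 ms (5.7+)

mysql> SELECT /*+ NO_BKA(t1) */ t1.*
    -> FROM t1 JOIN t2 ON t1.id = t2.t1_id;  -- disable Batched Key Access for t1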

Don’t forget to register for this main MySQL 8.0 event

Online schema change with gh-ost - throttling and changing configuration at runtime


In previous posts, we gave an overview of gh-ost and showed you how to test your schema changes before executing them. One important feature of all schema change tools is their ability to throttle themselves. Online schema change requires copying data from the old table to a new one and, no matter what you do in addition to that, it is an expensive process which may impact database performance.

Throttling in gh-ost

Throttling is crucial to ensure that normal operations continue to perform in a smooth way. As we discussed in a previous blog post, gh-ost allows you to stop all of its activity, which makes things much less intrusive. Let’s see how it works and to what extent it is configurable.

First things first - what does gh-ost monitor? As we know, by default, gh-ost uses a master to execute writes, and a slave to track changes in the binary logs. The master, obviously, will not give us any information about replication lag, but a slave will - that’s where gh-ost gets its data on slave lag. Of course, one single slave is not necessarily representative of the whole replication chain. Therefore it is possible to define a list of slaves on which to check the replication lag via the --throttle-control-replicas variable. All you need to do is to pass a comma-separated list of IPs here and gh-ost will track lag on all of them. You can define what maximum lag is acceptable for you using --max-lag-millis. Once the threshold has been passed, gh-ost will stop its activity and allow the slaves to catch up with the master.
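As a rough sketch (the replica addresses and threshold are hypothetical, not from this post), the two options can be combined like this:

gh-ost \
  --throttle-control-replicas="192.168.1.10:3306,192.168.1.11:3306" \
  --max-lag-millis=1500 \
  ...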

The main problem is that, right now, gh-ost uses multiple methods of lag calculation, which makes things unclear. The documentation also does not fully clarify how things work internally. Let’s take a look at how gh-ost operates right now.

As we mentioned, there are multiple methods used to calculate lag. First of all, gh-ost generates an internal heartbeat in its _ghc table.

mysql> SELECT * FROM sbtest1._sbtest1_ghc LIMIT 1\G
*************************** 1. row ***************************
         id: 1
last_update: 2016-12-27 13:36:37
       hint: heartbeat
      value: 2016-12-27T13:36:37.139851335Z
1 row in set (0.00 sec)

It is used to calculate lag on the slave/replica on which gh-ost operates and from which it reads binary logs. Then there are the replicas mentioned in --throttle-control-replicas. Those, by default, have their lag tracked using SHOW SLAVE STATUS and Seconds_Behind_Master. This data has a granularity of one second.

The problem is that sometimes, one second of lag is too much for the application to handle, therefore one of the very important features of gh-ost is to be able to detect sub-second lag. On the replica, where gh-ost operates, gh-ost’s heartbeat supports sub-second granularity using heartbeat-interval-millis variable. The remaining replicas, though, are not supported this way - there is an option to take advantage of an external heartbeat solution like, for example, pt-heartbeat, and calculate slave lag using --replication-lag-query.

Unfortunately, when we put it all together, it didn’t work as expected - sub-second lag was not calculated correctly by gh-ost. We decided to contact Shlomi Noach, who leads the gh-ost project, to get some more insight into how gh-ost handles sub-second lag detection. What you will read below is the result of this conversation, showing how it will be done in the future, in the “right” way.

gh-ost, at this moment, inserts heartbeat data in its _*_ghc table. This makes any external heartbeat generator redundant and, as a result, it makes --replication-lag-query deprecated and soon to be removed. Once it is removed, gh-ost’s internal heartbeat will be used across the whole replication topology.

If you want to check for lag with sub-second granularity, you need to configure --heartbeat-interval-millis and --max-lag-millis correctly, ensuring that heartbeat-interval-millis is set to a lower value than max-lag-millis - that’s all. You can, for example, tell gh-ost to insert a heartbeat every 100 milliseconds (heartbeat-interval-millis) and then test whether lag is less than, let’s say, 500 milliseconds (max-lag-millis). Of course, lag will be checked on all replicas defined in --throttle-control-replicas. You can see the updated documentation related to the lag checking process here: https://github.com/github/gh-ost/blob/3bf64d8280b7cd639c95f748ccff02e90a7f4345/doc/subsecond-lag.md
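Put together, a minimal sketch using the example values from the paragraph above (heartbeat every 100 milliseconds, throttle above 500 milliseconds of lag):

gh-ost \
  --heartbeat-interval-millis=100 \
  --max-lag-millis=500 \
  ...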

Please keep in mind that this is how gh-ost will operate when you use it in version v1.0.34 or later.

We need to mention, for the sake of completeness, one more setting - nice-ratio. It is used to define how aggressively gh-ost should copy the data: it basically tells gh-ost how long to pause after each row-copy operation. If you set it to 0, no pause is added. If you set it to 0.5, the whole process of copying rows takes 150% of the original time; if you set it to 1, it takes twice as long (200%) - in general, the copy takes (1 + nice-ratio) times the original time. It works, but it is also pretty hard to adjust the ratio so that the original workload is not affected. As long as you can use sub-second lag throttling, that is the way to go.

Runtime configuration changes in gh-ost

Another very useful feature of gh-ost is its ability to handle runtime configuration changes. When it starts, it listens on a UNIX socket, whose location you can choose through --serve-socket-file. By default it is created in the /tmp directory, with a name that gh-ost derives from the schema and table it works upon. An example would be: /tmp/gh-ost.sbtest1.sbtest1.sock

Gh-ost can also listen on a TCP port, but for that you need to pass --serve-tcp-port.

Knowing this, we can manipulate some of the settings. The best way to learn what we can change would be to ask gh-ost about it. When we send the ‘help’ string to the socket, we’ll get a list of available commands:

root@ip-172-30-4-235:~# echo help | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock
available commands:
status                               # Print a detailed status message
sup                                  # Print a short status message
chunk-size=<newsize>                 # Set a new chunk-size
nice-ratio=<ratio>                   # Set a new nice-ratio, immediate sleep after each row-copy operation, float (examples: 0 is agrressive, 0.7 adds 70% runtime, 1.0 doubles runtime, 2.0 triples runtime, ...)
critical-load=<load>                 # Set a new set of max-load thresholds
max-lag-millis=<max-lag>             # Set a new replication lag threshold
replication-lag-query=<query>        # Set a new query that determines replication lag (no quotes)
max-load=<load>                      # Set a new set of max-load thresholds
throttle-query=<query>               # Set a new throttle-query (no quotes)
throttle-control-replicas=<replicas> # Set a new comma delimited list of throttle control replicas
throttle                             # Force throttling
no-throttle                          # End forced throttling (other throttling may still apply)
unpostpone                           # Bail out a cut-over postpone; proceed to cut-over
panic                                # panic and quit without cleanup
help                                 # This message

As you can see, there is a bunch of settings to change at runtime - we can change the chunk size, and we can change the max-load and critical-load settings (thresholds which, when crossed, make gh-ost throttle or, in the critical-load case, bail out). You can also set the throttling-related settings: nice-ratio, max-lag-millis, replication-lag-query, throttle-control-replicas. You can as well force throttling by sending the ‘throttle’ string to gh-ost, or immediately stop the migration by sending ‘panic’.
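Reusing the socket from the help example above, a plausible sketch of such runtime changes looks like this:

# force gh-ost to pause, then let it resume
echo throttle | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock
echo no-throttle | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock

# tighten the replication lag threshold on the fly
echo "max-lag-millis=500" | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock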

Another setting worth mentioning is unpostpone. Gh-ost allows you to postpone the cut-over process. As you know, gh-ost creates a temporary table using the new schema, and then fills it with data from the old table. Once all data has been copied, it performs a cut-over and replaces the old table with the new one. It may happen that you want to be there to monitor things when gh-ost performs this step - in case something goes wrong. In that case, you can use --postpone-cut-over-flag-file to define a file which, as long as it exists, will postpone the cut-over process. Then you can create that file and be sure that gh-ost won’t swap tables until you let it, by removing the file. Still, if you’d like to go ahead and force the cut-over without having to find and remove the postpone file, you can send the ‘unpostpone’ string to gh-ost and it will immediately perform the cut-over, as sketched below.
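A sketch of that flow (the flag file path is hypothetical):

# start the migration with a postpone flag file
gh-ost ... --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag

# when ready, either remove the file ...
rm /tmp/ghost.postpone.flag
# ... or force the cut-over through the socket
echo unpostpone | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock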

We are coming to the end of this post. Throttling is a critical part of any online schema change process (or any database-heavy process, for that matter) and it is important to understand how to do it right. Yet, even with throttling, some additional load is unavoidable. That’s why, in our next blog post, we will try to assess the impact of running gh-ost on the system.

Sushi = Beer ?! An introduction of UTF8 support in MySQL 8.0


In MySQL 8.0 our plan is to drastically improve support for utf8. While utf8 support itself dates back to MySQL 4.1, there exist some limitations. The “sushi = beer” problem in the title refers to Bug #76553. Sushi and beer don’t even go well together, at least not to my taste:-) I will use this bug as an example to explain issues with utf8 collations in the past and our plans for utf8 support going forward.…
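For the curious, a minimal illustration of the bug (assuming a server from before the fix and the utf8mb4_unicode_ci collation; the hex literals are the UTF-8 encodings of the sushi and beer emoji):

mysql> SELECT _utf8mb4 X'F09F8DA3' = _utf8mb4 X'F09F8DBA' COLLATE utf8mb4_unicode_ci;
-- returns 1 (true) on affected versions: both characters fall outside the
-- collation's weight tables and therefore compare as equal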

The Impact of Swapping on MySQL Performance


In this blog, I’ll look at the impact of swapping on MySQL performance. 

It’s common sense that when you’re running MySQL (or really any other DBMS) you don’t want to see any I/O in your swap space. Scaling the cache size (using innodb_buffer_pool_size in MySQL’s case) is standard practice to make sure there is enough free memory so swapping isn’t needed.

But what if you make some mistake or miscalculation, and swapping happens? How much does it really impact performance? This is exactly what I set out to investigate.

My test system has the following:

  • 32GB of physical memory
  • OS (and swap space) on a (pretty old) Intel 520 SSD device
  • Database stored on Intel 750 NVMe storage

To simulate a worst case scenario, I’m using Uniform Sysbench Workload:

sysbench --test=/usr/share/doc/sysbench/tests/db/select.lua   --report-interval=1 --oltp-table-size=700000000 --max-time=0 --oltp-read-only=off --max-requests=0 --num-threads=64 --rand-type=uniform --db-driver=mysql --mysql-password=password --mysql-db=test_innodb  run

To better visualize the performance of the metrics that matter for this test, I have created the following custom graph in our Percona Monitoring and Management (PMM) tool. It shows performance disk IO and swapping activity on the same graph.

Here are the baseline results for innodb_buffer_pool_size=24GB. The results are a reasonable ballpark number for a system with 32GB of memory.

Impact of Swapping on MySQL PMM 1

As you can see in the baseline scenario, there is almost no swapping, with around 600MB/sec read from the disk. This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.

Next, I changed the configuration to innodb_buffer_pool_size=32GB, which is the total amount of memory available. As memory is required for other purposes, it caused swapping activity:

Impact of Swapping on MySQL PMM 2

We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.

Now let’s look at an even worse case. This time, we’ll set our configuration to innodb_buffer_pool_size=48GB (on a 32GB system).

Impact of Swapping on MySQL PMM 3

Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms. As the graph shows, the performance becomes more variable, confirming the common assumption that intense swapping affects system stability.

Finally, let’s remember MySQL 5.7 has the Online Buffer Pool Resize feature, which was created to solve exactly this problem (among other reasons): it lets you change the buffer pool size at runtime if you accidentally set it too large. As we have tested innodb_buffer_pool_size=24GB, and demonstrated it worked well, let’s scale it back to that value:

mysql> set global innodb_buffer_pool_size=24*1024*1024*1024;
Query OK, 0 rows affected (0.00 sec)

Impact of Swapping on MySQL PMM 4

Now the graph shows both good and bad news. The good news is that the feature works as intended, and after the resize completes we get close to the same results as before our swapping experiment. The bad news is everything pretty much grinds to a halt for 15 minutes or so while resizing occurs. There is almost no IO activity or intensive swapping while the buffer pool resize is in progress.

I also performed other sysbench runs for selects using Pareto random type rather than Uniform type, creating more realistic testing (skewed) data access patterns. I further performed update key benchmarks using both Uniform and Pareto access distribution.

You can see the results below:

Impact of Swapping on MySQL Pareto 1

Impact of Swapping on MySQL Pareto 2

As you can see, the results for selects are as expected. Accesses with Pareto distributions are better and are affected less – especially by minor swapping.  

If you look at the update key results, though, you find that minor swapping causes performance to improve for Pareto distribution. The results at 48GB of memory are pretty much the same.

Before you say that this is impossible, let me provide an explanation: I limited innodb_max_purge_lag on this system to avoid unbounded InnoDB history length growth. These workloads tend to be bound by InnoDB purge performance. It looks like swapping impacted the user threads more than it did the purge threads, causing this unusual performance profile. This is something that might not be repeatable between systems.

Summary

When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.  

While you should continue to plan your capacity so that there is no constant swapping on the database system, these results show that a few MB/sec of swapping activity is not going to have a catastrophic impact.

This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.

Funny replication breakage of Friday, January 13

A funny replication breakage kept me at the office longer than expected today (Friday 13 is not kind with me). So question of the day: can you guess what the below UPDATE statement does (or what is wrong with it)? > CREATE TABLE test_jfg ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, status ENUM('a','b') NOT NULL DEFAULT 'a', txt TEXT); Query OK, 0

PHP and MySQL Basics III -- Resulting Results

In the first two blog entries in this series we set up a connection to MySQL and sent off a query. Now we need to get the data back from the database and into the application.

An Embarrassment of Riches

PHP has many options for what we want to do, but the best place to start is checking that rows were actually returned from a query. Below, the results from a query are returned to a variable named $result. We can find out how many rows were returned from the server by examining $result->num_rows.
if (!$result = $mysqli->query($sql)) {

// Again, do not do this on a public site, but we'll show you how
// to get the error information
echo "Error: Our query failed to execute and here is why: \n";
echo "Query: " . $sql . "\n";
echo "Errno: " . $mysqli->errno . "\n";
echo "Error: " . $mysqli->error . "\n";
exit;
}

// succeeded, but do we have a result?
if ($result->num_rows === 0) {
// Oh, no rows! Sometimes that's expected and okay, sometimes
// it is not. You decide.
echo "No data returned.";
exit;
}

This is a case where a programmer needs to know their data. In some cases you will not have a record or records returned because there is no data. Other times no data returned is a sign of big problems. So you have to have some education on what you expect back, and what you do not expect back.

Example

<?php
$mysqli = new mysqli("localhost", "root", "hidave", "world_x");

/* check connection */
if ($mysqli->connect_errno) {
printf("Connect failed: %s\n", $mysqli->connect_error);
exit();
}

/* Select queries return a resultset */
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

if ($result->num_rows){
printf("Select returned %d rows.\n", $result->num_rows);

/* free result set */
$result->close();
} else {
echo "No data returned";
}
} else { // if ($result)
printf("Query failed: %s", $mysqli_error);
}

$mysqli->close();
?>

Sometimes you just need the number of records, like the number of outstanding customer orders. But in this case we are making sure we have some data to work with before proceeding.

So Now We Have Data

Now you have at least three choices -- rare, medium, or well done. Err, make that an associative array, an array, or an object. Each has its uses and it is okay to have a favorite you use more.
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

if ($result->num_rows){
printf("Select returned %d rows.\n", $result->num_rows);
$assoc = $result->fetch_assoc();
$row = $result->fetch_row();
$obj = $result->fetch_object();

} else {
echo "No data returned";
}
} else { // if ($result)
printf("Query failed: %s", $mysqli_error);
}

So you make your choice of method and take the results. Here we use fetch_assoc(), fetch_row(), or fetch_object(). Depending on how you want to refer to the data, you use the one that fits the situation. Of course they are similar in use.

//associative array: keys = column name, data = data from DB
printf("Sample assoc array %s -> %s\n", $assoc['Name'], $assoc['CountryCode']);

// simple row
printf("Sample row array %s -> %s\n", $row[0], $row[1]);

//object
printf("Sample object %s -> %s\n", $obj->Name, $obj->CountryCode);

Yes, you need to know all three as you will be looking at old code or someone else's code that does not use your favorite. And sometimes you may need an object rather than a simple row.
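One thing the snippets above do not show: each fetch_* call returns a single row and advances an internal cursor, so to process a whole result set you loop until the call returns NULL. A minimal sketch using fetch_assoc():

// loop over every returned row as an associative array
while ($row = $result->fetch_assoc()) {
    printf("%s -> %s\n", $row['Name'], $row['CountryCode']);
}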

Full Listing

<?php
$mysqli = new mysqli("localhost", "root", "hidave", "world_x");

/* check connection */
if ($mysqli->connect_errno) {
printf("Connect failed: %s\n", $mysqli->connect_error);
exit();
}

/* Select queries return a resultset */
$query="SELECT Name, CountryCode, District FROM city LIMIT 10";

if ($result = $mysqli->query($query)) {

if ($result->num_rows){
printf("Select returned %d rows.\n", $result->num_rows);
$assoc = $result->fetch_assoc();
$row = $result->fetch_row();
$obj = $result->fetch_object();
} else {
echo "No data returned";
}
} else { // if ($result)
printf("Query failed: %s", $mysqli_error);
}
//associative array: keys = column name, data = data from DB
printf("Sample assoc array %s -> %s\n", $assoc['Name'], $assoc['CountryCode']);

// simple row
printf("Sample row array %s -> %s\n", $row[0], $row[1]);

//object
printf("Sample object %s -> %s\n", $obj->Name, $obj->CountryCode);

$result->close();
$mysqli->close();
?>


Oracle MySQL and the funny replication breakage of Friday, January 13

In my previous post, I talked about a funny replication breakage that I experienced with MariaDB.  So what about different versions of MySQL... > SELECT version(); +------------+ | version() | +------------+ | 5.6.35-log | +------------+ 1 row in set (0.00 sec) > SELECT * FROM test_jfg; +----+--------+-------------+ | id |

Solving MySQL Replication Lag with LOGICAL_CLOCK and Calibrated Delay


Last week VividCortex's Preetam Jinka published a post on his personal blog examining how our engineering team had overcome a problem with MySQL replication by using a new parallelization policy introduced in MySQL 5.7: LOGICAL_CLOCK.


The solution we developed—which achieves faster replication via group commit and a carefully calibrated delay—can offer huge replication improvements, but its implementation isn't immediately obvious or intuitive. We thought it worthwhile to provide a fuller description of how we arrived at the solution Preetam outlined.

Bad Replication

Things started when Preetam noted that replication delay had been getting worse for some of our shards, in fairly periodic waves. These were the kinds of results we were seeing:

Replication Delay.png

As Preetam wrote in his post, a solution to these periodic delays was eluding us. "It wasn't about resources. The replicas have plenty of CPU and I/O available. We’re also using multithreaded replication (a.k.a. MTR) but most of the replication threads were idle."

There were a couple of fixes that we attempted right away:

  • In order to speed up queries overall, we doubled the instance and buffer pool sizes. It didn't help.
  • Increasing slave-parallel-workers didn't make a difference, and most seemed idle anyway.


First Look at LOGICAL_CLOCK

Interestingly, even at this early stage, we were aware of MySQL 5.7's LOGICAL_CLOCK parallelization policy. But our initial experiments with it offered nothing of value, even though the MySQL High Availability team had written that it offers optimal concurrency. The MySQL 5.7 reference manual defines LOGICAL_CLOCK like this:

LOGICAL_CLOCK: Transactions that are part of the same binary log group commit on a master are applied in parallel on a slave. There are no cross-database constraints, and data does not need to be partitioned into multiple databases.

Unfortunately, our first foray with --slave-parallel-type=LOGICAL_CLOCK actually resulted in slower replication than --slave-parallel-type=DATABASE. It didn't appear to offer an answer. [Spoiler alert: we had yet to develop a key component of how to optimize the new policy, involving grouping more transactions per binary log commit.]

Serious Delays

We weren't sure of the best way to proceed. We noted that one major cause of bad replication can be long-running queries, because they parallelize on the master but serialize on replicas. A workaround can be to break big queries into many small ones. Or, to really fix the problem, get rid of MySQL replication altogether! Those weren't immediately viable or practical solutions, though.

At this point, a few of our shards were suffering some serious delays. The worst was behind by at least 16 hours. Here's a 30 day snapshot:

delayed shard.jpeg

We looked back at the master in order to understand the replica more fully. Looking at the write load on the master can be a good way to analyze what the replica's write load is doing, and what might be causing the lag.

Using VividCortex, we examined the Top Queries related to the lagging shards and noted that a great deal of metrics, sketching, and (probably most importantly) downsampling work had surfaced over the past month.

Replication Delay Screenshot.png

The downsampling in particular had increased dramatically, though it still wasn't the top contributor to these top queries. However, that kind of change is significant; delays like the ones we were seeing can be the result of redownsampling, causing duplicate work.



In the screenshot below, note where the cursor is hovering over the sparkline. With VividCortex, we're able to look at the total execution time in the pictured period, which, in this case, was ten seconds. Hovering over a point shows an instantaneous value as a rate; here, that essentially means ten seconds of execution per second. So, the concurrency of query #1 is 10x. If this query can't be parallelized at least 10x on the replica, the replica will not keep up with the master.

Concurrency rate.jpeg

The Return of LOGICAL_CLOCK

At this point, Preetam returned to the original idea of experimenting with LOGICAL_CLOCK. But this time, he also took note of binlog_group_commit_sync_delay, introduced in MySQL 5.7.2+.

Controls how many microseconds the binary log commit waits before synchronizing the binary log file to disk… Setting binlog-group-commit-sync-delay to a microsecond delay enables more transactions to be synchronized together to disk at once, reducing the overall time to commit a group of transactions because the larger groups require fewer time units per group (MySQL Reference Manual).
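In terms of concrete settings, here is a minimal sketch of the combination (the values are illustrative; as described below, the right delay is workload-dependent):

-- on the replica: the SQL thread must be stopped to change slave_parallel_type
mysql> STOP SLAVE SQL_THREAD;
mysql> SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
mysql> START SLAVE SQL_THREAD;

-- on the master: the delay is in microseconds, so 3000 = 3 ms
mysql> SET GLOBAL binlog_group_commit_sync_delay = 3000;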

We applied a commit delay to our straggling shards… and voila.

Before:

Before Delays.jpeg

After:

After Delays.jpeg

After Delays 2.jpeg

It took us a couple tries to find the exact setting for the binlog_group_commit_sync_delay. We started with 50 ms, but that was too much. Overall query latency went up (by ~50 ms) and caused the consumers to fall behind.

On the other hand, 500 µs was too low, and it caused the replicas to fall behind again.

At first, 3 ms appeared to be the sweet spot, and resulted in what looked like overall replication improvement. Success! However, over the next few days, the replicas eventually started falling behind again, and we responded by upping the delay; it's at 10 ms now.

This last delay increase actually confirmed some feedback that Jean-François Gagné commented on Preetam's blog post shortly after it was published. Jean-François wrote

In my experience, 3 ms for delaying commit on the master is not a lot. On some of my systems, I am delaying commit by up to 300 ms (0.3 second), but this system does not have a very high commit rate (transactions are "big"). If 3 ms is working well for you, my guess is that you have a very high commit rate and very short transactions. 

He also kindly shared these links for additional reading:

[1]: http://www.slideshare.net/Jean...

[2]: https://blog.booking.com/evalu...

Overall, this was an interesting issue to address and not at all as obvious as one might think. We hope the solution we arrived at is useful for you and your organizations, especially if these MySQL settings were previously unfamiliar. Is this something you've used before? If not, and this post inspires you to experiment with them now, how well do they work? Do different delay settings work even better for you? Let us know!

Setup ProxySQL as High Available (and not a SPOF)


During the last few months we had a lot of opportunities to present and discuss a very powerful tool that will become more and more used in architectures supporting MySQL: ProxySQL.

ProxySQL is becoming more flexible, solid, performant and widely used every day (http://www.proxysql.com/ and the recent http://www.proxysql.com/compare).

This is it: the tool is a winner when compared with similar ones, and we all need to have a clear(er) idea of how to integrate it in our architectures in order to achieve the best results.

 

The first thing to keep in mind is that ProxySQL does not natively support any high availability solution. In short, we can set up a cluster of MySQL(s) and achieve 4 or even 5 nines of HA, but if we include ProxySQL as it is, as a single block, our HA will include a single point of failure (SPOF) that will drag us down in case of a crash.

 

To solve this, the most common solution so far has been to set up ProxySQL as part of a tile architecture, where Application and ProxySQL are deployed together.

tileProxy

 

This is a good solution for some cases, and it certainly reduces the network hops, but it may be less than practical when our architecture has a very high number of tiles.
Say 100 or 400 application servers, not so unusual nowadays.
In that case managing the ProxySQL instances will be challenging, but the most problematic part will be the fact that ProxySQL must perform several checks on the destination servers (MySQL); if we have 400 instances of ProxySQL we will end up keeping our databases busy just because of the checks.

In short ... it is not a smart move.

 

Another possible approach used so far is to have two layers of ProxySQL, one close to the application and another in the middle that finally connects to the database.

I personally don't like this approach for many reasons, but the most relevant are that it creates additional complexity in the management of the platform, and it adds network hops.

ProxyCascade

 

So what can be done?

I like to have things simple. I love the KISS principle, and because I am lazy I love to reuse the wheel instead of re-inventing things that someone else has already invented.

Last thing: I like to have my customers not depend on me or any other colleague. Once I am done and gone, they must be able to manage, understand and fix their things by themselves.

 

Anyhow, as said, I like simple things. So my point here is the following, excluding:

  • the cases where a tile (application/ProxySQL) makes sense;
  • the cases when in the cloud, where tools like ELB (Elastic Load Balancer) exist;
  • architectures already including a balancer.

What can I use for the remaining cases?

The answer comes from combining existing blocks: Keepalived + ProxySQL + MySQL.

keepalived_logo

For an explanation of Keepalived, visit http://www.keepalived.org/.

Short description
"Keepalived is a routing software written in C. The main goal of this project is to provide simple and robust facilities for loadbalancing and high-availability to Linux system and Linux based infrastructures. Loadbalancing framework relies on well-known and widely used Linux Virtual Server (IPVS) kernel module providing Layer4 loadbalancing. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage loadbalanced server pool according their health. On the other hand high-availability is achieved by VRRP protocol. VRRP is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks to the VRRP finite state machine providing low-level and high-speed protocol interactions. Keepalived frameworks can be used independently or all together to provide resilient infrastructures."

Bingo! This is exactly what we need for our ProxySQL setup.

Below I will show how to set up:

  • a simple solution based on a single VIP;
  • a more complex solution using multiple VIPs;
  • an even more complex solution using virtual VIPs and virtual servers.

Just remember that what we want to achieve is to prevent ProxySQL from becoming a SPOF, that's it.

While achieving that, we need to reduce the network hops as much as possible and keep the solution SIMPLE.

 

Another important concept to keep in mind is that a ProxySQL (re)start takes place in less than a second.

This means that if it crashes and can be restarted by the angel process, letting it do so and recover the service is much more efficient than having any kind of failover mechanism take place.

As such, whenever you plan your solution, keep the ~1 second ProxySQL restart time in mind as a baseline.

 

Ready?

Let's go.

Setup

Choose 3 machines that will host the combination of Keepalived and ProxySQL.

In the following example I will use 3 machines for ProxySQL and Keepalived and 3 hosting PXC, but you can place Keepalived+ProxySQL wherever you like, including on the PXC boxes themselves.

For the following examples we will have:

PXC
node1 192.168.0.5 galera1h1n5
node2 192.168.0.21 galera2h2n21
node3 192.168.0.231 galera1h3n31
 
ProxySQL-Keepalived
test1 192.168.0.11
test2 192.168.0.12
test3 192.168.0.235
 
VIP 192.168.0.88 /89/90
 

 

 

To check, I will use this table; please create it in your MySQL server:

DROP TABLE  test.`testtable2`;
 CREATE TABLE test.`testtable2` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(100) COLLATE utf8_bin NOT NULL,
  `b` varchar(100) COLLATE utf8_bin NOT NULL,
  `host` varchar(100) COLLATE utf8_bin NOT NULL,
  `userhost` varchar(100) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`autoInc`)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

 

 

And here is a bash TEST command to use later:

while [ 1 ]; do
  export mydate=$(date +'%Y-%m-%d %H:%M:%S.%6N')
  mysql --defaults-extra-file=./my.cnf -h 192.168.0.88 -P 3311 --skip-column-names \
    -b -e "BEGIN;set @userHost='a';select concat(user,'_', host) into @userHost from information_schema.processlist where user = 'load_RW' limit 1;insert into test.testtable2 values(NULL,'$mydate',SYSDATE(6),@@hostname,@userHost);commit;select * from test.testtable2 order by 1 DESC limit 1"
  sleep 1
done

 

  1. Install ProxySQL (refer to https://github.com/sysown/proxysql/wiki#installation)
  2. Install Keepalived (yum install keepalived; apt-get install keepalived)
  3. Setup ProxySQL users and servers

Once you have your ProxySQL up (run the same on all ProxySQL nodes, it is much simpler), connect to the Admin interface and:

 

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500 ;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
 
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.21',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.21',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.231',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.0.231',501,3306,1000000000);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
 
DELETE FROM mysql_users WHERE username='load_RW';
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('load_RW','test',1,500,'test',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
 
DELETE FROM mysql_query_rules WHERE rule_id IN (200,201);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(200,'load_RW',501,1,3,'^SELECT.*FOR UPDATE',1);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(201,'load_RW',501,1,3,'^SELECT ',1); 
 
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

 

Create a my.cnf file in your default dir with

[mysql]
user=load_RW
password=test

 

Simple setup using a single VIP, 3 ProxySQL and 3 Galera nodes

proxy_keep_single

 

First set up the Keepalived configuration file (/etc/keepalived/keepalived.conf):

global_defs {
  # Keepalived process identifier
  lvs_id proxy_HA
}
# Script used to check if Proxy is running
vrrp_script check_proxy {
  script "killall -0 proxysql"
  interval 2
  weight 2
}
# Virtual interface
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
  state MASTER 
  interface em1
  virtual_router_id 51
  priority <calculate on the WEIGHT for each node>
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.88 dev em1
  }
  track_script {
    check_proxy
  }
}
 

 

 

Given the above, and given that I want test1 to be the main node, the priorities will be set as:

test1 = 101
test2 = 100
test3 = 99

Remember that the check_proxy script adds its weight (2) to the priority of any node where ProxySQL is alive. So a healthy test1 runs at 103; if ProxySQL dies there, test1 drops back to 101 and the next healthy node in the chain wins the MASTER election.

 

 

Modify the config on each node following the above values and (re)start Keepalived.

If all is set correctly, in the system log of the TEST1 machine you will see:

 

Jan 10 17:56:56 mysqlt1 systemd: Started LVS and VRRP High Availability Monitor.
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Configuration is using : 6436 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Configuration is using : 63090 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP_Script(check_proxy) succeeded
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received lower prio advert, forcing new election
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received higher prio advert
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering BACKUP STATE
Jan 10 17:56:58 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) forcing a new MASTER election
...
Jan 10 17:57:00 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering MASTER STATE <-- MASTER
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) setting protocol VIPs.
Jan 10 17:57:01 mysqlt1 Keepalived_healthcheckers[6183]: Netlink reflector reports IP 192.168.0.88 added
Jan 10 17:57:01 mysqlt1 avahi-daemon[937]: Registering new address record for 192.168.0.88 on em1.IPv4.
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Sending gratuitous ARPs on em1 for 192.168.0.88
 

 

 

While in the other two:

 

Jan 10 17:56:59 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Entering BACKUP STATE <--- 

 

Which means the node is there as ... :D backup.

 

Now it is time to test the connection to our ProxySQL pool, from an application node or just from your laptop.

Open 3 terminals and in each one run:

 

 watch -n 1 'mysql -h <IP OF THE REAL PROXY (test1|test2|test3)> -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select srv_host,command,avg(time_ms), count(ThreadID) from stats_mysql_processlist group by srv_host,command;" -e "select * from stats_mysql_commands_counters where  Total_Time_us > 0;"'

 

 

You will see that, unless you are already sending queries to the proxies, they are just doing nothing.
Time to start the test bash command indicated above.
If everything is working correctly, you will see the bash command reporting this:

 

+----+----------------------------+----------------------------+-------------+----------------------------+
| 49 | 2017-01-10 18:12:07.739152 | 2017-01-10 18:12:07.733282 | galera1h1n5 | load_RW_192.168.0.11:33273 |
+----+----------------------------+----------------------------+-------------+----------------------------+
(columns: ID, execution time in the bash, exec time inside MySQL, node hostname, user and where the connection is coming from)

 

 

The other 3 running watch commands will show that ONLY the ProxySQL on TEST1 is currently getting/serving requests, because it is the one holding the VIP.

Like:

 

+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.0.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 629        |
| 500       | 192.168.0.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 510        |
| 500       | 192.168.0.5   | 3306     | ONLINE | 0        | 0        | 3      | 0       | 18      | 882             | 303             | 502        |
| 501       | 192.168.0.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 629        |
| 501       | 192.168.0.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 510        |
| 501       | 192.168.0.5   | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 502        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+
| Command | Total_Time_us | Total_cnt | cnt_100us | cnt_500us | cnt_1ms | cnt_5ms | cnt_10ms | cnt_50ms | cnt_100ms | cnt_500ms | cnt_1s | cnt_5s | cnt_10s | cnt_INFs |
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+
| BEGIN   | 9051          | 3         | 0         | 0         | 0       | 3       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| COMMIT  | 47853         | 3         | 0         | 0         | 0       | 0       | 0        | 3        | 0         | 0         | 0      | 0      | 0       | 0        |
| INSERT  | 3032          | 3         | 0         | 0         | 1       | 2       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| SELECT  | 8216          | 9         | 3         | 0         | 3       | 3       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
| SET     | 2154          | 3         | 0         | 0         | 3       | 0       | 0        | 0        | 0         | 0         | 0      | 0      | 0       | 0        |
+---------+---------------+-----------+-----------+-----------+---------+---------+----------+----------+-----------+-----------+--------+--------+---------+----------+

 

 

So, nothing special, right? All as expected.

Time to see if the failover/failback works along the chain.

Let us kill the ProxySQL on TEST1 while the test bash command is running.

 

killall -9 proxysql 

 

 

Here is what you will get:

 

+----+----------------------------+----------------------------+-------------+----------------------------+
| 91 | 2017-01-10 18:19:06.188233 | 2017-01-10 18:19:06.183327 | galera1h1n5 | load_RW_192.168.0.11:33964 |
+----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+----+----------------------------+----------------------------+-------------+----------------------------+
| 94 | 2017-01-10 18:19:08.250093 | 2017-01-10 18:19:11.250927 | galera1h1n5 | load_RW_192.168.0.12:39635 | <-- note 
+----+----------------------------+----------------------------+-------------+----------------------------+
 

 

 

The source had changed, but not the PXC node.

If you check the system log on TEST1:

 

Jan 10 18:19:06 mysqlt1 Keepalived_vrrp[6184]: VRRP_Script(check_proxy) failed
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received higher prio advert
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering BACKUP STATE
Jan 10 18:19:07 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) removing protocol VIPs.
Jan 10 18:19:07 mysqlt1 Keepalived_healthcheckers[6183]: Netlink reflector reports IP 192.168.0.88 removed

 

 

 

While on TEST2

 

Jan 10 18:19:08 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Entering MASTER STATE
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) setting protocol VIPs.
Jan 10 18:19:09 mysqlt2 Keepalived_healthcheckers[13106]: Netlink reflector reports IP 192.168.0.88 added
Jan 10 18:19:09 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Sending gratuitous ARPs on em1 for 192.168.0.88

 

 

Simple ... and elegant. No need to re-invent the wheel, and it works smoothly.


The total recovery time for the ProxySQL crash was 5.06 seconds, considering the wider window (last application entry before the crash to first entry after recovery in PXC: 2017-01-10 18:19:06.188233 | 2017-01-10 18:19:11.250927).

This is the worst-case scenario, keeping in mind that we run the check for ProxySQL every 2 seconds (real maximum recovery window: 5 - 2 = 3 sec).

 

OK what about fail-back?

Let us restart the proxysql service:

 

/etc/init.d/proxysql start (or systemctl)

 

 

Here the output:

 

+-----+----------------------------+----------------------------+-------------+----------------------------+
| 403 | 2017-01-10 18:29:34.550304 | 2017-01-10 18:29:34.545970 | galera1h1n5 | load_RW_192.168.0.12:40330 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 406 | 2017-01-10 18:29:35.597984 | 2017-01-10 18:29:38.599496 | galera1h1n5 | load_RW_192.168.0.11:34640 |
+-----+----------------------------+----------------------------+-------------+----------------------------+

 

 

Worst recovery time = 4.04 seconds, of which 2 seconds of delay are due to the check interval.

 

Of course, the test runs only one single operation per second; as such the impact is minimal (no error during fail-back) and the measured recovery is longer.

But I think I have made the concept clear here.

Let us see another thing... is the failover chain working as expected? Test1 -> 2 -> 3?

 

Let us kill Test1, then Test2, and see:

 

Kill Test1 :
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 448 | 2017-01-10 18:35:43.092878 | 2017-01-10 18:35:43.086484 | galera1h1n5 | load_RW_192.168.0.11:35240 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 451 | 2017-01-10 18:35:47.188307 | 2017-01-10 18:35:50.191465 | galera1h1n5 | load_RW_192.168.0.12:40935 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
...
Kill Test2
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 463 | 2017-01-10 18:35:54.379280 | 2017-01-10 18:35:54.373331 | galera1h1n5 | load_RW_192.168.0.12:40948 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 466 | 2017-01-10 18:36:08.603754 | 2017-01-10 18:36:09.602075 | galera1h1n5 | load_RW_192.168.0.235:33268 |
+-----+----------------------------+----------------------------+-------------+-----------------------------+

 

This image is where you should be at the end:

proxy_keep_single_failover

 

In this case, given that I issued one kill immediately after the other, Keepalived took a bit longer to fail over, but it still did so correctly, following the planned chain.

Fail-back as smooth as usual:

+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 502 | 2017-01-10 18:39:18.749687 | 2017-01-10 18:39:18.749688 | galera1h1n5 | load_RW_192.168.0.235:33738 |
+-----+----------------------------+----------------------------+-------------+-----------------------------+
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 505 | 2017-01-10 18:39:19.794888 | 2017-01-10 18:39:22.800800 | galera1h1n5 | load_RW_192.168.0.11:35476 |
+-----+----------------------------+----------------------------+-------------+----------------------------+

 

Let us see now another case.

The case above is nice and simple, but it has a caveat.

I can access only one ProxySQL at a time, which may or may not be good.

In any case, it may be nice to have the possibility to choose, and with Keepalived you can.

We can actually set up an X number of VIPs and associate them with each test box.

The result will be that each server hosting ProxySQL will also host a VIP, and each will eventually be able to fail over to any of the other two servers.

proxy_keep_multiple

 

Failing over and back will be fully managed by Keepalived, checking as before whether ProxySQL is running.
An example configuration for one node is shown below:

 

global_defs {
  # Keepalived process identifier
  lvs_id proxy_HA
}
# Script used to check if Proxy is running
vrrp_script check_proxy {
  script "killall -0 proxysql"
  interval 2
  weight 3
}
 
# Virtual interface 1
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
  state MASTER
  interface em1
  virtual_router_id 51
  priority 102
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.88 dev em1
  }
  track_script {
    check_proxy
  }
}
 
# Virtual interface 2
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_02 {
  state MASTER
  interface em1
  virtual_router_id 52
  priority 100
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.89 dev em1
  }
  track_script {
    check_proxy
  }
}
 
# Virtual interface 3
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_03 {
  state MASTER
  interface em1
  virtual_router_id 53
  priority 99
 
  # The virtual ip address shared between the two loadbalancers
  virtual_ipaddress {
    192.168.0.90 dev em1
  }
  track_script {
    check_proxy
  }
}

 

 

The tricky part in this case is to play with the PRIORITY of each VIP on each server, such that you will NOT assign the same IP twice.
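For illustration, assuming the configuration above belongs to test1, one possible priority layout (my own example, not taken from the post's config bundle) that gives each VIP a distinct owner and a distinct failover order is:

           VIP1 (192.168.0.88)   VIP2 (192.168.0.89)   VIP3 (192.168.0.90)
test1             102                   100                    99
test2              99                   102                   100
test3             100                    99                   102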

The whole set of configs can be found here

 

Performing the check with the test bash script as above, we have:

 

Test 1 crash
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 422 | 2017-01-11 18:30:14.411668 | 2017-01-11 18:30:14.344009 | galera1h1n5 | load_RW_192.168.0.11:55962 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 426 | 2017-01-11 18:30:18.531279 | 2017-01-11 18:30:21.473536 | galera1h1n5 | load_RW_192.168.0.12:49728 | <-- new server
+-----+----------------------------+----------------------------+-------------+----------------------------+
....
Test 2 crash
+-----+----------------------------+----------------------------+-------------+----------------------------+
| 450 | 2017-01-11 18:30:27.885213 | 2017-01-11 18:30:27.819432 | galera1h1n5 | load_RW_192.168.0.12:49745 |
+-----+----------------------------+----------------------------+-------------+----------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.88' (111)
+-----+----------------------------+----------------------------+-------------+-----------------------------+
| 454 | 2017-01-11 18:30:30.971708 | 2017-01-11 18:30:37.916263 | galera1h1n5 | load_RW_192.168.0.235:33336 | <-- new server
+-----+----------------------------+----------------------------+-------------+-----------------------------+
 
Final state of IPs on Test3:

 

enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:c2:16:3f brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.235/24 brd 192.168.0.255 scope global enp0s8   <-- Real IP
       valid_lft forever preferred_lft forever
    inet 192.168.0.90/32 scope global enp0s8    <--- VIP 3
       valid_lft forever preferred_lft forever
    inet 192.168.0.89/32 scope global enp0s8    <--- VIP 2
       valid_lft forever preferred_lft forever
    inet 192.168.0.88/32 scope global enp0s8    <--- VIP 1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec2:163f/64 scope link 
       valid_lft forever preferred_lft forever
 

 

And this is the image:

proxy_keep_multiple_full_failover

Recovery times:

 

  test 1 crash = 7.06 sec (worst-case scenario)
  test 2 crash = 10.03 sec (worst-case scenario)

Conclusions

In this example I just used a test that checks the process, but the check can be anything that reports 0|1; the limit is defined only by what you need.

The failover times can be significantly shorter if you reduce the check interval and consider only the time taken to move the VIP. I preferred to show the worst-case scenario, considering an application probing at one-second intervals, but that is a pessimistic view of what normally happens with real traffic.

I was looking for a simple, simple, simple way to add HA to ProxySQL, something that can be easily integrated with automation and that is also well established and maintained.

In my opinion, using Keepalived is a good solution because it matches all the above expectations.

Implementing a set of ProxySQL servers and having Keepalived manage the failover between them is pretty easy, but you can expand the usage (and the complexity) if you need to, counting on tools that are already part of the Linux stack; no need to re-invent the wheel with crazy mechanisms.

If you want to have fun doing crazy things... at least start from something that helps you go beyond the basics.

For instance, I was also playing a bit with Keepalived and virtual servers, creating sets of redundant ProxySQL instances with load balancers and... but this is another story (blog).

 

Great MySQL & ProxySQL to all!

MySQL group replication: installation with Docker


Overview

MySQL Group Replication was released as GA with MySQL 5.7.17. It is essentially a plugin that, when enabled, allows users to set up replication in this new way.

There has been some confusion about the stability and usability of this release. Until recently, MySQL Group Replication (MGR) was only available in the Labs, which traditionally denotes a preview or a use-at-your-own-risk feature. Several months ago we saw the release of Group Replication as a Docker image, which allowed users to deploy a peer-to-peer cluster (every node is a master). However, about one month after that release, word came from Oracle discouraging this setup and inviting users to run Group Replication in Single-Primary mode, which is functionally equivalent to traditional replication, just with an additional synchronous component. There hasn't been an update of MGR for Docker since.

BTW, some more confusion came from the use of "synchronous replication" to refer to Group Replication operations. In reality, what many presentations called synchronous replication is only a synchronous transfer of binary log data. The replication itself, i.e. the operation that makes a node able to retrieve the data inserted in the master, is completed asynchronously. Therefore, if you looked at MGR as a way of using multiple masters without conflicts, this is not the solution.

What we have is a way of replicating from a node that is the Primary in the group, with some features designed to facilitate high availability solutions. And all eyes are on the next product, which is based on MGR, named MySQL InnoDB Cluster: MGR + a hormone-pumped MySQL Shell (released with the same version number 1.0.5 in two different packages) and MySQL Router.

MGR has several limitations, mostly related to multi-primary mode.

Another thing that users should know is that the performance of MGR is inferior to that of asynchronous replication, even in Single-Primary mode. As an example, loading the test employees database takes 92 seconds in MGR, against 49 seconds in asynchronous replication (same O.S., same MySQL version, same server setup.)

Installing MySQL Group Replication

One of the biggest issues with MGR has been the quality of its documentation, which for a while was just a lack of documentation altogether. What we have now is a set of instructions that refers to installing group replication on three nodes on the same host. You know, sandboxes, although without the benefit of using a tool to simplify operations. It's just three servers on the same host, and you drive with stick shift.

What we'll see in this post is how to set up group replication using three servers in Docker. The advantage of using this approach is that the servers look and feel like real ones. Since the instructions assume that you are only playing with sandboxes (an odd assumption for a GA product), we lack the instructions for a real-world setup. The closest thing to a useful manual is the tutorial given by Frédéric Descamps and Kenny Gryp at Percona Live Amsterdam in October. The instructions, however, are muddled by the fact that they were using the still unreliable InnoDB Cluster instead of a bare-bones Group Replication setup. What follows is my own expansion of the sandboxed rules as applied to distinct servers.

The environment:

I am using Docker 1.12.6 on Linux, and the image for mysql/mysql-server:5.7.17. I deploy three containers, with a customized my.cnf containing the bare minimum options to run Group Replication. Here's the template for the configuration files:

$ cat my-template.cnf
[mysqld]
user=mysql
server_id=_SERVER_ID_
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=mysql-bin
relay-log=relay
binlog_format=ROW
log-error=mysqld.err

transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "172.19.0._IP_END_:6606"
loose-group_replication_group_seeds= "172.19.0.2:6606,172.19.0.3:6606,172.19.0.4:6606"
loose-group_replication_ip_whitelist="172.19.0.2,172.19.0.3,172.19.0.4,127.0.0.1"
loose-group_replication_bootstrap_group= off

Here I take a shortcut. Recent versions of Docker assign a predictable IP address to new containers. To make sure I get the right IPs, I use a private network to deploy the containers. In a perfect world, I should use the container names for this purpose, but the manual lacks the instructions to set up the cluster progressively. For now, this method requires full knowledge about the IPs of the nodes, and I play along with what I have.

This is the deployment script:

#!/bin/bash
exists_net=$(docker network ls | grep -w group1 )
if [ -z "$exists_net" ]
then
docker network create group1
fi
docker network ls

for node in 1 2 3
do
export SERVERID=$node
export IPEND=$(($SERVERID+1))
perl -pe 's/_SERVER_ID_/$ENV{SERVERID}/;s/_IP_END_/$ENV{IPEND}/' my-template.cnf > my${node}.cnf
datadir=ddnode${node}
if [ ! -d $datadir ]
then
mkdir $datadir
fi
unset SERVERID
docker run -d --name=node$node --net=group1 --hostname=node$node \
-v $PWD/my${node}.cnf:/etc/my.cnf \
-v $PWD/data:/data \
-v $PWD/$datadir:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=secret \
mysql/mysql-server:5.7.17

ip=$(docker inspect --format '{{ .NetworkSettings.Networks.group1.IPAddress}}' node${node})
echo "${node} $ip"
done

This script deploys three nodes, called node1, node2, and node3. For each one, the template is modified to use a different server ID. They use an external data directory created in the current directory (see Customizing MySQL in Docker for more details on this technique). Moreover, each node can access the folder /data, which contains this set of SQL commands:

reset master;
SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'rpl_pass';
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery';
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

Operations

After deploying the containers using the above script, I wait a few seconds to give the servers time to be ready. I can peek at the error logs, which are in the directories ddnode1, ddnode2, and ddnode3, as defined in the installation command. Then I run the SQL code:

$ for N in 1 2 3; do docker exec -ti node$N bash -c 'mysql -psecret < /data/user.sql' ; done

At this stage, the plugin is installed in all three nodes. I can start the cluster:

$ docker exec -ti node1 mysql -psecret
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.14 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | ecba1582-db68-11e6-a492-0242ac130002 | node1 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
1 row in set (0.00 sec)

The above operations started the replication with a bootstrap, an operation that must be executed only once, and that defines the primary node.

After setting the replication, I can enter some data, and then see what happens in the other nodes:

mysql> create schema test;
Query OK, 1 row affected (0.01 sec)

mysql> use test
Database changed
mysql> create table t1 (id int not null primary key, msg varchar(20));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t1 values (1, 'hello from node1');
Query OK, 1 row affected (0.01 sec)

mysql> show binlog events;
+------------------+------+----------------+-----------+-------------+----------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+------+----------------+-----------+-------------+----------------------------------------------------------------------------+
| mysql-bin.000001 | 4 | Format_desc | 1 | 123 | Server ver: 5.7.17-log, Binlog ver: 4 |
| mysql-bin.000001 | 123 | Previous_gtids | 1 | 150 | |
| mysql-bin.000001 | 150 | Gtid | 1 | 211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1' |
| mysql-bin.000001 | 211 | Query | 1 | 270 | BEGIN |
| mysql-bin.000001 | 270 | View_change | 1 | 369 | view_id=14845163185775300:1 |
| mysql-bin.000001 | 369 | Query | 1 | 434 | COMMIT |
| mysql-bin.000001 | 434 | Gtid | 1 | 495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2' |
| mysql-bin.000001 | 495 | Query | 1 | 554 | BEGIN |
| mysql-bin.000001 | 554 | View_change | 1 | 693 | view_id=14845163185775300:2 |
| mysql-bin.000001 | 693 | Query | 1 | 758 | COMMIT |
| mysql-bin.000001 | 758 | Gtid | 1 | 819 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:3' |
| mysql-bin.000001 | 819 | Query | 1 | 912 | create schema test |
| mysql-bin.000001 | 912 | Gtid | 1 | 973 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:4' |
| mysql-bin.000001 | 973 | Query | 1 | 1110 | use `test`; create table t1 (id int not null primary key, msg varchar(20)) |
| mysql-bin.000001 | 1110 | Gtid | 1 | 1171 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:5' |
| mysql-bin.000001 | 1171 | Query | 1 | 1244 | BEGIN |
| mysql-bin.000001 | 1244 | Table_map | 1 | 1288 | table_id: 219 (test.t1) |
| mysql-bin.000001 | 1288 | Write_rows | 1 | 1341 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000001 | 1341 | Xid | 1 | 1368 | COMMIT /* xid=144 */ |
+------------------+------+----------------+-----------+-------------+----------------------------------------------------------------------------+
19 rows in set (0.00 sec)

The binary log events show that we are replicating using the ID of the group, instead of the ID of the single server.

In the other two nodes I run the operation a bit differently:

$ docker exec -ti node2 mysql -psecret
mysql> select * from performance_schema.global_variables where variable_name in ('read_only', 'super_read_only');
+-----------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------+----------------+
| read_only | OFF |
| super_read_only | OFF |
+-----------------+----------------+
2 rows in set (0.01 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (5.62 sec)

mysql> select * from performance_schema.global_variables where variable_name in ('read_only', 'super_read_only');
+-----------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------+----------------+
| read_only | ON |
| super_read_only | ON |
+-----------------+----------------+
2 rows in set (0.01 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | ecba1582-db68-11e6-a492-0242ac130002 | node1 | 3306 | ONLINE |
| group_replication_applier | ecf2eae5-db68-11e6-a492-0242ac130003 | node2 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.01 sec)

Now the cluster has two nodes, and I've seen that the nodes are automatically defined as read-only. I can repeat the same operation in the third one.

$ docker exec -ti node3 mysql -psecret
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.35 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | ecba1582-db68-11e6-a492-0242ac130002 | node1 | 3306 | ONLINE |
| group_replication_applier | ecf2eae5-db68-11e6-a492-0242ac130003 | node2 | 3306 | ONLINE |
| group_replication_applier | ed259dfc-db68-11e6-a4a6-0242ac130004 | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

What about the data? It's been replicated:

mysql> show schemas;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test |
+--------------------+
5 rows in set (0.00 sec)

mysql> show tables from test;
+----------------+
| Tables_in_test |
+----------------+
| t1 |
+----------------+
1 row in set (0.01 sec)

Monitoring

In this flavor of replication there is no SHOW SLAVE STATUS. Everything I've got is in performance_schema tables and in mysql.slave_master_info and mysql.slave_relay_log_info, and sadly it is not a lot.

mysql> select * from replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 14845163185775300:3
MEMBER_ID: ecba1582-db68-11e6-a492-0242ac130002
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 3
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-6
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:5
1 row in set (0.00 sec)


mysql> select * from replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_recovery
GROUP_NAME:
SOURCE_UUID:
THREAD_ID: NULL
SERVICE_STATE: OFF
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET:
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
SOURCE_UUID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-6
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)

Compared to regular replication, we lose the ID of the node where the data originated. Instead, we get the ID of the group replication (which we set in the configuration file.) This is useful for a smoother operation when replacing the primary node (a.k.a. the master) with another node, but we have lost some valuable information that could have been added to the output rather than simply being replaced. Another valuable piece of information that is missing is the set of transactions that were executed (we only see RECEIVED_TRANSACTION_SET.) As in regular replication, we can get this information with "SHOW MASTER STATUS" or "SELECT @@global.gtid_executed", but as mentioned in improving the design of MySQL replication, there are several flaws in this paradigm. What we see in MGR is a reduction of replication monitoring data, while we would have expected some improvement, given the complexity of the operations for this new technology.

Summing up

MySQL Group Replication is an interesting technology. If we consider it in the framework of a component for high availability (which will be completed when the InnoDB Cluster is released) it might improve the workflow of many database users.

As it is now, however, it gives the feeling of being a rushed up piece of software that does not offer any noticeable advantage to users, especially considering that the documentation released with it is far below the standards of other MySQL products.

MySQL Group Replication, the perfect HA database backend for web hosting


Many web hosting providers are looking for an HA solution for the database backend they deliver to their customers.

Galera never became the perfect choice for these environments due to 2 factors:

  1. no DBA really manage the databases
  2. Galera runs database changes in Total Order Isolation

What does that really mean? In fact, when you are a website hosting provider, you host the websites (Apache, Nginx) on vhosts and you share a database server in which every customer has access to their own schema for their website.

Most of the time, those websites are CMS like Drupal, WordPress or Joomla (and certainly many others sharing the same expectations).

Using these tools allows you to create and manage websites quickly and easily. However, in a shared environment, you can’t expect that all users will use the same version of the CMS at the same time, nor the same plugins. Some may have customized the core or plugins of their favorite solution.

This means that the application itself takes care of database design and operations. So if one of the users decides to upgrade his WordPress (or add/remove a plugin that will create/modify some table schema), on a Galera Cluster he will lock all writes on ALL databases served by the cluster. All writes will be stalled for the total execution time of the DDLs that are part of wp-admin/includes/upgrade.php.

This upgrade of that particular website will then affect all the other sites on the same system.

MySQL Group Replication doesn’t suffer from the same behavior, which makes it the ideal solution to achieve High Availability for your database on a shared system.

The only problem you could encounter with MySQL Group Replication is if you use a cluster in Multi-Primary mode and perform concurrent DDLs, as I explained in this videocast.

To avoid any problem when using MySQL Group Replication in Multi-Primary mode, it’s recommended to route all DDLs to the same node (this is not needed if you use the default Single-Primary mode). You could filter out such statements using ProxySQL between your web servers and your database server.
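As a sketch only (the regex pattern and the hostgroup number depend entirely on your setup; here hostgroup 1 stands for the designated DDL node), such a rule could be created from the ProxySQL admin interface along these lines:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (10, 1, '^(ALTER|CREATE|DROP|RENAME|TRUNCATE)', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;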

 

Improving the Stability of MySQL Single-Threaded Benchmarks


I have for some years been running the queries of the DBT-3 benchmark, both to verify the effect of new query optimization features, and to detect any performance regressions that may have been introduced. However, I have had issues with getting stable results. While repeated runs of a query are very stable, I can get quite different results if I restart the server. As an example, I got a coefficient of variation (COV) of 0.1% for 10 repeated executions of a query on the same server, while the COV for the average runtime of 10 such experiments was over 6%!

With such large variation in results, significant performance regressions may not be noticed. I have tried a lot of stuff to get more stable runs, and in this blog post I will write about the things that I have found to have positive effects on stability. At the end, I will also list the things I have tried that did not show any positive effects.

Test Environment

First, I will describe the setup for the tests I run. All tests are run on a 2-socket box running Oracle Linux 7. The CPUs are Intel Xeon E5-2690 (Sandy Bridge) with 8 physical cores @ 2.90GHz.

I always bind the MySQL server to a single CPU, using taskset or numactl, and Turbo Boost is disabled. The computer has 128 GB of RAM, and the InnoDB buffer pool is big enough to contain the entire database. (4 GB buffer pool for scale factor 1 and 32 GB buffer pool for scale factor 10.)
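For reference, this kind of binding is typically done as follows (a sketch; the node and CPU numbers depend on the machine, and the paths are illustrative):

# Bind mysqld to NUMA node 0 and allocate its memory locally
numactl --cpunodebind=0 --membind=0 mysqld --defaults-file=/etc/my.cnf &
# Alternatively, pin it to logical CPUs 0-7 (one socket) with taskset
taskset -c 0-7 mysqld --defaults-file=/etc/my.cnf &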

Each test run is as follows:

  1. Start the server
  2. Run a query 20 times in sequence
  3. Repeat step 2 for all DBT-3 queries
  4. Stop the server

The result for each query will be the average execution time of the last 10 runs. The reason for the long warm-up period is that, from experience, when InnoDB's Adaptive Hash Index is on, you will need 8 runs or so before execution times are stable.

As I wrote above, the variance within each test run is very small, but the difference between test runs can be large. The variance is somewhat improved by picking the best result out of three test runs, but it is still not satisfactory. Also, a full test run on a scale factor 10 database takes 9 hours, so I would like to avoid having to repeat the tests multiple times.

Address Space Layout Randomization

A MySQL colleague mentioned that he had heard about some randomization that was possible to disable. After some googling, I learned about Address Space Layout Randomization (ASLR). This is a security technique that is used to prevent an attacker from being able to determine where code and data are located in the address space of a process. I also found some instructions on stackoverflow for how to disable it on Linux.
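On Linux this boils down to a single kernel setting (run as root; the value resets at reboot unless made persistent via sysctl.conf):

# 2 = full randomization (the usual default), 0 = ASLR disabled
cat /proc/sys/kernel/randomize_va_space
echo 0 > /proc/sys/kernel/randomize_va_space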

Turning off ASLR sure made a difference! Take a look at this graph that shows the average execution time for Query 12 in ten different test runs with and without ASLR (All runs are with a scale factor 1 DBT-3 database on MySQL 8.0.0 DMR):

I will definitely make sure ASLR is disabled in future tests!

Adaptive Hash Index

InnoDB maintains an Adaptive Hash Index (AHI) for frequently accessed pages. The hash index will speed up look-ups on primary key, and is also useful for secondary index accesses since a primary key look-up is needed to get from the index entry to the corresponding row. Some DBT-3 queries run twice as slow if I turn off AHI, so it has definitely some value. However, experience shows that I will have to repeat a query several times before the AHI is actually built for all pages accessed by the query. I plan to write another blog post where I discuss more about AHI.

Until I stumbled across ASLR, turning off AHI was my best bet at stabilizing the results. After disabling ASLR, also turning off AHI only shows a slight improvement in stability. However, there are other reasons for turning off AHI.

I have observed some times that with AHI on, a change of query plan for one query may affect the execution time of subsequent queries. I suspect the reason is that the content of the AHI after a query has been run, may change with a different query plan. Hence, the next query may be affected if it accesses the same data pages.

Turning off AHI also means that I no longer need the long warm-up period for the timing to stabilize. I can then repeat each query 10 times instead of 20 times. This means that even if many of the queries take longer to execute, the total time to execute a test run will be lower.

Because of the above, I have decided to turn off AHI in most of my test runs. However, I will run with AHI on once in a while just to make sure that there are no major regressions in AHI performance.
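For completeness, AHI can be switched off dynamically, without a server restart:

SET GLOBAL innodb_adaptive_hash_index = OFF;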

Preload Data and Indexes

I also tried to start each test run with a set of queries that would sequentially scan all tables and indexes. My thinking was that this could give a more deterministic placement of data in memory. Before I turned off ASLR, preloading had very good effects on the stability when AHI was disabled. With ASLR off, the difference is less significant, but there is still a slight improvement.

Below is a table that shows some results for all combinations of the settings discussed so far. Ten test runs were performed for each combination on a scale factor 1 database. The numbers shown are the average difference between the best and the worst runs over all queries, and the largest difference between the best and the worst runs for a single query.

+------+-----+---------+--------------+--------------+
| ASLR | AHI | Preload | Avg(MAX-MIN) | Max(MAX-MIN) |
+------+-----+---------+--------------+--------------+
| on   | on  | off     | 6.18%        | 14.75%       |
| on   | on  | on      | 4.65%        | 14.79%       |
| on   | off | off     | 5.56%        | 14.65%       |
| on   | off | on      | 2.18%        | 5.05%        |
| off  | on  | off     | 1.66%        | 3.94%        |
| off  | on  | on      | 1.58%        | 3.58%        |
| off  | off | off     | 1.66%        | 3.78%        |
| off  | off | on      | 1.09%        | 3.27%        |
+------+-----+---------+--------------+--------------+

From the above table it is clear that the most stable runs are achieved by using preloading in combination with disabling both ASLR and AHI.

For one of the DBT-3 queries, using preloading on a scale factor 10 database leads to higher variance within a test run. While the COV within a test run is below 0.2% for all queries without preloading, query 21 has a COV of 3% with preloading. I am currently investigating this, and I have indications that the variance can be reduced by setting the memory placement policy to interleave. I guess the reason is that with a 32 GB InnoDB buffer pool, one will not be able to allocate all memory locally to the CPU where the server is running.

What Did Not Have an Effect?

Here is a list of things I have tried that did not seem to have a positive effect on the stability of my results:

  • Different governors for CPU frequency scaling. I have chosen the performance governor, because it "sounds good", but I did not see any difference when using the powersave governor instead. I also tried turning off the Intel pstate driver, but did not notice any difference in that case either.
  • Bind the MySQL server to a single core or thread (instead of CPU).
  • Bind the MySQL client to a single CPU.
  • Different settings for NUMA memory placement policy using numactl. (With the possible exception of using interleave for scale factor 10 as mentioned above.)
  • Different memory allocation libraries (jemalloc, tcmalloc). jemalloc actually seemed to increase the instability.
  • Disable InnoDB buffer pool preloading: innodb_buffer_pool_load_at_startup = off
  • Set innodb_old_blocks_time = 0

Conclusion

My tests have shown that I get better stability of test results if I disable both ASLR and AHI, and that combining this with preloading of all tables and indexes in many cases will further improve the stability of my test setup.

I welcome any comments and suggestions on how to further increase the stability for MySQL benchmarks. I do not claim to be an expert in this field, and any input will be highly appreciated.

Released MyMSSQLDump 1.1

My program for exporting data from MSSQL and Sybase into a whole bunch of other formats, including:
  • JSON
  • HTML
  • CSV
  • MySQL (mysqldump style)
  • MYSQL / Sybase INSERT-style
  • Oracle INSERT-style
is now released in version 1.1. There is a whole bunch of new things, most notably the Oracle-style export format, but also:
  • DATETIME datatype formatting
  • DATETIMEOFFSET formatting
  • Other temporal datatype support
  • Much more flexible formatting in general
  • More tests
As usual this is GPL v2 licensed and is available to download from SourceForge.

Happy SQL'ing
/Karlsson

Ad-hoc Data Visualization and Machine Learning with mysqlshell

mysqlshell

In this blog post, I am going to show how we can use mysqlshell to run ad-hoc data visualizations and use machine learning to predict new outcomes from the data.

Some time ago Oracle released MySQL Shell, a command line client to connect to MySQL using the X protocol. It allows us to use Python or JavaScript scripting capabilities. This unties us from the limitations of SQL, and the possibilities are infinite. It means that MySQL can not only read data from the tables, but also learn from it and predict new values from features never seen before.

Some disclaimers:

  • This is not a post about how to install mysqlshell or enable the X plugin. It should already be installed. Follow the first link if instructions are needed.
  • The idea is to show some of the things that can be done from the shell. Don’t expect the best visualizations or a perfectly tuned Supervised Learning algorithm.

It is possible to start mysqlshell with a JavaScript or Python interpreter. Since we are going to use Pandas, NumPy and scikit-learn, Python will be our choice. There is an incompatibility between mysqlshell and Python > 2.7.10 that gives an error when loading some external libraries, so make sure you use 2.7.10.

We’ll work with the “employees” database that can be downloaded here. In order to make everything easier and avoid several lines of data parsing, I have created a new table that summarizes the data we are going to work with, generated using the following structure and query:

mysql> show create table data\G
*************************** 1. row ***************************
Create Table: CREATE TABLE `data` (
  `emp_no` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `hired` int(11) DEFAULT NULL,
  `gender` int(11) DEFAULT NULL,
  `salary` int(11) DEFAULT NULL,
  `department` int(11) DEFAULT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> INSERT INTO data SELECT employees.emp_no, YEAR(now()) - YEAR(birth_date) as age, YEAR(now()) - YEAR(hire_date) as hired, IF(gender='M',0,1) as gender, max(salary) as salary, RIGHT(dept_no,1) as department from employees, salaries, dept_emp
WHERE employees.emp_no = salaries.emp_no and employees.emp_no = dept_emp.emp_no and dept_emp.to_date="9999-01-01"
GROUP BY emp_no, dept_emp.dept_no;

mysql> select * from data limit 5;
+--------+------+-------+--------+--------+------------+
| emp_no | age  | hired | gender | salary | department |
+--------+------+-------+--------+--------+------------+
|  10001 |   64 |    31 |      0 |  88958 |          5 |
|  10002 |   53 |    32 |      1 |  72527 |          7 |
|  10003 |   58 |    31 |      0 |  43699 |          4 |
|  10004 |   63 |    31 |      0 |  74057 |          4 |
|  10005 |   62 |    28 |      0 |  94692 |          3 |
+--------+------+-------+--------+--------+------------+

The data is:

  • Age: the age of the employee
  • Hired: the number of years working in the company
  • Gender: 0 Male, 1 Female
  • Salary: the salary 🙂
  • Department: the department number the employee belongs to

It only includes people currently working at the company.

Now that the data is ready, let’s start with mysqlshell. Everything that follows was done directly from the shell itself.

Starting the Shell and Loading the Libraries

mysqlsh -uroot -p -h127.0.0.1 --py

Once the login is validated, we will see the following prompt:

mysql-py>

That means we are using the shell in Python mode. We can start loading our libraries:

mysql-py> import pandas as pd
mysql-py> import numpy as np
mysql-py> import seaborn
mysql-py> import matplotlib.pyplot as plt
mysql-py> from sklearn import tree

Now, we read each column from the table and store it in its own variable:

mysql-py> use employees
mysql-py> def column_to_list(column_name):
    temp_var = db.data.select([column_name]).execute().fetch_all()
    return [val for sublist in temp_var for val in sublist]
mysql-py> gender = column_to_list("gender")
mysql-py> salary = column_to_list("salary")
mysql-py> age = column_to_list("age")
mysql-py> hired = column_to_list("hired")
mysql-py> department = column_to_list("department")

And create a Pandas dataframe used to generate the visualizations:

df = pd.DataFrame({'Gender': gender,
                   'Salary': salary,
                   'Age': age,
                   'Hired': hired,
                   'Department': department
                   })

Data Analysis

Now, let’s investigate the data. Some basic statistics to get age, hired and salary overview:

mysql-py> print df[["Salary","Age","Hired",]].describe(percentiles=(.75,.90,.99))
              Salary            Age          Hired
count  240124.000000  240124.000000  240124.000000
mean    72041.332178      58.918226      27.413782
std     17305.819632       3.750406       3.525041
min     40000.000000      52.000000      17.000000
50%     69827.000000      59.000000      28.000000
75%     82570.000000      62.000000      30.000000
90%     96125.000000      64.000000      32.000000
99%    119229.390000      65.000000      32.000000
max    158220.000000      65.000000      32.000000

Those statistics already give us good information. The employees' ages range from 52 to 65, with an average of 59. They have been working at the company for 27 years on average, with an average salary of 72041.

But let’s forget about numbers. The human brain works much better and faster interpreting graphs than reading a table full of numbers. Let’s create some graphs and see if we can find any relationship.

Data Visualization

Relation between Gender and Salary:

mysql-py> df.groupby(['Gender']).mean()['Salary'].plot(kind='bar')
mysql-py> plt.show()

gender

Relation between Age and Salary:

mysql-py> df.groupby(['Age']).mean()['Salary'].plot(kind='bar')
mysql-py> plt.show()

age

Relation between Department and Salary:

mysql-py> df.groupby(['Department']).mean()['Salary'].plot(kind='bar')
mysql-py> plt.show()

department

Relation between Hired and Salary:

mysql-py> df.groupby(['Hired']).mean()['Salary'].plot(kind='bar')
mysql-py> plt.show()

hired

Now everything is clearer. There is no real relationship between gender and salary (yay!) or between age and salary. It seems that the average salary is related to the number of years an employee has been working at the company. It also shows some differences depending on the department he/she belongs to.

Making Predictions: Machine Learning

Up to this point we have been using matplotlib, Pandas and NumPy to investigate and create graphs from the data stored in MySQL. Everything is from the shell itself. Amazing, eh? 🙂 Now let’s take a step forward. We are going to use machine learning so our MySQL client is not only able to read the data already stored, but also predict a salary.

Decision Tree Regression from SciKit Learn is the supervised learning algorithm we’ll use. Remember, everything is still from the shell!

Let’s separate the data into features and labels. From Wikipedia:

“Feature is an individual measurable property of a phenomenon being observed.”

Taking into account the graphs we saw before, “hired” and “department” are good features that could be used to predict the correct label (salary). In other words, we will train our Decision Tree by giving it “hired” and “department” data, along with their labels “salary”. The idea is that after the learning phase, we can ask it to predict a salary based on new “hired” and “department” data we provide. Let’s do it:

Separate the data in features and labels:

mysql-py> features = np.column_stack((hired, department))
mysql-py> labels = np.array(salary)

Train our decision tree:

mysql-py> clf = tree.DecisionTreeRegressor()
mysql-py> clf = clf.fit(features, labels)

Now, MySQL, tell me:

What do you think the salary of a person that has been working 25 years at the company, currently in department number 2, should be?

mysql-py> clf.predict([[25, 2]])
array([ 75204.21140143])

It predicts that the employee should have a salary of 75204. A person working there for 25 years, but in department number 7, should have a greater salary (based on the averages we saw before). What does our Decision Tree say?

mysql-py> clf.predict([[25, 7]])
array([ 85293.80606296])

Summary

Now MySQL can both read data we already know, and it can also predict it! 🙂 mysqlshell is a very powerful tool that can be used to help us in our data analysis tasks. We can calculate statistics, visualize graphs, use machine learning, etc. There are many things you might want to do with your data without leaving the MySQL Shell.

20 Common Performance_schema FAQs


1. What are the different types of tables?

Current, history, and summary tables, plus the setup tables.

Setup tables:
setup_consumers
setup_actors
setup_instruments
...

Waits

| events_waits_current |
| events_waits_history |
| events_waits_history_long |
| events_waits_summary_by_account_by_event_name |
| events_waits_summary_by_host_by_event_name |
| events_waits_summary_by_instance |
| events_waits_summary_by_thread_by_event_name |
| events_waits_summary_by_user_by_event_name |
| events_waits_summary_global_by_event_name |

Statements

| events_statements_current |
| events_statements_history |
| events_statements_history_long |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest |
| events_statements_summary_by_host_by_event_name |
| events_statements_summary_by_thread_by_event_name |
| events_statements_summary_by_user_by_event_name |
| events_statements_summary_global_by_event_name

Stages

| events_stages_current |
| events_stages_history |
| events_stages_history_long |
| events_stages_summary_by_account_by_event_name |
| events_stages_summary_by_host_by_event_name |
| events_stages_summary_by_thread_by_event_name |
| events_stages_summary_by_user_by_event_name |
| events_stages_summary_global_by_event_name

Misc

2. By default, what instruments are turned on?

Only global, thread, transaction, and statement instrumentation is enabled by default.

Others, like stage and synch events, are disabled by default.

You can check the complete list of enabled instruments in setup_instruments.
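For instance:

mysql> SELECT NAME, ENABLED, TIMED
       FROM performance_schema.setup_instruments
       WHERE ENABLED = 'YES' LIMIT 5;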

3. What consumers are enabled by default ?

mysql> select * from performance_schema.setup_consumers;
+--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_stages_current          | NO      |
| events_stages_history          | NO      |
| events_stages_history_long     | NO      |
| events_statements_current      | YES     |
| events_statements_history      | NO      |
| events_statements_history_long | NO      |
| events_waits_current           | NO      |
| events_waits_history           | NO      |
| events_waits_history_long      | NO      |
| global_instrumentation         | YES     |
| thread_instrumentation         | YES     |
| statements_digest              | YES     |
+--------------------------------+---------+
12 rows in set (0.00 sec)

4. How big can the history table get?

performance_schema_events_statements_history_size is the number of rows kept per thread. The default is 10.

5. What is the difference between history and history_long?

performance_schema_events_statements_history_long_size is the maximum number of rows kept overall (not per thread). The default is 10000.

6. What are some examples of wait events ?

wait/io/file/sql/binlog
wait/io/file/innodb/innodb_data_file
wait/io/socket (network I/O)

7. What are the time units for statement execution?

Picoseconds.
One trillionth of a second.
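So to convert a timer column to seconds, divide by 10^12, for example:

SELECT EVENT_NAME, TIMER_WAIT/1000000000000 AS wait_seconds
FROM performance_schema.events_statements_history LIMIT 5;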

8. How much memory is being used by performance_schema ?

SELECT * FROM memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/performance_schema/%';

9. What is the overhead ?

10% — As recorded by Percona and Mark
20% — If all instruments are enabled

10. How to check the previous statements run by a particular thread?

select * from events_statements_history where thread_id=60\G

11. What is recorded as part of a stored routine call?

statement/sp/statement1
statement/sp/statement2

12. What is the replacement for show profile ?

Stage tables

13. What is the closest replacement for the slow log?

Events_statements_summary_by_digest

14. What is the default size of events_statements_summary_by_digest? How to know whether it is sufficient for your workload or not?

The events_statements_summary_by_digest table has a limited maximum number of rows (200 by default; since MySQL 5.6.5 it can be modified with the performance_schema_digests_size variable). As a consequence, when the table is full, statement digest values that have no already existing row will be added to a special “catch-all” row with DIGEST = NULL. In plain English: you won’t have meaningful info for those statements.
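A quick way to check whether you are losing statements to this catch-all row:

SELECT SCHEMA_NAME, COUNT_STAR, FIRST_SEEN, LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST IS NULL\G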

15. What are important columns in events_statements_summary_by_digest ?

SCHEMA_NAME: Database name. Records are summarised together with DIGEST.
DIGEST: Performance Schema digest. Records are summarised together with SCHEMA_NAME.
DIGEST_TEXT: The unhashed form of the digest.
COUNT_STAR: Number of summarized events.
SUM_TIMER_WAIT: Total wait time of the summarized events that are timed.
MIN_TIMER_WAIT: Minimum wait time of the summarized events that are timed.
AVG_TIMER_WAIT: Average wait time of the summarized events that are timed.
MAX_TIMER_WAIT: Maximum wait time of the summarized events that are timed.
SUM_LOCK_TIME: Sum of the LOCK_TIME column in the events_statements_current table.
SUM_ERRORS: Sum of the ERRORS column in the events_statements_current table.
SUM_WARNINGS: Sum of the WARNINGS column in the events_statements_current table.
SUM_ROWS_AFFECTED: Sum of the ROWS_AFFECTED column in the events_statements_current table.
SUM_ROWS_SENT: Sum of the ROWS_SENT column in the events_statements_current table.
SUM_ROWS_EXAMINED: Sum of the ROWS_EXAMINED column in the events_statements_current table.
SUM_CREATED_TMP_DISK_TABLES: Sum of the CREATED_TMP_DISK_TABLES column in the events_statements_current table.
SUM_CREATED_TMP_TABLES: Sum of the CREATED_TMP_TABLES column in the events_statements_current table.
SUM_SELECT_FULL_JOIN: Sum of the SELECT_FULL_JOIN column in the events_statements_current table.
SUM_SELECT_FULL_RANGE_JOIN: Sum of the SELECT_FULL_RANGE_JOIN column in the events_statements_current table.
SUM_SELECT_RANGE: Sum of the SELECT_RANGE column in the events_statements_current table.
SUM_SELECT_RANGE_CHECK: Sum of the SELECT_RANGE_CHECK column in the events_statements_current table.
SUM_SELECT_SCAN: Sum of the SELECT_SCAN column in the events_statements_current table.
SUM_SORT_MERGE_PASSES: Sum of the SORT_MERGE_PASSES column in the events_statements_current table.
SUM_SORT_RANGE: Sum of the SORT_RANGE column in the events_statements_current table.
SUM_SORT_ROWS: Sum of the SORT_ROWS column in the events_statements_current table.
SUM_SORT_SCAN: Sum of the SORT_SCAN column in the events_statements_current table.
SUM_NO_INDEX_USED: Sum of the NO_INDEX_USED column in the events_statements_current table.
SUM_NO_GOOD_INDEX_USED: Sum of the NO_GOOD_INDEX_USED column in the events_statements_current table.
FIRST_SEEN: Time at which the digest was first seen.
LAST_SEEN: Time at which the digest was most recently seen.

SUM_ROWS_AFFECTED:

This is for UPDATE, INSERT or DELETE statements.

ROWS_SENT vs ROWS_EXAMINED:

If rows_examined is far larger than rows_sent, say 100 times larger, then the query is a great candidate for optimization.

SELECT_RANGE:

It refers to the number of SELECT queries that were performed to satisfy a limited range of conditions, such as:

SELECT * FROM tbl1 WHERE col1 BETWEEN 5 AND 13;
SELECT * FROM tbl1 WHERE col1 > 5 AND col1 < 13;

SUM_CREATED_TMP_TABLES vs SUM_CREATED_TMP_DISK_TABLES:

How many temporary tables are created on disk compared to in memory. If the on-disk ratio is high, there is a definite problem.

SORT_RANGE:

SELECT * FROM users WHERE points > (SELECT points FROM users WHERE id = {user_id}) ORDER BY points ASC LIMIT 3;

SORT_MERGE_PASSES:

Sorting happens in two steps. MySQL first tries to sort in memory, using at most sort_buffer_size bytes. If not all records fit, it stores the in-memory sort results in temporary files and, once all records have been read, merges those files in one or more extra passes; each such pass increments Sort_merge_passes (the counter typically grows with the number of temporary files created, since MySQL uses another temporary file to hold the merged result). Because temporary files are involved this can be slow, and increasing sort_buffer_size will reduce both the number of merge passes and the number of temporary files.

16. What are some of the metrics you can collect with the statements_digest table? (An example query for the first one is shown after the list.)

Queries not using indexes
Queries examining more rows compared to rows sent
Queries doing more sort_merge_passes / temporary tables to disk
Queries throwing more errors
Queries throwing more warnings
Queries sorted by avg exec time
Queries sorted by max exec time
Queries spending more time waiting for locks
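For example, the first of these (queries not using indexes, sorted by total execution time) can be pulled with a query along these lines:

SELECT SCHEMA_NAME, DIGEST_TEXT, COUNT_STAR,
       SUM_TIMER_WAIT/COUNT_STAR/1000000000000 AS avg_exec_sec
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;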

17. What is the use of events_waits_current table in real time ?

Client A (ID 2)> select sleep(10);
While the query is still executing, see what event the code is waiting for.

Monitor> select * from EVENTS_WAITS_CURRENT where THREAD_ID=3\G
*************************** 1. row ***************************
THREAD_ID: 3
EVENT_ID: 46
EVENT_NAME: wait/synch/cond/sql/Item_func_sleep::cond
SOURCE: item_func.cc:3527
TIMER_START: 5617352941894736
TIMER_END: NULL
TIMER_WAIT: NULL

SPINS: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_TYPE: NULL
OBJECT_INSTANCE_BEGIN: 139803207157216
NESTING_EVENT_ID: NULL
OPERATION: timed_wait
NUMBER_OF_BYTES: NULL
FLAGS: 0
1 row in set (0.00 sec)
Looking at the results, we can tell that the query is still waiting (TIMER_WAIT is NULL) for an instrumented condition.

18. What are the recent wait events of this connection ?

After the sleep(10) is completed, let’s look at the wait history for Client A connection.

Monitor> select THREAD_ID, EVENT_ID, EVENT_NAME, SOURCE, TIMER_WAIT, OBJECT_INSTANCE_BEGIN, OPERATION from EVENTS_WAITS_HISTORY where THREAD_ID=3 order by THREAD_ID, EVENT_ID;
+-----------+----------+---------------------------------------------+--------------------+---------------+-----------------------+------------+
| THREAD_ID | EVENT_ID | EVENT_NAME | SOURCE | TIMER_WAIT | OBJECT_INSTANCE_BEGIN | OPERATION |
+-----------+----------+---------------------------------------------+--------------------+---------------+-----------------------+------------+
| 3 | 43 | wait/synch/mutex/mysys/THR_LOCK_malloc | safemalloc.c:181 | 399243 | 18775392 | lock |
| 3 | 44 | wait/synch/mutex/mysys/THR_LOCK_malloc | safemalloc.c:294 | 210035 | 18775392 | lock |
| 3 | 45 | wait/synch/mutex/sql/LOCK_user_locks | item_func.cc:3837 | 500907 | 18580768 | lock |
| 3 | 46 | wait/synch/cond/sql/Item_func_sleep::cond | item_func.cc:3527 | 5000241974790 | 140221896639968 | timed_wait |
| 3 | 47 | wait/synch/cond/sql/Item_func_sleep::cond | item_func.cc:3527 | 5000237350843 | 140221896639968 | timed_wait |
| 3 | 48 | wait/synch/mutex/mysys/my_thread_var::mutex | item_func.cc:3853 | 612102 | 19805600 | lock |
| 3 | 49 | wait/synch/mutex/sql/LOCK_plugin | sql_plugin.cc:1013 | 356883 | 18701152 | lock |
| 3 | 50 | wait/synch/rwlock/sql/LOGGER::LOCK_logger | log.h:582 | 276046 | 18672512 | read_lock |
| 3 | 51 | wait/synch/mutex/sql/LOG::LOCK_log | log.cc:2515 | 372062 | 18989864 | lock |
| 3 | 52 | wait/synch/mutex/sql/THD::LOCK_thd_data | sql_class.cc:3267 | 276046 | 19798240 | lock |
+-----------+----------+---------------------------------------------+--------------------+---------------+-----------------------+------------+
10 rows in set (0.00 sec)

19. How to maintain the full history?

https://www.percona.com/blog/2015/10/13/mysql-query-digest-with-performance-schema/

20. What are the improvements in 5.7?

Memory instrumentation tables in detail
Metadata lock troubleshooting

Percona Live Featured Tutorial with Morgan Tocker — MySQL 8.0 Optimizer Guide

Percona Live Featured Tutorial

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Morgan Tocker, MySQL Product Manager at Oracle. His tutorial is a MySQL 8.0 Optimizer Guide. Many users who follow MySQL development are aware that recent versions introduced a number of improvements to query execution (via the addition of new execution strategies and an improved cost model). But what we don’t talk about enough is that the diagnostic features are also significantly better. I had a chance to speak with Morgan and learn a bit more about the MySQL Optimizer:

Percona: How did you get into database technology? What do you love about it?

Morgan: I started my career as a web developer, mainly focusing on the front end area. As the team I worked on grew and required different skills, I tried my hand at the back end programming. This led me to databases.

I think what I enjoyed about databases at the time was that front end design was a little bit too subjective for my tastes. With databases, you could prove what was “correct” by writing a simple micro-benchmark.  I joined the MySQL team in January 2006, and rejoined it again in 2013 after a five-year hiatus.

I don’t quite subscribe to this same view on micro benchmarks today, since it is very easy to accidentally (or intentionally) write a naïve benchmark. But I am still enjoying myself.

Percona: Your tutorial is called “MySQL 8.0 Optimizer Guide.” What exactly is the MySQL optimizer, and what new things have been added in MySQL 8.0?

Morgan: Because SQL is declarative (i.e., you state “what you want” rather than “do this then that”), there is a process that has to happen internally to prepare a query for execution. I like to describe it as similar to what happens when you enter an address in a GPS navigator. Some software then spits out the best steps on how to get there. In a database server, the Optimizer is that software code area.

There are a number of new optimizer features in MySQL 8.0, both in terms of new syntax supported and performance improvements to existing queries. These will be covered in some talks at the main conference (and also my colleague Øystein’s tutorial). The goal of my tutorial is to focus more on diagnostics than the enhancements themselves.

Percona: How can you use diagnostics to improve queries?

Morgan: To put it in numbers: it is not uncommon to see a user obsess over a configuration change that may yield a 2x improvement, and not spot the 100x improvement available by adding an index!

I like to say that users do not get the performance they are entitled to when they lack visibility into what the server is doing, and the diagnostics to act on it:

  • In MySQL 5.6, an optimizer trace diagnostic was added so that you can now see not only why the optimizer arrived at a particular execution plan, but also why other options were avoided.
  • In MySQL 5.7, the EXPLAIN FORMAT=JSON command now includes the cost information (the internal formula used for calculations). My experience has been that sharing this detail actually makes the optimizer a lot easier to teach.

Good diagnostics by themselves do not make the improvements, but they bring required changes to the surface. On most systems, there are opportunities for improvements (indexes, hints, slight changes to queries, etc.).
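
For readers who want to try both of these diagnostics before the tutorial, a minimal session might look like the sketch below (optimizer trace and EXPLAIN FORMAT=JSON are the documented MySQL 5.6/5.7 features; the orders table and its predicate are hypothetical placeholders):

mysql> SET optimizer_trace = 'enabled=on';
mysql> SELECT * FROM orders WHERE customer_id = 42;
mysql> SELECT TRACE FROM information_schema.OPTIMIZER_TRACE; -- why this plan won, and what was rejected
mysql> SET optimizer_trace = 'enabled=off';

mysql> EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42; -- includes the per-step cost numbers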

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Morgan: Visibility into running systems has been a huge priority for the MySQL Engineering team over the last few releases. I think in many cases, force-of-habit leaves users using an older generation of diagnostics (EXPLAIN versus EXPLAIN FORMAT=JSON). My goal is to show them the light using the state-of-the-art stack. This is why I called my talk an 8.0 guide, even though much of it is still relevant for 5.7 and 5.6.

I also have a companion website to my tutorial at www.unofficalmysqlguide.com.

Percona: What are you most looking forward to at Percona Live?

Morgan: I’m excited to talk to users about MySQL 8.0, and not just in an optimizer sense. The MySQL engineering team has made a large investment in improving the reliability of MySQL with the introduction of a native data dictionary. I expect it will be the subject of many discussions, and a great opportunity for feedback.

There is also the social aspect for me, too. It will be 11 years since I first attended the predecessor to Percona Live. I have a lot of fond memories, and enjoy catching up with new friends and old over a beer!

You can find out more about Morgan Tocker and his work with the Optimizer at his tutorial website. Want to find out more about Morgan and MySQL query optimization? Register for Percona Live Data Performance Conference 2017, and see his MySQL 8.0 Optimizer Guide tutorial. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

MySQL Day – Sessions review #4


On February 3rd, just before FOSDEM and the MySQL & Friends Devroom, MySQL’s Community Team is organizing the pre-FOSDEM MySQL Day.

Today’s highlighted sessions are those of Jean-François Gagné, from Booking.com:

  • How Booking.com avoids and deals with replication lag at 12.05
  • Monitoring Booking.com without looking at MySQL at 15.30

Jean-François has been working on growing the MySQL/MariaDB installations at Booking.com since he joined in 2013. His main focus is replication bottlenecks (and some other engineering problems too, of course). Jean-François works on improving Parallel Replication and deploys Binlog Servers. He also has a good understanding of replication in general and a respectable understanding of InnoDB, Linux and TCP/IP.

In the first talk, Jean-François does not discuss making replication faster; instead he explains how to deal with the asynchronous nature of MySQL replication and the (in-)famous lag that comes with it. He starts by quickly explaining the consequences of asynchronous replication and how/when lag can happen. Then he presents the solution used at Booking.com to avoid creating lag in the first place and to minimize the consequences of stale reads on slaves (hint: this solution does not mean reading from the master, because that does not scale).

The second session is quite different. I saw its first live presentation at Percona Live in Amsterdam and really enjoyed it. I don’t want to spoil it, so you will have to come and see Jean-François on stage sharing some “secrets” about how Booking.com is monitored without looking at MySQL 😉

Don’t forget to register for this main MySQL event and for the MySQL Community Dinner that will happen on Saturday, February 4th, just after FOSDEM’s MySQL & Friends Devroom.

How to replace the TABLE identifier generator with either SEQUENCE or IDENTITY in a portable way
