
SSL/TLS Connections to Recent MySQL Servers in Java


Recent changes to support better security by increasing strength of Diffie-Hellman cipher suites from 512-bit to 2048-bit were introduced to MySQL Server 5.7. While this change enhances security, it is an aggressive change in that 2048-bit DH ciphers are not universally supported. This has become a problem specifically for Java users, as only Java 8 JRE (currently) supports DH ciphers greater than 1024 bits. Making the problem more acute, this change was back-ported from MySQL Server 5.7 to the recent 5.6.26 and 5.5.45 releases in response to a community bug report. This blog post will identify affected applications, existing workarounds, and our plans to provide a more permanent solution in upcoming maintenance releases.

Problem Symptoms

Because the MySQL Server is trying to negotiate a DH cipher size (2048) that’s larger than what many JREs can support (1024), affected programs will see an error trying to establish a connection to MySQL Server.  The important parts of the stack trace look like this:

 

com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 480 milliseconds ago.  The last packet sent successfully to the server was 474 milliseconds ago.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
Caused by: javax.net.ssl.SSLException: java.lang.RuntimeException: Could not generate DH keypair
...
Caused by: java.lang.RuntimeException: Could not generate DH keypair
...
Caused by: java.security.InvalidAlgorithmParameterException: Prime size must be multiple of 64, and can only range from 512 to 1024 (inclusive)

The information regarding the DH keypair is found in a nested Exception – if you only inspect the top-level Exception, it will have only generic information:

Error Code:0
SQL State:08S01
Error Message:Communications link failure

If your application (or framework) hides the full stack trace when writing to the application log, the cause will likely be difficult to determine.  Connector/Java developers noticed this, and added useful additional text to the error message, starting in Connector/Java 5.1.35:

Error Message:The driver was unable to create a connection due to a possible incompatibility between the default enabled cipher suites in this JVM and the security layer provided by the server.

MySQL Community version 5.7.6 and above require a 2048 bit key during DH key exchange. This was not supported in Java before v8. Setting the connection property ‘enabledSSLCipherSuites=TLS_RSA_WITH_AES_128_CBC_SHA,SSL_RSA_WITH_RC4_128_SHA,SSL_RSA_WITH_3DES_EDE_CBC_SHA,SSL_RSA_WITH_RC4_128_MD5,SSL_RSA_WITH_DES_CBC_SHA’ uses RSA key exchange instead.

For users of recent MySQL Connector/Java releases, this additional information will prove useful in diagnosing this problem.

Problematic Deployments

Listed below are characteristics of deployment environments which are affected (all must be met):

  • MySQL Server version is one of the below:
    • 5.5.45
    • 5.6.26
    • 5.7.6 – 5.7.8
  • JRE is one of the below:
    • Java 8, before b56
    • Java 7 or earlier
  • SSL/TLS connections in use
  • Not using an external JSSE provider (e.g., Bouncy Castle)

Additionally, DH ciphersuites must be enabled.  They are by default, and while DBAs could theoretically restrict their use by specifying values for the --ssl-cipher server option that don't include them, DH ciphersuites uniquely provide perfect forward secrecy among MySQL-supported options.
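For illustration, a server-side restriction could look like the following my.cnf fragment (the cipher name here is only an example; valid names depend on the SSL library the server was built against):

[mysqld]
# Permit only an RSA key-exchange suite; DH suites are excluded,
# which also gives up perfect forward secrecy
ssl-cipher=AES128-SHA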

Workarounds

As mentioned above, use of DH ciphersuites can be restricted. This can be done on the server side using --ssl-cipher, and on the client side with the enabledSSLCipherSuites property added in Connector/Java 5.1.35, as described in the error message text cited earlier. The configuration cited in the error message text avoids the Exception during connection, but it also negotiates a ciphersuite other than DH (thus abandoning PFS).
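As a sketch of the client-side workaround, the property from the error message can simply be appended to the JDBC URL (host, port, and database below are placeholders; the cipher list is exactly the one quoted in the error message):

jdbc:mysql://db.example.com:3306/test?useSSL=true&enabledSSLCipherSuites=TLS_RSA_WITH_AES_128_CBC_SHA,SSL_RSA_WITH_RC4_128_SHA,SSL_RSA_WITH_3DES_EDE_CBC_SHA,SSL_RSA_WITH_RC4_128_MD5,SSL_RSA_WITH_DES_CBC_SHA

A connection made with such a URL negotiates RSA key exchange and avoids the DH keypair error, at the cost of PFS as noted above.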

Applications which use (or can use) Java 8 should ensure that they are running at least b56 before upgrading MySQL Server to an affected version where SSL is used.

Future plans

It is very likely that MySQL Server 5.5 and 5.6 will have the DH length reduced from 2048 to 1024 to eliminate incompatibility with legacy Java applications.  Additionally, we are evaluating making this a runtime configuration option for all versions of MySQL Server, instead of the compile-time option that currently exists.



Virtual Columns and Effective Functional Indexes in InnoDB


In April, I wrote a blog post introducing the new InnoDB virtual column and effective functional index work for the JSON lab release. Now the feature is officially in MySQL 5.7.8. It’s worth revisiting the topic again in order to write about what is in the 5.7 Release Candidate, along with providing some additional examples. I also ran some quick performance tests comparing some simple operations on a table with and without virtual columns and indexes, comparing them with those using materialized generated columns.

Key Features

The earlier article already described the design in detail. If you have not already read it, I would recommend that you check it out first. I won’t go into all of the design detail again here, but I will restate some key points:

  1. The virtual columns are no longer materialized in the table. They are truly “virtual”, which means they are NOT stored in InnoDB rows (InnoDB stores the table data in its primary key records—a clustered index—so it means that virtual columns will not be present in InnoDB primary key records), thus decreasing the overall size of the table. This then allows for faster table scans and other large operations.
  2. Since they are truly “virtual”, adding and dropping virtual columns does not require a table rebuild. Rather, it only requires a quick system table update that registers the new metadata. This makes the schema changes simple and fast.
  3. Creating an index on a virtual column (only secondary indexes are allowed) will essentially “materialize” the virtual column in the index records as the computed values are then stored in the secondary index, but the values are not stored in the primary key (clustered index). So the table itself is still small, and you can quickly look up the computed (and stored) virtual column values in the secondary index.
  4. Once an index is created on a virtual column, the value of that column is MVCC-logged so as to avoid unnecessary re-computation of the generated column value later, when we have to perform a rollback or purge operation. However, since its purpose is only to maintain the secondary index, we log only up to a limited length of the data in order to save space, since our index has a key length limitation of 767 bytes for the COMPACT and REDUNDANT row formats, and 3072 bytes for the DYNAMIC and COMPRESSED row formats.

Changes Since the Lab Release

There are a few noteworthy and useful changes/additions since the initial Lab release:

  1. A single “functional index” can now be created on a combination of both virtual columns and non-virtual generated columns. That is, you can create a composite index on a mix of virtual and non-virtual generated columns.
  2. Users can create functional indexes ONLINE using the in-place algorithm so that DML statements can still be processed while the index is being created. In order to achieve that, the virtual column values used within the concurrent DML statements are computed and logged while the index is being created, and later replayed on the functional index.
  3. Users can create virtual columns based on other virtual columns, and then index them.
  4. The next improvement is less visible to users, but still worth mentioning. It concerns enhancements to the purge-related activities on indexed virtual columns. A new callback (WL#8841) provides a server layer function that can be called by InnoDB purge threads to compute virtual column index values. Generally this computation is done from connection threads (or sessions); however, internal InnoDB purge threads do not correspond to connections/sessions and thus don't have THDs or access to TABLE objects, so this work was necessary in order to provide a server layer callback which enables the purge threads to make the necessary computations.

Limitations

There are still some notable restrictions around the “functional indexes”, some of which will be lifted by later work:

  1. Primary Keys cannot be added on virtual columns.
  2. You cannot create a spatial or fulltext index on virtual columns (this limitation will eventually be lifted).
  3. A virtual index cannot be used as a foreign key.
  4. You cannot create virtual columns on non-repeatable/non-deterministic functions. For example:
    mysql> ALTER TABLE `t` ADD p3 DATE GENERATED ALWAYS AS (curtime()) virtual;
    ERROR 3102 (HY000): Expression of generated column 'p3' contains a disallowed function.
  5. Adding and dropping virtual columns can be done in-place or online only when done as single operations by themselves, and not when combined with other table alterations. This limit will be removed later, but you can always work around it by keeping the in-place ADD/DROP virtual column operations in a separate DDL statement, as sketched below.
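As an illustration of that last point, a combined statement forces a table copy, while splitting the virtual-column change into its own statement keeps it in-place (the table and column names here are hypothetical):

-- Combined with another alteration: falls back to a table rebuild
-- ALTER TABLE t1 ADD COLUMN v1 INT GENERATED ALWAYS AS (a + 1) VIRTUAL,
--                MODIFY COLUMN b VARCHAR(200);

-- Split into two statements: the virtual-column DDL stays in-place
ALTER TABLE t1 ADD COLUMN v1 INT GENERATED ALWAYS AS (a + 1) VIRTUAL, ALGORITHM=INPLACE;
ALTER TABLE t1 MODIFY COLUMN b VARCHAR(200);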

A Few More Examples

In the previous blog post we gave an example of how to use a “functional index” in conjunction with some JSON functions. Users can essentially use any functions for virtual columns, except for those that are non-deterministic (such as NOW()). So let’s next walk through some additional examples using some non-JSON functions:

  1. Indexes on XML fields
    mysql> create table t(a int, b varchar(100), c varchar(100) generated always as (ExtractValue(b, '//b[1]')) virtual);
    Query OK, 0 rows affected (0.22 sec)
    
    mysql> insert into t values (1, '<a><b>X</b><c>Y</c></a>', default);
    Query OK, 1 row affected (0.05 sec)
    
    mysql> select * from t;
    +------+-------------------------+------+
    | a    | b                       | c    |
    +------+-------------------------+------+
    |    1 | <a><b>X</b><c>Y</c></a> | X    |
    +------+-------------------------+------+
    1 row in set (0.00 sec)
  2. Indexes on Geometry calculations

    Here is an example showing how you can quickly add a virtual column that stores the distance (in meters) between two geographic points or (LONGITUDE, LATITUDE) coordinate pairs:

    First we’ll create a table with some geography data in it:

    mysql> CREATE TABLE t (
    id int(11) NOT NULL,
    p1 geometry DEFAULT NULL,
    p2 geometry DEFAULT NULL,
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    Query OK, 0 rows affected (0.20 sec)
    
    mysql> insert into t values(1, POINT(-75.341621, 41.061987), POINT(-75.3555043, 41.0515628));
    Query OK, 1 row affected (0.04 sec)
    
    mysql> insert into t values(2, POINT(-75.341621, 41.061987), POINT(-75.3215434, 41.0595024));
    Query OK, 1 row affected (0.04 sec)

    Now you want to measure the distance (in meters) between the two points. You can quickly ADD a virtual column and then index it, all without having to rebuild the table:

    mysql> ALTER TABLE t ADD distance double GENERATED ALWAYS AS (st_distance_sphere(p1, p2)) VIRTUAL;
    Query OK, 0 rows affected (0.10 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> create index idx on t(distance);
    Query OK, 0 rows affected (0.27 sec)
    Records: 0 Duplicates: 0 Warnings: 0

    Now you can query this table easily and quickly using this new virtual column and its index:

    mysql> explain select distance from t\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: t
       partitions: NULL
             type: index
    possible_keys: NULL
              key: idx
          key_len: 9
              ref: NULL
             rows: 2
         filtered: 100.00
            Extra: Using index
    1 row in set, 1 warning (0.00 sec)
    
    mysql> select distance, id from t;
    +--------------------+----+
    | distance           | id |
    +--------------------+----+
    | 1642.7497709937588 |  1 |
    | 1705.8728579019303 |  2 |
    +--------------------+----+
    2 rows in set (0.00 sec)

  3. String manipulation

    You can also add some columns that use any of the string manipulation functions. For example:

    mysql> CREATE TABLE `t` (
    `a` varchar(100) DEFAULT NULL,
    `b` varchar(100) DEFAULT NULL
    ) ENGINE=InnoDB;
    Query OK, 0 rows affected (0.20 sec)
    
    mysql> insert into t values ("this is an experiment", "with string manipulation");
    Query OK, 1 row affected (0.03 sec)
    
    mysql> ALTER TABLE t ADD COLUMN count1 int GENERATED ALWAYS AS (char_length(a)) VIRTUAL;
    Query OK, 0 rows affected (0.06 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> ALTER TABLE t ADD COLUMN count2 int GENERATED ALWAYS AS (char_length(b)) VIRTUAL;
    Query OK, 0 rows affected (0.06 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    
    mysql> select * from t;
    +-----------------------+--------------------------+--------+--------+
    | a                     | b                        | count1 | count2 |
    +-----------------------+--------------------------+--------+--------+
    | this is an experiment | with string manipulation |     21 |     24 |
    +-----------------------+--------------------------+--------+--------+
    1 row in set (0.00 sec)
    
    mysql> ALTER TABLE t ADD COLUMN count3 int GENERATED ALWAYS AS (instr(a, "exp")) VIRTUAL;
    Query OK, 0 rows affected (0.07 sec)
    Records: 0 Duplicates: 0 Warnings: 0
    
    mysql> select count3 from t;
    +--------+
    | count3 |
    +--------+
    |     12 |
    +--------+
    1 row in set (0.00 sec)

Lastly, it’s worth noting that you can also create your own custom functions (or UDFs) and use those as the basis for your virtual columns.

Some Quick Performance Benchmarks

As expected, there is some additional write cost when using an index on a virtual column, because the virtual column values must be computed whenever they need to be materialized (e.g. on INSERT or UPDATE); in other words, the cost is associated with creating and maintaining the index. If the column does not have a functional index, however, then the cost shifts to reads, as the value must be materialized any time the row is examined. The added cost is also directly related to the complexity of the computation functions used.

However, even with such additional costs, using virtual columns and “functional indexes” can still be far better than creating the table with STORED generated columns, as the latter materializes the data in the clustered index (primary key), resulting in a larger table (both on disk and in memory).

These quick tests were conducted on three types of tables:

  1. A table with virtual columns:
    CREATE TABLE `t` (
    `h` INT NOT NULL PRIMARY KEY,
    `a` varchar(30),
    `b` BLOB,
    `v_a_b` BLOB GENERATED ALWAYS AS (CONCAT(a,b)) VIRTUAL,
    `v_b` BLOB GENERATED ALWAYS AS (b) VIRTUAL,
    `e` int,
    `v_h_e` INT(11) GENERATED ALWAYS AS (h + e) VIRTUAL,
    `v_e` INT GENERATED ALWAYS AS (e) VIRTUAL,
    `v_a` INT GENERATED ALWAYS AS (char_length(a)) VIRTUAL
    ) ENGINE=InnoDB;

    This table has 4 normal columns and 5 VIRTUAL columns. We make the computation function used very simple so as to minimize the impact from the function itself.
  2. A “normal” table without any generated columns at all (neither VIRTUAL nor STORED):
    CREATE TABLE `t_nv` (
    `h` INT NOT NULL PRIMARY KEY,
    `a` VARCHAR(30),
    `b` BLOB,
    `e` INT
    ) ENGINE=InnoDB;
  3. A table with materialized or STORED generated columns:
    CREATE TABLE `t_m` (
    `h` INT NOT NULL PRIMARY KEY,
    `a` varchar(30),
    `b` BLOB,
    `v_a_b` BLOB GENERATED ALWAYS AS (CONCAT(a,b)) STORED,
    `v_b` BLOB GENERATED ALWAYS AS (b) STORED,
    `e` int,
    `v_h_e` INT(11) GENERATED ALWAYS AS (h + e) STORED,
    `v_e` INT GENERATED ALWAYS AS (e) STORED,
    `v_a` INT GENERATED ALWAYS AS (char_length(a)) STORED
    ) ENGINE=InnoDB;

We then use the following procedure to INSERT rows into each of these tables:

DELIMITER |
CREATE PROCEDURE insert_values(n1 int, n2 int)
begin
        DECLARE i INT DEFAULT 1;
        WHILE (i+n1 <= 100000+n2) DO
                INSERT INTO t VALUES (n1+i, CAST(n1+i AS CHAR), REPEAT('b', 2000), DEFAULT, DEFAULT, n1+i+10, DEFAULT, DEFAULT, DEFAULT);
                INSERT INTO t_nv VALUES (n1+i, CAST(n1+i AS CHAR),  REPEAT('b', 2000), n1+i+10);
                INSERT INTO t_m VALUES (n1+i, CAST(n1+i AS CHAR), REPEAT('b', 2000), DEFAULT, DEFAULT, n1+i+10, DEFAULT, DEFAULT, DEFAULT);
                SET i = i + 1;
        END WHILE;
END|
DELIMITER ;
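Each call inserts a batch of 100,000 rows (primary key values n1+1 through 100000+n2) into all three tables, so loading the test data is just a matter of repeated calls with shifted offsets:

CALL insert_values(0, 0);            -- rows 1 .. 100,000
CALL insert_values(100000, 100000);  -- rows 100,001 .. 200,000
-- ...repeat until the desired row count is reached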

All tests were conducted on a 48-core x86_64 GNU/Linux machine with a 10GB InnoDB buffer pool. All tests were run with a single thread. Each number is an average over 3 runs. Here are the results:

  1. Insertion without an index:
    | Table                                     | Insert of 500,000 rows | Insert of 1,000,000 rows |
    | Table 1. with virtual columns (t)         | 3 min 24.65 sec        | 6 min 59.91 sec          |
    | Table 2. without virtual columns (t_nv)   | 3 min 21.41 sec        | 6 min 31.82 sec          |
    | Table 3. with materialized columns (t_m)  | 4 min 25.58 sec        | 8 min 43.66 sec          |

    So for insertion into tables without any secondary index, the times are very similar for the table without any generated columns and the one with VIRTUAL columns. The latter does not have those columns materialized, so the amount of data inserted is exactly the same for the first two tables. However, if the columns are materialized/stored (as in table 3), the insert takes substantially longer.

    One thing to note is that even though the times for table 1 and table 2 are very similar, insertion into table 1 still takes slightly longer. This is due to an issue where some unnecessary computation is still done for a table with virtual columns; this will be fixed soon.

  2. Creating an index

    Create index on similar columns for 3 different tables:

    | Create index (table with 1,000,000 rows)                       | Time     |
    | Create index on t(v_e)                                         | 2.90 sec |
    | Create index on t_nv(e)                                        | 2.40 sec |
    | Create index on t_m(v_e) (Table 3. with materialized columns)  | 3.31 sec |

    Creating the index on table 1’s virtual column v_e is a bit more expensive than on its base column in table 2, since indexing virtual columns requires computing the values. However, in this case, it is still faster than creating an index on the same column as a STORED generated column, simply because the table with STORED generated columns is so much bigger that even a simple scan takes more time.

    Here are a few more runs of CREATE INDEX on table t, just to show the scale of the costs when adding indexes on virtual columns.

    | Time to create index on 1,000,000 rows (table t with virtual columns) | Time     |
    | Create index on column v_e                                            | 2.90 sec |
    | Create index on column e                                              | 2.47 sec |
    | Create index on column v_b(3)                                         | 3.26 sec |
    | Create index on column b(3)                                           | 2.67 sec |
    | Create index on column v_h_e                                          | 2.97 sec |
    | Create index on column v_a                                            | 3.06 sec |
    | Create index on column v_a_b(10)                                      | 4.19 sec |

    As mentioned, creating an index on virtual columns is a bit more costly than creating an index on normal columns, since the computation needs to be performed on each row.

  3. Adding a new column

    ALTER TABLE ... ADD COLUMN usually requires a full table rebuild for normal or STORED generated columns. But if you add a virtual column, no rebuild is required, and the operation is almost instant.

    | ALTER TABLE ... ADD COLUMN on a table with 1,000,000 rows              | Time            |
    | alter table t_nv add column col1 int;                                  | 1 min 20.50 sec |
    | alter table t_nv add column col2 int GENERATED ALWAYS AS (e) stored;   | 1 min 32.40 sec |
    | alter table t_nv add column col3 int GENERATED ALWAYS AS (e) virtual;  | 0.10 sec        |

    So if you add a virtual column and then materialize it via CREATE INDEX, it will only take a few seconds (2 to 3 seconds for creating the index, according to the previous experiment); see the sketch after this list. If you do the same with a normal column or a STORED generated column, it will take 50x to 60x more time (mostly spent rebuilding the table).

  4. Dropping a column

    Similarly, dropping a virtual column is far faster for the same reasons.

    | ALTER TABLE ... DROP COLUMN on a table with 1,000,000 rows | Time      |
    | alter table t_nv drop column col1;                         | 47.02 sec |
    | alter table t_nv drop column col2; (a STORED column)       | 50.41 sec |
    | alter table t_nv drop column col3; (a virtual column)      | 0.10 sec  |
  5. DMLs with a virtual index or “functional index”:
    1. INSERT
      | Insert of 500,000 rows                      | Time            |
      | Table 1 with functional index on column v_e | 6 min 57.31 sec |
      | Table 2 with index on column e              | 6 min 33.09 sec |
      | Table 3 with index on column v_e            | 9 min 5.24 sec  |

      As shown in this example, for the table with indexed virtual columns, its insertion times are significantly lower than for table 3, which materializes the value in the clustered index (primary key).

    2. UPDATE

      The following UPDATE statement was then performed on the 3 tables, with these results:

      mysql> update t set e=e+1;

      | Update time on table with 1,000,000 rows                | Time            |
      | Update on table 1 with index on virtual column v_e      | 1 min 20.39 sec |
      | Update on table 2 with index on column e                | 52.26 sec       |
      | Update on table 3 with index on materialized column v_e | 1 min 2.52 sec  |

      As you can see, UPDATE statements on indexed virtual columns are more expensive. This demonstrates the additional MVCC costs associated with the operation (in addition to any operation associated with column e) because 1) The old value for v_e needs to be computed (for the UNDO log) and 2) The old and new values for v_e will need to be UNDO logged.

    3. DELETE
      | Delete of 1,000,000 rows                                  | Time      |
      | Delete all rows, with index on virtual column v_e         | 21.52 sec |
      | Delete all rows, with index on e                          | 20.54 sec |
      | Delete on table 3, with index on materialized column v_e  | 32.09 sec |

      The DELETE statements were faster on the table with VIRTUAL generated columns than on the table with STORED generated columns. The table with virtual columns is much smaller than the one with materialized columns, so the deletion operation is much faster.

      The DELETE operation will also require a little extra MVCC work if there are indexes on virtual columns because 1) The old value for v_e needs to be computed (for the UNDO log) and 2) The old and new values for v_e will need to be UNDO logged.

      So the DELETE statement is a little bit more expensive than when using a regular column, but much faster than those with STORED generated columns.

  6. SELECT Queries:

    Of course, as expected, the table with STORED generated columns is much larger than the one with VIRTUAL generated columns. This is clearly shown with a quick table scan (after an initial run to bring the data into the buffer pool):

    | select count(*) from t                    | Time     |
    | Table 1. with virtual columns (t)         | 0.59 sec |
    | Table 2. without virtual columns (t_nv)   | 0.60 sec |
    | Table 3. with materialized columns (t_m)  | 1.30 sec |

    As shown above, a table scan on the table with STORED generated columns took more than twice as long (1.30 sec vs. 0.59 sec) as the scan on the table with VIRTUAL generated columns.

    While the table with virtual columns and indexes remains small, it still takes advantage of having a materialized (secondary) index to facilitate efficient queries:

    | Query on char_length(a) on table with 1,000,000 rows      | Time     |
    | Table 1 with virtual column and index on char_length(a)   | 0.00 sec |
    | Table 2 without an index on char_length(a)                | 0.66 sec |
    | Table 3 with stored column and index on char_length(a)    | 0.00 sec |

    Without the “functional index” on the char_length(a) value, table 2 requires a full table scan to get the results. A sketch of the whole add-then-index flow follows this list.
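Putting the earlier points together, here is a minimal sketch of the whole flow on the plain table t_nv (the column and index names v_len and idx_v_len are hypothetical): register a virtual column, materialize it only in a secondary index, then query through that index:

ALTER TABLE t_nv ADD COLUMN v_len INT GENERATED ALWAYS AS (char_length(a)) VIRTUAL;  -- near-instant
CREATE INDEX idx_v_len ON t_nv (v_len);     -- a few seconds: computes values, stores them in the index only
SELECT COUNT(*) FROM t_nv WHERE v_len = 6;  -- resolved from idx_v_len, no table scan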

Summary

The virtual column and functional index work is now officially in 5.7! This is a great feature that allows users to ADD and DROP VIRTUAL generated columns, along with adding optional secondary indexes on those columns, all as ONLINE operations. As shown in the simple performance study above, materializing the data this way keeps the base table small (as it has no duplicate copies in the InnoDB clustered/primary index), thus making more efficient use of persistent storage space and memory while at the same time providing vastly improved query performance!

That’s it for now. As always, THANK YOU for using MySQL!



Testing and Verifying your MySQL Backup Strategy Presentation


This past week I have been the sole MySQL representative on the Oracle Technology Network (OTN) Latin America 2015 tour events in Uruguay, Argentina, Chile and Peru.

In this presentation I talk about the important steps for testing and verifying your MySQL backup strategy to ensure your business continuity in any disaster recovery situation. This includes:

  • Overview of the primary product options
  • Backup and recovery strategy considerations
  • Technical requirements
  • Common problems observed
  • What about a failover strategy


MySQL replication in action - Part 1: GTID & Co


In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we have barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let’s start from something that has the potential of dramatically changing replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) is a feature that has been on my wish list for a long time, since the times I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.

When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101.) There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables were using the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with InnoDB from the beginning.

These two tables allow us to see the same information previously stored in the files master.info and relay_log.info. What makes these tables convenient is that they should survive a crash better than the standalone files.

The idea is good, but the implementation could be better. The new tables are disabled by default. To use them, you need to set a couple of dynamic variables,

relay-log-info-repository=table  
master-info-repository=table
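Both variables are also settable at runtime; a minimal sketch, assuming the replication threads are stopped first:

STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';
START SLAVE;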

Here is an example of what these tables look like:

slave1 [localhost] {msandbox} (mysql) > select * from slave_master_info\G  
*************************** 1. row ***************************
Number_of_lines: 23
Master_log_name: mysql-bin.000002
Master_log_pos: 151
Host: 127.0.0.1
User_name: rsandbox
User_password: rsandbox
Port: 21891
Connect_retry: 60
Enabled_ssl: 0
Ssl_ca:
Ssl_capath:
Ssl_cert:
Ssl_cipher:
Ssl_key:
Ssl_verify_server_cert: 0
Heartbeat: 1800
Bind:
Ignored_server_ids: 0
Uuid: 27971ecc-36e8-11e5-b390-2ff12c09a72a
Retry_count: 86400
Ssl_crl:
Ssl_crlpath:
Enabled_auto_position: 0
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > select * from slave_relay_log_info\G
*************************** 1. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql_sandbox21892-relay-bin.000005
Relay_log_pos: 907
Master_log_name: mysql-bin.000002
Master_log_pos: 697
Sql_delay: 0
Number_of_workers: 0
Id: 1
1 row in set (0.00 sec)

The information looks like what we used to get from the .info files. There is, however, a notable difference. Look at what SHOW SLAVE STATUS says about the same situation:

slave1 [localhost] {msandbox} ((none)) > show slave status\G  
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 21891
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 697
Relay_Log_File: mysql_sandbox21892-relay-bin.000005
Relay_Log_Pos: 907
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

The value of Read_Master_Log_Pos is different. SHOW SLAVE STATUS says 697, while mysql.slave_master_info reports an older position: 151.

The reason for this discrepancy is that, by default, the slave_master_info table is updated only every 10,000 events, while the slave_relay_log_info table is updated at every event. This means that, in case of a crash, only one table is guaranteed to hold reliable information. It should be enough for a recovery, at least until someone finds a creative way of crashing the server that requires the updated contents of slave_master_info.
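The 10,000-event interval is the default of the dynamic sync_master_info variable, so you can trade some write overhead for fresher contents if you wish:

-- Update mysql.slave_master_info after every event instead of
-- every 10,000 (more write overhead, fresher data)
SET GLOBAL sync_master_info = 1;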

Shortly after the crash-safe tables, a new feature was released as a preview, and later included in the main build: global transaction identifiers, or GTID. While I am glad that the feature was added, I am not pleased with the way it is implemented. Let’s see how it works.

You may have noticed in one of the listings above a field named Uuid, containing a long value: 27971ecc-36e8-11e5-b390-2ff12c09a72a. This long string of hexadecimal digits is the identifier of the server. The good thing about this identifier is that it is guaranteed to be unique. Unlike the server-id, which is a 32-bit integer chosen by users, this one is created during the server initialization, and you should be reasonably sure that no two servers have the same identifier. The bad thing is that these identifiers are unreadable and unpronounceable by humans. Try it:

“Hey, Sam! Can you check if 27971ecc-36e8-11e5-b390-2ff12c09a72a is replicating to 30589f86-36e8-11e5-b390-0b61c3af229e?”

Pretty tough, eh? But unfortunately, this is how GTID in MySQL 5.6 and 5.7 are implemented. When they are enabled, you will see in the binary logs remarks such as this one:

SET @@SESSION.GTID_NEXT= '27971ecc-36e8-11e5-b390-2ff12c09a72a:2'/*!*/;

That’s your GTID. If you are using a simple master/slave deployment, you can just ignore the long string and concentrate on the second element (here the number “2”) which is the sequence number of the event. Things get hairier when we deal with multiple sources. We’ll see that in one of the next articles.

For now, it will suffice to notice that we will find the same string both in the master binary log and in the slaves' relay logs, regardless of how different the file names and positions are in the various servers. This fact allows us to easily find a specific event in any server belonging to the same replication domain, which is the main purpose of having a GTID.
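For example, once you know a GTID, mysqlbinlog (from 5.6 onward) can extract that single transaction from whichever log file contains it on any of the servers (the binlog file name below is illustrative):

mysqlbinlog --include-gtids='27971ecc-36e8-11e5-b390-2ff12c09a72a:2' mysql-bin.000002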

Let’s have a look at a complete result from SHOW SLAVE STATUS, which now includes GTID information.

slave1 [localhost] {msandbox} ((none)) > show slave status\G  
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 21891
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 697
Relay_Log_File: mysql_sandbox21892-relay-bin.000005
Relay_Log_Pos: 907
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 697
Relay_Log_Space: 1287
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 27971ecc-36e8-11e5-b390-2ff12c09a72a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 27971ecc-36e8-11e5-b390-2ff12c09a72a:1-2
Executed_Gtid_Set: 27971ecc-36e8-11e5-b390-2ff12c09a72a:1-2
Auto_Position: 0
1 row in set (0.00 sec)

Almost everything up to master server ID looks the same as in previous versions. Then we get that long identifier, a note that we are using the (scarcely updated) mysql.slave_master_info table, and at the very end the information about the latest GTIDs that were processed.

What we have seen so far is enough for being upset at the GTID implementation, but there is more:

  • To enable GTIDs, you need log-slave-updates in all nodes involved in replication. The reason is that you want a slave to be ready to become a master and vice versa, but this imposition may be expensive. This requirement has been lifted in MySQL 5.7.
  • The crash-safe tables do not include GTID values. You can get GTID information from the binary logs, or from SHOW SLAVE STATUS, and in MySQL 5.7 also from a few performance_schema tables. But there is no place except SHOW SLAVE STATUS where you get at once the GTID and the corresponding binary log and position. Sure, there are tools that can do this for you, but it feels as if something is missing.
  • GTID with CHANGE MASTER TO is an all-or-nothing proposition. With GTID, you can either use MASTER_AUTO_POSITION=1 and let master and slave sync each other, or you get the default behavior (replicating from the earliest binlog available). There is no such thing as "start from GTID #". If you don't want to start replication from the automatic position or from the beginning, you still need to use the binary log name and position.
  • When you need to skip one or more transactions on the slave, the only available method is creating as many empty transactions as the events you want to ignore (a sketch follows the example below).
  • There are statements that are not accepted when GTID is enabled. And not accepted does not mean that they are not replicated. It means that you can’t enter these statements in the master: they will be rejected with an error:
    • updates involving transactional and non-transactional tables (note that events that only affect non-transactional tables, such as MyISAM, are accepted);
    • CREATE TABLE … SELECT statements.
    • Temporary tables within transactions.

For example:

create table dummy select * from t2;  
ERROR 1786 (HY000): CREATE TABLE ... SELECT is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1.
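And as for skipping transactions, here is a minimal sketch of the empty-transaction method, reusing the master UUID from the earlier listings and assuming the offending transaction is sequence number 3:

STOP SLAVE;
SET GTID_NEXT = '27971ecc-36e8-11e5-b390-2ff12c09a72a:3';  -- the GTID to discard
BEGIN; COMMIT;                                             -- commit nothing under that GTID
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;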

More monitoring in MySQL 5.7

In MySQL 5.7, the performance_schema has acquired many new tables, some of which are dedicated to replication.

slave1 [localhost] {msandbox} (performance_schema) > show tables like 'repl%';  
+---------------------------------------------+
| Tables_in_performance_schema (repl%)        |
+---------------------------------------------+
| replication_applier_configuration           |
| replication_applier_status                  |
| replication_applier_status_by_coordinator   |
| replication_applier_status_by_worker        |
| replication_connection_configuration        |
| replication_connection_status               |
| replication_group_member_stats              |
| replication_group_members                   |
+---------------------------------------------+
8 rows in set (0.00 sec)

Looking at these tables, we realize that they mostly convert into tables some of the contents of SHOW SLAVE STATUS.

select * from replication_connection_configuration\G  
*************************** 1. row ***************************
CHANNEL_NAME:
HOST: 127.0.0.1
PORT: 13052
USER: rsandbox
NETWORK_INTERFACE:
AUTO_POSITION: 0
SSL_ALLOWED: NO
SSL_CA_FILE:
SSL_CA_PATH:
SSL_CERTIFICATE:
SSL_CIPHER:
SSL_KEY:
SSL_VERIFY_SERVER_CERTIFICATE: NO
SSL_CRL_FILE:
SSL_CRL_PATH:
CONNECTION_RETRY_INTERVAL: 60
CONNECTION_RETRY_COUNT: 86400
HEARTBEAT_INTERVAL: 30.000

select * from replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME:
GROUP_NAME:
SOURCE_UUID: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3
THREAD_ID: 23
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 190
LAST_HEARTBEAT_TIMESTAMP: 2015-07-30 22:18:45
RECEIVED_TRANSACTION_SET: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-207
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

This last table is bizarrely named. It is called “connection status,” but it is the only table with running information about the replication process; a name hinting at applier progress would fit better. Anyway, this is the only point in these tables where we get information about GTIDs. However, we only get information about GTIDs that were “received”; there is no indication of which ones have been executed. Recall what we saw in the first article about the applier's work, and compare with the information available in SHOW SLAVE STATUS and the mysql.slave_* tables: in the old info, we get the master position (original events), the position of the data in the relay logs (events transferred to the slave), and the position of execution (events actually applied to the database).

If you want to know the latest GTID that was executed, you need to run SELECT @@global.gtid_executed or to run SHOW SLAVE STATUS, where both pieces of information are shown together.

There is another table that is often useful for monitoring: replication_applier_status_by_coordinator holds the error code and message when replication breaks.
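A minimal monitoring query against it could look like this (column names as exposed in MySQL 5.7):

SELECT SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_applier_status_by_coordinator;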

To complement the information in performance_schema, there is another table in the mysql database, named gtid_executed. This table is filled with the GTIDs that were executed in the server, but only when some events occur (e.g. flush logs) or when the slave does not have binary logging enabled.

What’s still missing

The replication_* tables in performance_schema contain only information related to the slave operations. However, as we have seen in the previous article, we can’t use one-sided information to monitor replication. We need the master status, which, as of today, is still available only through a “SHOW” command. There is a status variable that reports the GTID state, but that’s all.

master [localhost] {msandbox} ((none)) > show master status\G  
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 681
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-2
1 row in set (0.00 sec)

master [localhost] {msandbox} ((none)) > select @@global.gtid_executed;
+------------------------------------------+
| @@global.gtid_executed                   |
+------------------------------------------+
| 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-2 |
+------------------------------------------+
1 row in set (0.00 sec)

The reason I am so hot about having the monitoring information in tables instead of SHOW commands is because this makes monitoring only available through external tools. Having all information in tables would allow us to run monitoring of the whole replication in SQL, as was demonstrated in a prototype a few years ago.

MariaDB 10 GTID and crash-safe tables

Compared to what we have seen in MySQL 5.6 and 5.7, MariaDB implementation of GTID is rather minimalistic. Here are the basic facts:

  • GTIDs are active by default. No need to enable them. You will get them out of the box.
  • There are no known limitations. All commands are allowed.
  • The data origin is identified by a group of three integers: the domain, the server, and the sequence. (We will see the domain in action when we examine multi-source replication.)
  • The slave crash-safe table is quite simple, compared to MySQL.

In the master, you can see the GTID in a variable:

master [localhost] {msandbox} (test) > select @@gtid_current_pos;  
+--------------------+
| @@gtid_current_pos |
+--------------------+
| 0-1-17             |
+--------------------+
1 row in set (0.00 sec)

And in the slave the latest GTIDs are stored in a table.

slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;  
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |     16 |         1 |     16 |
|         0 |     17 |         1 |     17 |
+-----------+--------+-----------+--------+
2 rows in set (0.00 sec)

You can compare information in master and slave using global variables

master [localhost] {msandbox} (test) > show global variables like '%gtid%';  
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| gtid_binlog_pos        | 0-1-17 |
| gtid_binlog_state      | 0-1-17 |
| gtid_current_pos       | 0-1-17 |
| gtid_domain_id         | 0      |
| gtid_ignore_duplicates | OFF    |
| gtid_slave_pos         |        |
| gtid_strict_mode       | OFF    |
+------------------------+--------+

slave1 [localhost] {msandbox} (mysql) > show global variables like '%gtid%';
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| gtid_binlog_pos        |        |
| gtid_binlog_state      |        |
| gtid_current_pos       | 0-1-17 |
| gtid_domain_id         | 0      |
| gtid_ignore_duplicates | OFF    |
| gtid_slave_pos         | 0-1-17 |
| gtid_strict_mode       | OFF    |
+------------------------+--------+

Notice that the master has more information than the slave, because it has data about its binary log, which the slave does not need to have, since log-slave-updates is not a requirement.

And what you see in the binary logs is quite straightforward, i.e. it is human-readable:

#150730 23:12:33 server id 1  end_log_pos 3595  GTID 0-1-17

(As we have noted for MySQL 5.6/5.7, also in MariaDB 10 the GTID is seen in both the master binary log and the slaves relay logs.)

There are no other tables related to replication in information_schema or performance_schema. The old information (binary log + position) is not recorded anywhere. The design decision, in this case, was to use GTID information only. It is possible to set up and manage replication with only GTIDs.

I have mixed feelings about this implementation. On one hand, it is cleaner and better integrated with the rest of the database than the MySQL 5.6 solution. On the other hand, the minimalistic approach has sacrificed completeness of information (see below for an example of this). Furthermore, it breaks compatibility with MySQL so drastically that the two products cannot work together (well, not exactly: you can use Tungsten Replicator to replicate with mixed nodes, but that’s another story). Not only can you not mix MySQL and MariaDB 10 nodes in replication, but the MariaDB project cannot easily integrate many improvements introduced in the performance_schema by version 5.7.

In the trenches with GTID

In a nutshell, the main problem that GTID solves is to identify transactions in situations where there is a discrepancy between data received and applied in various slaves. To see how GTIDs can help in cases where we have a high load and slaves updated at different paces, let’s simulate slave lagging by turning off the SQL thread while we pump a few million rows inside the database.

First, in MySQL 5.7.

We will test using MySQL::Sandbox.

$ make_replication_sandbox 5.7.8  
installing and starting master
installing slave 1
installing slave 2
starting slave 1
. sandbox server started
starting slave 2
. sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/rsandbox_5_7_8

We have one master and two slaves in replication. However, as we mentioned before, GTID is not enabled by default. For this reason, MySQL::Sandbox creates a file that runs the commands needed to use GTID in all nodes.

$ cd $HOME/sandboxes/rsandbox_5_7_8
$ ./enable_gtid

Inside enable_gtid we can see one of GTID's sweet spots. In traditional replication, when you connected a slave, you had to indicate a binary log and position, or the slave would replicate from the first position of the first binary log. With GTID, you don't need this. You can instead say:

CHANGE MASTER TO MASTER_AUTO_POSITION=1

This tells the slave to negotiate its starting position with the master using GTIDs.

Let’s start our experiment. In both slaves, we run

STOP SLAVE SQL_THREAD;

And then we start inserting data. (Note: the relatively low number of GTIDs does not mean that we are inserting just a few hundred rows. We’re using the sample employees database, which has multi-row inserts, for a grand total of about 4.5 million rows.)

slave1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G  
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 13253
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 66374840
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 1559
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1346
Relay_Log_Space: 66380644
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: f34639b4-3951-11e5-9fe2-b8aeed734276
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: f34639b4-3951-11e5-9fe2-b8aeed734276:1-182
Executed_Gtid_Set: f34639b4-3951-11e5-9fe2-b8aeed734276:1-5
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
1 row in set (0.00 sec)

Looking at SHOW SLAVE STATUS, we see information both about the data being transferred to the slave and about what was executed. Here, we have a big gap, because the SQL_THREAD is idle. SHOW SLAVE STATUS shows us the gap both in the relay logs and in the GTIDs (Retrieved_Gtid_Set vs. Executed_Gtid_Set). Here it is clear what has been transferred to the slave and what has been executed. We can get the Executed_Gtid_Set by running SELECT @@global.gtid_executed.

Now, let’s look at the crash-safe tables. We have already said that the slave_master_info table is only updated every 10,000 events, so we skip it. We hope that the other one gives us more up-to-date info.

slave1 [localhost] {msandbox} (mysql) > select * from slave_relay_log_info\G  
*************************** 1. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql-relay.000002
Relay_log_pos: 1559
Master_log_name: mysql-bin.000002
Master_log_pos: 1346
Sql_delay: 0
Number_of_workers: 0
Id: 1
Channel_name:
1 row in set (0.00 sec)

Well, no. It only gives us the initial position. This table apparently monitors the SQL_THREAD, not the IO_THREAD. Finally, we have a look at the performance_schema:

slave1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G  
*************************** 1. row ***************************
CHANNEL_NAME:
GROUP_NAME:
SOURCE_UUID: f34639b4-3951-11e5-9fe2-b8aeed734276
THREAD_ID: 33
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 26
LAST_HEARTBEAT_TIMESTAMP: 2015-08-02 22:32:58
RECEIVED_TRANSACTION_SET: f34639b4-3951-11e5-9fe2-b8aeed734276:1-182
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

According to this table, we are dealing with GTID # 182, which, as we know from looking at SHOW SLAVE STATUS, is what we have received, but not applied.

Now, let’s restart the SQL_THREAD, and see what happens:

  • the table mysql.slave_relay_log_info is now updated with every event that gets applied. Here we see relay log and binary log advance, but not GTID.
  • We can also see GTID progress in isolation by checking SELECT @@global.gtid_executed.
  • performance_schema.replication_connection_status does not update anymore, although the SQL_THREAD now is working furiously.
  • SHOW SLAVE STATUS keeps giving us correct updates. This is the only place where we see GTIDs and binary + relay log positions together.

My final take: why do we have half a dozen tables that give us bits and pieces, instead of just one that gives us what we all need, i.e. the contents of SHOW SLAVE STATUS?

Next, in MariaDB 10.

We use the same setup used for MySQL 5.7. One big difference is that we don’t need to enable GTIDs, so our only concern is to use the proper option when starting replication:

CHANGE MASTER TO ...  master_use_gtid=current_pos

As we did with MySQL 5.7, we stop the SQL_THREAD, and pump some data.

slave1 [localhost] {msandbox} (mysql) > SHOW SLAVE STATUS\G  
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 25030
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 168389219
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 3002
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2715
Relay_Log_Space: 168396297
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 0-1-189
1 row in set (0.00 sec)

We can see the GTID growing, and the corresponding relay log. At first sight, there is less information than in MySQL 5.7. The data about GTIDs covers only what we have received, not what we have executed.

Similarly to MySQL 5.7, the crash-safe table is idle, because it reports the information about executed GTIDs.

slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;  
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |     11 |         1 |     11 |
|         0 |     12 |         1 |     12 |
+-----------+--------+-----------+--------+

And we can see the GTIDs that were executed by querying SELECT @@global.gtid_current_pos.

Summing up

The sunny side is that we have two database servers that can use GTID information. This is great news whenever you need to perform a failover and the old master has gone away. The old problem of synchronizing the remaining slaves becomes trivial. Both implementations make this task easy to automate.

On the darker side, I can only say that I was expecting more. I see a lack of integration between GTIDs and binlog/position in the instrumentation. You can see them together only in SHOW SLAVE STATUS, while the new tables favor one or the other, but not both.

Both implementations share the decision of not producing a table with master status, which makes the job of automated monitoring just a tiny bit more difficult. My main beef about not having a master status table is that it is the last bit of information missing to do replication monitoring in pure SQL. Well, sort of: in both flavors you can compare the results of @@global.gtid_executed or @@gtid_current_pos, but that does not give you the precision of monitoring that you can get using the SHOW statements. Again, the details of what we want to compare are in the previous article. The implementation of GTID lacks some of the rich information that we have when using log files and positions. Some say we don't need such information anymore. I disagree. Since replication still happens using binary and relay logs, having a place where GTIDs are related to their physical counterparts can help troubleshooting.

What’s next

We have now seen the main functioning of replication using the latest flavors of MySQL. With that, we are now ready to explore the brand new features, such as multi-source replication (or multi-master, as it is commonly referred to).



MySQL replication primer with pt-table-checksum and pt-table-sync


MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

It’s essential to make sure the slave servers have the same set of data as the master to ensure data is consistent within the replication stream. MySQL slave server data can drift from the master for many reasons – e.g. replication errors, accidental direct updates on slave, etc.

Here at Percona Support we highly recommend that our customers periodically run the pt-table-checksum tool to verify data consistency within replication streams, specifically after fixing replication errors on slave servers, to ensure that the slave has data identical to its master's. You don't want to be in a situation where you need to fail over to a slave server and then find different data on it.

In this post, I will examine the pt-table-checksum and pt-table-sync tools usage from Percona Toolkit on different replication topologies. We often receive queries from customers about how to run these tools and I hope this post will help.

Percona Toolkit is a free collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

One of those tools is pt-table-checksum, which works by dividing table rows into chunks of rows. The size of a chunk changes dynamically during the operation to avoid overloading the server. pt-table-checksum has many safeguards, including varying the chunk size, to make sure queries run in a desired amount of time.

pt-table-checksum verifies chunk size by running an EXPLAIN query on each chunk. It also monitors the slave servers continuously to make sure the replicas do not fall too far behind; if they do, the tool pauses itself to allow the slaves to catch up. Along with that, there are many other safeguards built in, and you can find all the details in the documentation.

In my first example case, I am going to run pt-table-checksum against a pair of replication servers, i.e. a master with only one slave in the replication topology. We will run pt-table-checksum on the master server to verify data integrity on the slave; if pt-table-checksum finds differences, we will sync those changes to the slave server via the pt-table-sync tool.

I have created a dummy table under test database and inserted 10 records on master server as below:

mysql-master> create table dummy (id int(11) not null auto_increment primary key, name char(5)) engine=innodb;
Query OK, 0 rows affected (0.08 sec)
mysql-master> insert into dummy VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f'), (7,'g'), (8,'h'), (9,'i'), (10,'j');
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0
mysql-master> select * from dummy;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
|    4 | d    |
|    5 | e    |
|    6 | f    |
|    7 | g    |
|    8 | h    |
|    9 | i    |
|   10 | j    |
+------+------+
10 rows in set (0.00 sec)

Then I intentionally deleted a few records from the slave server to make it inconsistent with the master for the purpose of this post.

mysql-slave> delete from dummy where id>5;
Query OK, 5 rows affected (0.03 sec)
mysql-slave> select * from dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

Now, in this case the master server has 10 records in the dummy table while the slave server has only 5 (the records with id>5 are missing). We will run pt-table-checksum at this point on the master server to see if the tool catches those differences.

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-11T18:30:13      0      1       10       1       0   1.044 test.dummy

This needs to be executed on the master. The user and password you specify will be used to connect not only to the master but to the slaves as well. The pt-table-checksum MySQL user needs the following privileges:

mysql-master> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `checksum_user`@'%' IDENTIFIED BY 'checksum_password';
mysql-master> GRANT ALL PRIVILEGES ON percona.* TO `checksum_user`@'%';

Earlier, in the pt-table-checksum command, I used the --replicate option, which writes the checksum results to the specified table, percona.checksums. Next I passed the --ignore-databases option, which accepts a comma-separated list of databases to ignore. Moreover, the --create-replicate-table and --empty-replicate-table options are “Yes” by default, and you can specify both explicitly if you want to use a database and table other than percona.checksums, as sketched below.
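For instance, a hypothetical invocation that keeps the checksums in a custom table (the dba.my_checksums name is illustrative, not part of this setup) could look like:

[root@master]# pt-table-checksum --replicate=dba.my_checksums --create-replicate-table --empty-replicate-table --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password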

pt-table-checksum reported 1 DIFF, which is the number of chunks that differ from the master on one or more slaves. You can find details about the output columns, e.g. TS, ERRORS and so on, in the pt-table-checksum documentation. After that, I ran the next command to identify which table differs on the slave.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Differences on slave
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -5 1

In this command I used the --replicate-check-only option, which doesn’t checksum any tables and only reports the tables with differences; in other words, only the checksum differences detected on replicas are printed. It checks the replicas for differences found by previous checksumming, and then exits.

You may also log in to the slave and execute the query below to find out which tables have inconsistencies.

mysql-slave> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

pt-table-checksum identified that the test.dummy table differs on the slave. Now we are going to use the pt-table-sync tool to synchronize table data between the MySQL servers.

[root@slave]# pt-table-sync --print --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('6', 'f') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('7', 'g') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('8', 'h') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('9', 'i') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('10', 'j') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;

This time I ran the pt-table-sync tool from the opposite host – i.e. from the slave – because I used the --sync-to-master option, which treats the DSN as a slave and syncs it to its master. Again, pt-table-sync will use the MySQL username and password you specify to connect to the slave as well as to its master. The --replicate option here examines the specified table to find the data differences, and --print just prints the SQL (REPLACE queries) without actually executing it.

You can audit the queries before executing them to sync data between master and slave; notice that it printed only the records missing on the slave. Once you are happy with the results, you can substitute --print with --execute to do the actual synchronization, as shown below.
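A minimal sketch of that execute run, reusing the exact options from the --print invocation above:

[root@slave]# pt-table-sync --execute --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password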

As a reminder, these queries are always executed on the master, as this is the only safe way to make the changes on the slave. On the master they are no-op changes, since those records already exist there, but they then flow to the slave via the replication stream and bring it back in sync with the master.

If you find lots of differences on your slave server, it may lag while those changes are synchronized. As I mentioned earlier, you can use the --print option to go through the queries that are going to be executed to sync the slave with the master. I found this post useful if you see a huge difference in a table between the master and slave(s).

Note that you may use the --dry-run option initially, which only analyzes and prints information about the sync algorithm and then exits. It shows verbose output but doesn’t make any changes: the --dry-run parameter basically instructs pt-table-sync to not actually do the sync, but just perform some checks.
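For example, such a check-only run (same DSN and options as before) would be:

[root@slave]# pt-table-sync --dry-run --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password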

Let me present another replication topology, where the master has two slaves: slave2 runs on the non-default port 3307, while the master and slave1 run on port 3306. Further, slave2 is out of sync with the master, and I will show you how to sync it.

mysql-master> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+

Let’s run pt-table-checksum tool on master database server.

[root@master]# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-23T13:57:39      0      0        2       1       0   0.310 percona.dsns
07-23T13:57:39      0      1        5       1       0   0.036 test.dummy

This time I used the --recursion-method parameter, which selects the method used to find slaves in the replication stream; it’s pretty useful when your servers run on a non-standard port, i.e. other than 3306. I created the dsns table under the percona database with the following entries. You may find the dsns table structure in the documentation.
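For reference, a minimal definition along the lines of the one documented for pt-table-checksum (treat the documentation as authoritative; this is just a sketch):

mysql> CREATE TABLE percona.dsns (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent_id INT DEFAULT NULL,
  dsn VARCHAR(255) NOT NULL
);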

mysql> SELECT * FROM dsns;
+----+-----------+------------------------------------------------------------+
| id | parent_id | dsn                                                        |
+----+-----------+------------------------------------------------------------+
|  1 |         1 | h=192.168.0.134,u=checksum_user,p=checksum_password,P=3306 |
|  2 |         2 | h=192.168.0.132,u=checksum_user,p=checksum_password,P=3307 |
+----+-----------+------------------------------------------------------------+

Next, I ran the pt-table-checksum command below to identify which slave server has differences in the test.dummy table.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

This shows that slave2 has different data in the test.dummy table compared to the master. Now let’s run the pt-table-sync tool to sync those differences and make slave2 identical to the master.

[root@slave2] ./pt-table-sync --print --replicate=percona.checksums --sync-to-master h=192.168.0.132,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;

It shows that 2 rows differ on slave2. Substituting --print with --execute synchronized the differences on slave2, and re-running the pt-table-checksum tool showed no more differences.

Conclusion:
pt-table-checksum and pt-table-sync are among the finest tools in Percona Toolkit for validating data between a master and its slave(s). With the help of these tools you can easily identify data drift and fix it. I showed a couple of replication topologies above, how to check replication consistency, and how to fix it in case of data drift. You may script the pt-table-checksum / pt-table-sync steps and run the checksum script from cron to periodically check data consistency within the replication stream, as sketched below.
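As an illustrative sketch only (the path, schedule and log file are assumptions, not part of the original setup), a crontab entry for a nightly run could look like:

30 0 * * * /usr/bin/pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password >> /var/log/pt-table-checksum.log 2>&1

pt-table-checksum exits with a non-zero status when it hits errors or finds differences, which makes it easy to hook such a cron job into your alerting.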

This procedure is only safe for a single-level master-slave(s) hierarchy. I will discuss the procedure for other topologies in future posts, describing more complex scenarios on how to use these tools in chained replication – i.e. a master -> slave1 -> slave2 chain – and in a Percona XtraDB Cluster setup.




Become a MySQL DBA - Webinar Series: Schema Changes for MySQL Replication & Galera Cluster


With the rise of agile development methodologies, more and more systems and applications are built in series of iterations. This is true for the database schema as well, as it has to evolve together with the application. Unfortunately, schema changes and databases do not play well together. Changes usually require plenty of advance scheduling, and can be disruptive to your operations. 

In this new webinar, we will discuss how to implement schema changes in the least impacting way to your operations and ensure availability of your database. We will also cover some real-life examples and discuss how to handle them.

DATE & TIME

Europe/MEA/APAC
Tuesday, August 25th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)
Register Now

North America/LatAm
Tuesday, August 25th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

AGENDA

  • Different methods to perform schema changes on MySQL Replication and Galera
    • rolling schema change
    • online alters
    • external tools, e.g., pt-online-schema-change
  • Differences between MySQL 5.5 and 5.6
  • Differences between MySQL Replication vs Galera
  • Example real-life scenarios with MySQL Replication and Galera setups


SPEAKER

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

We look forward to “seeing” you there and to insightful discussions!




In Case You Missed It - Breaking Databases - Keeping your Ruby on Rails ORM under Control


Object-relational mapping is common in most modern web frameworks such as Ruby on Rails. For developers, the ORM provides simplified interaction with the database and a productivity boost. However, the layer of abstraction the ORM provides can hide how the database is being queried. If you’re not paying attention, these generated queries can have a negative effect on your database’s health and performance.


In this webinar, Owen Zanzal discussed ways common Rails ORMs can abuse various databases and how VividCortex can discover them. Themes covered include N+1 Queries, Missing Indexes, and Caching.

If you did not have a chance to join the webinar live, you can register for a recording here.




MySQL Cluster Manager 1.3.6 released


MySQL Package Verification: Making sure we always ship correct, complete and installable packages

What is MySQL Package Verification? Package verification (Pkgver for short) refers to black box testing of MySQL packages across all supported platforms and across different MySQL versions. In Pkgver, packages are tested in order to ensure that the basic user experience is as it should be, focusing on installation, initial startup and rudimentary functionality. When […]

Brainiac Corner with Camille Fournier


The Brainiac Corner is a format where we talk with some of the smartest minds in the system, database, devops, and IT world. If you have opinions on pirates, or anything else related, please don’t hesitate to contact us.


Today we interview Camille Fournier, the current CTO of Rent the Runway. Follow her on twitter @skamille.

How did you get from stork to brainiac (i.e. what do you do today and how did you get there)?

I’m currently the CTO of Rent the Runway, a company that rents designer dresses and accessories. My journey into tech is a familiar one; enjoyed computers as a kid, decided that computer science would be a smart area to go into based on the growth of personal computing in the 80s and early 90s, and been happy with it ever since. I ended up at Rent the Runway after a long period at Goldman Sachs doing various software engineering for internal distributed systems. I came to Rent the Runway because I wanted a change, I wanted to try out the startup world and I wanted the opportunity to get into more of a leadership role, and of course I thought that the business had huge potential to change the fashion world. 4 years and 4X team growth later, all of that has happened, and it’s been a rollercoaster and an amazing learning experience.

What is in your group’s technology stack?

We’re relatively conservative. Java (micro)services, MySQL, MongoDB, Redis, RabbitMQ, Ruby that does not touch the database directly, Memcache, JavaScript (Backbone+React), and of course the Objective-C/Swift stuff for our app. We also have Vertica, Scala, and Python for our data processing layer.

Who would win in a fight between ninjas and pirates? Why?

Probably Ninjas unless it’s a ship of Ninjas vs a ship of Pirates, in which case I’m going to bet on the Pirates.

Which is a more accurate state of the world, #monitoringsucks or #monitoringlove?

We’re trending towards #monitoringlove but not there yet. I think people want magical monitoring tools that will eliminate their need to think, and that will never happen, but at least we’re able to get more useful insights now than we have in the recent past.

In six words or less, what are the biggest challenges your organization faces?

Move fast, don’t break too much.

What’s the best piece of advice you’ve ever received?

When you go after what you want, you get what’s in the way.

What principles guide your expertise in your given domain?

In the domain of technical architecture: don’t overbuild too early, write unit tests, think about the nature of the data and functionality you’re working with and scale on the axis that make sense for the evolution of your business.

In the domain of management and leadership: Be brave, be kind, sometimes the kindest thing is to be brave and tell people the hard truth, remember that people are fellow human beings and everyone is living their own story so try not to overlay your own story on top of them.

In both: Spend the time to get really clear with yourself about what you want, write it down, say it a few different ways. A narrative is needed both for leading people and for leading technology, and the clearer your narrative is the faster you can move and the better the outcome will be.

What is your vision for system administration in 5 years?

It will still exist, and more companies will realize the value of hiring people who actually have expertise in that area. But the people with expertise will also learn that they really do have to meet the rest of the team at least halfway, or the developers will just do a bad job without their input.



Races in the TokuFT Race Detector

TokuFT (now called PerconaFT) is the write optimized storage component used by TokuDB for MySQL and TokuMX for MongoDB.  Since TokuFT is its own component, TokuFT can be tested independently of TokuDB and TokuMX.  Some of the TokuFT tests use valgrind's memcheck, helgrind, and DRD tools to identify bugs.

Helgrind and DRD find data races in multi-threaded programs at runtime rather than at compile time.  In my experience with these tools, data races are not predictable and it sometimes takes many test runs of the same program to find a problem.  So, I ran the group commit count test in a loop with helgrind until it failed.

One of the data races that helgrind reports is within TokuFT's lock manager. The purpose of the lock manager is to allow concurrent transactions to execute correctly.   Since a data race in the lock manager could break concurrent transactions,  the problem should be investigated and resolved.

The lock manager data race reported by helgrind implies that two concurrent threads are updating the transaction identifier for the single transaction optimization (the 'sto_txnid').  A review of the code shows that this is impossible since the writes of the transaction identifier ('sto_txnid') in the 'sto_begin' and 'sto_end' functions occur when holding a mutex on the lock tree.  The helgrind report shows that the same mutex is being held when the data race is reported.  So, what is going on?

It turns out that there is a code path in the lock manager that does an unlocked racy read of the 'sto_txnid' that occurs when acquired locks are released.  Helgrind would normally report a read race on an unlocked racy read.  However, the TokuFT developers decided that a racy read is appropriate to presumably improve performance, so they attempted to inform helgrind and DRD to ignore the read race in this function. As we will see, the method used to ignore the read race causes helgrind to report an erroneous write data race, which caused me to waste a lot of time debugging this problem.

Hopefully, the following sequence of tests will demonstrate the problem without any knowledge of or code from TokuFT.

Spooky races

The spooky race program demonstrates how helgrind reports a false write data race on a variable that is clearly protected by a mutex.  The false write data race is reported because of a misuse of the helgrind disable and enable checking API.  TokuFT is trying to inform helgrind to ignore a variable when doing a racy read.  Helgrind's enable checking API causes helgrind to view the variable as completely new, which causes helgrind to think that there is a race.  In addition, helgrind does not implement the ignore reads annotation, so that method can not be used to temporarily disable data race checking.

Luckily, the ignore reads annotation works nicely with DRD, so the spooky race test does not fail with DRD. The sketch below shows both annotation patterns side by side.
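To make the two patterns concrete, here is a minimal C sketch using valgrind's documented client-request macros; the variable and function names are illustrative stand-ins, not actual TokuFT code:

#include <pthread.h>
#ifdef USE_DRD
#include <valgrind/drd.h>        /* ANNOTATE_IGNORE_READS_BEGIN/END */
#else
#include <valgrind/helgrind.h>   /* VALGRIND_HG_DISABLE/ENABLE_CHECKING */
#endif

static pthread_mutex_t lt_mutex = PTHREAD_MUTEX_INITIALIZER;
static long sto_txnid = 0;       /* stand-in for the lock tree's field */

/* Writes are mutex-protected, as in the real sto_begin/sto_end. */
static void sto_set(long txnid) {
    pthread_mutex_lock(&lt_mutex);
    sto_txnid = txnid;
    pthread_mutex_unlock(&lt_mutex);
}

/* Deliberate unlocked racy read, annotated for the race detector. */
static long sto_peek(void) {
    long v;
#ifdef USE_DRD
    /* DRD: reads in this window are ignored; no false positives. */
    ANNOTATE_IGNORE_READS_BEGIN();
    v = sto_txnid;
    ANNOTATE_IGNORE_READS_END();
#else
    /* Helgrind: after ENABLE, the variable looks brand new to helgrind,
       so the next mutex-protected write in sto_set() is reported as a
       (false) write race: the "spooky" report described above. */
    VALGRIND_HG_DISABLE_CHECKING(&sto_txnid, sizeof sto_txnid);
    v = sto_txnid;
    VALGRIND_HG_ENABLE_CHECKING(&sto_txnid, sizeof sto_txnid);
#endif
    return v;
}

Compile with -DUSE_DRD and run under valgrind --tool=drd and the racy read is silently ignored; build the helgrind variant and you can reproduce the false write race that the spooky race program demonstrates.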

The false write data race program demonstrates how a misuse of helgrind's disable and enable API can cause helgrind to generate a false write data race.  Helgrind documented the APIs; the TokuFT developers misused them.

The false read data race program demonstrates an additional helgrind problem.  Helgrind's disable and enable checking APIs are not multi-thread safe, so one can generate a read data race when two reader threads are interleaved.  Again, helgrind documented the APIs; the TokuFT developers misused them.

Conclusions

TokuFT has several instances of racy reads. These racy reads are annotated by the TokuFT developers to be ignored by the race detector. Unfortunately, the helgrind disable/enable checking API cannot be used, because it causes new false data race reports. In addition, helgrind does not implement the ignore reads API. The only way (as far as I know) to use helgrind with TokuFT is to start collecting a huge number of helgrind suppressions. This is problematic.

Luckily, DRD works fine with TokuFT since DRD implements the ignore reads API.  DRD is the preferred way to test TokuFT software.

What about using the thread sanitizer, which is now included in clang and gcc? We need to either remove all of the racy reads, or generate TSAN suppressions for them. Experiments are in progress.

Tools

gcc 4.8
valgrind 3.10.1

MySQL Connector/NET 6.7.8, 6.8.6, and 6.9.7 have been released


Dear MySQL users,

MySQL Connector/Net 6.7.8, 6.8.6, and 6.9.7 are maintenance releases for their respective series of the .NET driver for MySQL. They can be used for production environments.

They are appropriate for use with MySQL server versions 5.5-5.7.

They are now available in source and binary form from http://dev.mysql.com/downloads/connector/net/
(note that not all mirror sites may be up to date at this point; if you can’t find this version on some mirror, please try again later or choose another download site.)

Changes in MySQL Connector/Net 6.7.8

  • Connections to MySQL server 5.7 now default to using SSL.

Changes in MySQL Connector/Net 6.8.6

  • Connections to MySQL server 5.7 now default to using SSL.

Changes in MySQL Connector/Net 6.9.7

  • The selection of a master or slave now takes into account
    both the status and mode, when before it only used the
    mode. Ignoring the status was problematic as, for
    example, an unreachable server’s status is marked as
    FAULTY while the mode does not change. (Bug #21203824)
  • Using MySqlConnection.Open() with Connector/Net 6.9.6
    would fail and raise the error “Unable to connect to
    Fabric server”. (Bug #20983968)
  • Connections to MySQL server 5.7 now default to using SSL.

The documentation is available at:
http://dev.mysql.com/doc/connector-net/en/

Nuget packages are available at:
https://www.nuget.org/packages/MySql.Data/
https://www.nuget.org/packages/MySql.Data.Entity/
https://www.nuget.org/packages/MySql.Fabric/
https://www.nuget.org/packages/MySql.Web/

Enjoy and thanks for the support!

On behalf of the MySQL Connector/NET Team.



MySQL Quality Assurance: A Vision for the Future by Roel Van de Paar (Final Episode 13)


Welcome to the final – but most important – episode in the MySQL QA Series.

In it, I present my vision for all MySQL Quality Assurance – for all distributions – worldwide.

Episode 13: A Better Approach to all MySQL Regression, Stress & Feature Testing: Random Coverage Testing & SQL Interleaving

1. pquery Review
2. Random Coverage Testing
3. SQL Interleaving
4. The past & the future

Presented by Roel Van de Paar. Full-screen viewing @ 720p resolution recommended




Percona Live Amsterdam: Community Dinner, Sep. 22nd


Keeping up with tradition, there will be a community event held at the upcoming Percona Live Europe: Amsterdam 2015 conference.

This year, Booking.com will be hosting the event at the company's headquarters in the heart of Amsterdam.

We will hold a community dinner (dish selection includes vegetarian options; beverages will be served) in our cafeteria and hope to add some spicy activities to the event!

Space is limited, and tickets can be purchased via Eventbrite.

Special thanks to Daniël van Eeden and Jean-François Gagné for their work in making this happen!

Tuesday, September 22, 2015 from 6:30 PM to 10:00 PM (CEST)
Herengracht 597, 1017 CE
Amsterdam

Location: https://goo.gl/maps/06oOA

Walking route from conference venue: https://goo.gl/maps/Ocptu



The language of compression

Leif Walsh will talk about the language of compression at Percona Live Amsterdam

Storage. Everyone needs it. Whether your data is in MySQL, a NoSQL, or somewhere in the cloud, with ever-growing data volumes – along with the need for SSDs to cut latency and replication to provide insurance – an organization’s storage footprint is an important place to look for savings. That’s where compression comes in (squeeze!) to save disk space.

Two Sigma software engineer Leif Walsh speaks the language of compression. Fluently. In fact, he’ll be speaking on that exact subject September 22 during the Percona Live conference in Amsterdam.

I asked him about his talk, and about Amsterdam, the other day. Here’s what he had to say.

* * *

Tom: Hi Leif, how will your talk help IT decision-makers cut through the marketing mumbo-jumbo and figure out what’s important to focus on and what is not?
Leif: My talk will have three lessons aimed at those making storage decisions for their company:

  1. What are the key factors to consider when evaluating storage options, and how can they affect your bottom line?  This is not only how storage tech influences your hardware, operations, and management costs, but also how it can facilitate new development initiatives and cut time-to-market for your products.
  2. How should you read benchmarks and marketing materials about storage technology?  You’ll learn what to look for in promotional material, and how to think critically about whether that material is applicable to your business needs.
  3. What’s the most effective way to communicate with storage vendors about your application’s requirements?  A lot of time can be spent in the early stages of a relationship in finding a common language for users and vendors to have meaningful discussions about users’ needs and vendors’ capacity to meet those needs.  With the tools you’ll learn in my talk, you’ll be able to accelerate quickly to the high-bandwidth conversations you need to have in order to make the right decision, and consequently, you’ll be empowered to evaluate more choices to find the best one faster.

Tom: In addition to IT decision-makers, who else should attend your session and what will they take away afterward?
Leif: My talk is primarily about the language that everyone in the storage community should be using to communicate. Therefore, storage vendors should attend to get ideas for how  to express their benchmarks and their system’s properties more effectively, and application developers and operations people will learn strategies for getting better support and for making a convincing case to the decision makers in their own company.

Tom: Which session(s) are you most looking forward to besides your own?
Leif: Sam Kottler is a good friend and an intensely experienced systems engineer with a dynamic and boisterous personality, so I can’t wait to hear more about his experiences with Linux tuning.

As one of the original developers of TokuMX, I’ll absolutely have to check out Stephane’s talk about it, but I promise not to heckle. Charity Majors is always hilarious and has great experiences and insights to share, so I’ll definitely check out her talk too.




MySQL replication in action - Part 2 - Fan-in topology


Introduction: where we stand

In the latest releases of MySQL and MariaDB we have seen several replication improvements. One of the most exciting additions is the ability to enhance basic replication with multiple sources. Those who have used replication for a while should remember that one of the tenets of the “old” replication was that a slave couldn’t have more than one master. This was The Law and there was no escape ... until now. The only way to work around that prohibition was to use circular replication, also known as ring replication, where each node is slave of the previous node and master of the next one.

Figure 1: Circular replication

This topology can work, but it is quite fragile: if one node breaks, the replication flow is broken with it. Every change keeps flowing along the chain until it reaches the broken node, but it never reaches the nodes beyond it.

Figure 2: Circular replication with a broken node.

Of course you can fix a broken circular replication deployment, but it is not easy, and has several tricky points that make this task one of the least liked by DBAs.

Despite this limitation, circular replication has been used in production sometimes, mostly because there was no alternative (well, there is Tungsten Replicator, now part of VMWare Continuent, but not everybody was ready to embrace an external replicator), and because users were trying to solve the HQ-branch problem, also known as the fan-in topology.

What’s a fan-in topology

In a regular master-slave topology, we have one master and one or more slaves. This setup is useful for many scenarios: reducing load of a database backing a web server (load balancing), providing the basis for a rapid replacement of a failed master (disaster recovery), spreading the data to more users than one server could bear, and more.

Figure 3: master-slave topology.

One thing that regular replication cannot do is getting data from many input points. The best example is the headquarters of a company, where users need to have data in real time from various branches.

Figure 4: fan-in topology.

There is a visual resemblance between regular replication and fan-in replication: one is the mirror image of the other. While in regular replication the data is produced in one node (the master) and conveyed to many consumers (the slaves), in a fan-in topology we have many producers (masters) and one consumer (slave). Things are not always this simple. We can have various degrees of fan-in, where many masters replicate to one or more slaves. We could actually have more slaves than masters, if we want. But the main criterion that defines this topology is having more than one master for each slave.

Figure 5: enhanced fan-in topology.

We will revisit the enhanced fan-in topology soon, as this is the basis for a more complex deployment.

In practice: how to set up a fan-in topology in MySQL 5.7

In a nutshell, MySQL 5.7 defines syntax enhancements for the existing replication commands, which now allow a channel clause. For example, the command CHANGE MASTER TO now can have one additional clause that identifies the source.

CHANGE MASTER TO MASTER_HOST='logistics.local' ... FOR CHANNEL 'logistics';  
CHANGE MASTER TO MASTER_HOST='employees.local' ... FOR CHANNEL 'employees';

And that’s basically it. You define the same options that you would use to set up replication to a single master, but you add a channel definition for each master, and run the command once for each source.

Similarly, you can start a single channel …

START SLAVE FOR CHANNEL 'logistics';

or all of them at once:

START SLAVE;

However, you can’t start multi-source replication out of the box. You need to enable table-based replication metadata repositories, which also make the replication positions crash-safe. You can do that dynamically, as in the instructions below, but it’s better to add these options to the configuration file.

SET GLOBAL master_info_repository = 'TABLE';  
SET GLOBAL relay_log_info_repository = 'TABLE';

There is no requirement for GTIDs to use multi-source replication, but the manual suggests that you are better off if you do. Thus, your configuration file, on all nodes involved in this topology, should have at least the following:

[mysqld]  
...
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency
server-id=XXX
log-bin=mysql-bin
relay-log=relay-log
relay-log-index=relay-log

Let’s suppose we have four servers: host1, host2, host3, and host4. We want to set host1, host2, and host3 as masters, and host4 as the fan-in slave. Once we have set up the recommended options, we can connect to host4 and issue these commands:

CHANGE MASTER TO master_host='host1', master_port=3306, master_user='slave_user',  
master_password='slavepass', master_auto_position=1
FOR CHANNEL 'NewYork';
CHANGE MASTER TO master_host='host2', master_port=3306, master_user='slave_user',
master_password='slavepass', master_auto_position=1
FOR CHANNEL 'London';
CHANGE MASTER TO master_host='host3', master_port=3306, master_user='slave_user',
master_password='slavepass', master_auto_position=1
FOR CHANNEL 'Paris';
START SLAVE for channel 'NewYork';
START SLAVE for channel 'London';
START SLAVE for channel 'Paris';

If you want to try multi-source topologies without having multiple servers at your disposal, you can use MySQL::Sandbox with several sample scripts on GitHub, as we will see a few paragraphs below.

In practice: a fan-in topology in MariaDB 10

The syntax enhancement is similar to MySQL 5.7, but, sadly, it requires a different wording. If you need to run multi-source topologies with both MySQL 5.7 and MariaDB 10, you will have to prepare two sets of commands.
In CHANGE MASTER TO, there is no new keyword, but a master name can be added after CHANGE MASTER:

CHANGE MASTER 'logistics' TO MASTER_HOST='logistics.local' ... ;  
CHANGE MASTER 'employees' TO MASTER_HOST='employees.local' ... ;

and similarly for the other replication commands:

START SLAVE 'logistics';  
STOP SLAVE 'logistics';

Compared to the MySQL 5.7 implementation, this one sounds wrong, or at least funny, as we first say “change master logistics” and then “start slave logistics.” Nitpicks. The good thing is that it works just as well.

You need, however, to be careful with MariaDB. Unlike MySQL 5.7, where the domain is defined implicitly by the server identifier, in MariaDB you must explicitly define a domain ID for each data source. In our case, we would run this command on each master:

set global gtid_domain_id=xxx;

Where 'xxx' is a unique integer. As said before about the crash-safe table settings, this option should also go inside the configuration file of each master, for example:
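A minimal my.cnf fragment for one of the masters (the number is illustrative; just make sure every master gets its own):

[mysqld]
gtid_domain_id=1010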

Intermezzo: spoiling multi-source

With all its simplicity, multi-source deployment can get out of hand surprisingly quickly. You need to be aware of one important detail: in both MariaDB 10.x and MySQL 5.7 there is a hidden channel named "" (= empty string).

This means that you can mix up, inadvertently or willingly, "old" replication and multi-source replication in the same slave. For example, in a regular master/slave topology in MySQL 5.7 replication, you can go to a slave and run these commands:


CHANGE MASTER TO master_host='host1', master_port=3306, master_user='slave_user',
master_password='slavepass', master_auto_position=1
FOR CHANNEL 'NewYork';
START SLAVE for channel 'NewYork';

You can't do the opposite: i.e. in a slave that was defined with the "for channel" clause, you can't run an old-fashioned CHANGE MASTER TO:

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=5708, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_AUTO_POSITION=1;
ERROR 3079 (HY000): Multiple channels exist on the slave. Please provide channel name as an argument.

However, you can do what the system wants to prevent by using the empty string channel explicitly:

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=5708, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_AUTO_POSITION=1 for channel '';
Query OK, 0 rows affected, 2 warnings (0.08 sec)

In MariaDB, you can do both things, i.e. add an empty-name channel or run a straight old-fashioned CHANGE MASTER TO command, and both will be accepted:

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=10020, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_USE_GTID=current_pos;
# or
CHANGE MASTER '' TO MASTER_HOST='127.0.0.1', MASTER_PORT=10020, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_USE_GTID=current_pos;

Even if you can do it without errors (MySQL 5.7 will eagerly throw a few warnings, but that's it), don't do it. As you will see in the next section, there are enough reasons for head scratching in multi-source monitoring without introducing difficulties of your own.

In the trenches with multi-source

To see what kind of monitoring we can get with multi source, we will use the sample scripts from GitHub in coordination with MySQL Sandbox.

To use these examples, you will need the following:

  • MySQL Sandbox 3.0.66 installed in your server;
  • A binary tarball of MariaDB 10.0.20, expanded into a directory named $HOME/opt/mysql/ma10.0.20
  • A binary tarball of MySQL 5.7.8, expanded into a directory named $HOME/opt/mysql/5.7.8
  • The above mentioned MySQL replication scripts from GitHub.

Fan-in with MySQL 5.7

When all the components are in place, we can start the script multi_source.sh, which has a simple syntax:

$ ./multi_source.sh
VERSION, FLAVOR, and TOPOLOGY required
Where VERSION is an identifier like 5.7.7 or ma10.0.20
FLAVOR is either mysql or mariadb
TOPOLOGY is either FAN-IN or ALL-MASTERS

We choose to install the MySQL 5.7 FAN-IN topology first:


$ ./multi_source.sh 5.7.8 mysql FAN-IN
installing node 1
installing node 2
installing node 3
installing node 4
group directory installed in $HOME/sandboxes/multi_msb_5_7_8
# server: 1:
# server: 2:
# server: 3:
# server: 4:
# option 'master-info-repository=table' added to node1 configuration file
# option 'relay-log-info-repository=table' added to node1 configuration file
# option 'gtid_mode=ON' added to node1 configuration file
# option 'enforce-gtid-consistency' added to node1 configuration file
# option 'master-info-repository=table' added to node2 configuration file
# option 'relay-log-info-repository=table' added to node2 configuration file
# option 'gtid_mode=ON' added to node2 configuration file
# option 'enforce-gtid-consistency' added to node2 configuration file
# option 'master-info-repository=table' added to node3 configuration file
# option 'relay-log-info-repository=table' added to node3 configuration file
# option 'gtid_mode=ON' added to node3 configuration file
# option 'enforce-gtid-consistency' added to node3 configuration file
# option 'master-info-repository=table' added to node4 configuration file
# option 'relay-log-info-repository=table' added to node4 configuration file
# option 'gtid_mode=ON' added to node4 configuration file
# option 'enforce-gtid-consistency' added to node4 configuration file
# executing "stop" on $HOME/sandboxes/multi_msb_5_7_8
executing "stop" on node 1
executing "stop" on node 2
executing "stop" on node 3
executing "stop" on node 4
# executing "start" on $HOME/sandboxes/multi_msb_5_7_8
executing "start" on node 1
. sandbox server started
executing "start" on node 2
. sandbox server started
executing "start" on node 3
. sandbox server started
executing "start" on node 4
. sandbox server started
# Setting topology FAN-IN
--------------
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=8379, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_AUTO_POSITION=1 for channel 'node1'
START SLAVE for channel 'node1'
--------------
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=8380, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_AUTO_POSITION=1 for channel 'node2'
START SLAVE for channel 'node2'
--------------
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=8381, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_AUTO_POSITION=1 for channel 'node3'
START SLAVE for channel 'node3'
--------------

$HOME/git/mysql-replication-samples/test_multi_source_replication.sh -> $HOME/sandboxes/multi_msb_5_7_8/test_multi_source_replication.sh

The script launches MySQL Sandbox, which installs 4 nodes, and then restarts them with GTID and the other prerequisites for multi-source. Later, it configures node #4 to be the slave of the first 3 nodes. Finally, it copies a test script into the sandbox directory, so that we can see if the replication flow (from each master to the fan-in slave) works as expected.

Let's start with that:

$ cd $HOME/sandboxes/multi_msb_5_7_8
$ ./test_multi_source_replication.sh
# Tables in server 101
test_node1
# Tables in server 102
test_node2
# Tables in server 103
test_node3
# Tables in fan-in slave
test_node1
test_node2
test_node3
+-----------+--------+--------------+
| server_id | @@port | node |
+-----------+--------+--------------+
| 104 | 8382 | fan-in slave |
+-----------+--------+--------------+
+----+----------+--------+-------+---------------------+
| id | serverid | dbport | node | ts |
+----+----------+--------+-------+---------------------+
| 1 | 101 | 8379 | node1 | 2015-08-11 18:17:44 |
+----+----------+--------+-------+---------------------+
+----+----------+--------+-------+---------------------+
| id | serverid | dbport | node | ts |
+----+----------+--------+-------+---------------------+
| 1 | 102 | 8380 | node2 | 2015-08-11 18:17:44 |
+----+----------+--------+-------+---------------------+
+----+----------+--------+-------+---------------------+
| id | serverid | dbport | node | ts |
+----+----------+--------+-------+---------------------+
| 1 | 103 | 8381 | node3 | 2015-08-11 18:17:44 |
+----+----------+--------+-------+---------------------+

The test script creates a table in each of the masters, and inserts one row in each one. After a few seconds, it attempts to retrieve data from the three tables in the fan-in slave. The output shows that replication is working. (This is the "sentinel method" described in MySQL Replication Monitoring 101).

Now we have something to look at in our system. But to make things more interesting, let's introduce more data in one of the masters (we'll load the Sakila database), and create a few transactions in the slave itself. Then we'll start by looking at GTIDs:

$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id; select @@global.gtid_executed\G'; done
+-------------+
| @@server_id |
+-------------+
| 101 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_executed: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3

+-------------+
| @@server_id |
+-------------+
| 102 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_executed: 1f5bc560-4042-11e5-90a6-d011e342a05a:1-197

+-------------+
| @@server_id |
+-------------+
| 103 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_executed: 2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3

+-------------+
| @@server_id |
+-------------+
| 104 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_executed: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3,
1f5bc560-4042-11e5-90a6-d011e342a05a:1-197,
2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3,
27089360-4042-11e5-8b8b-51136eee5e0b:1-2

Looking at the result on node #4 (server_id 104), we see that we have four series of GTIDs: one from each of the masters, and one from the slave itself. Getting to know which is which can be tricky, as I warned in my previous article, where we examined GTIDs in general.
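One quick way to map the series to their servers (not part of the original test script, but using only the standard @@server_uuid variable) is to ask every node for its UUID; the prefix of each GTID series matches the @@server_uuid of the node that generated it:

$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id, @@server_uuid'; done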

Anyway, we can get more details by looking at the monitoring facilities:

node4 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 8379
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 891
Relay_Log_File: mysql-relay-node1.000002
Relay_Log_Pos: 1104
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 891
Relay_Log_Space: 1313
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 101
Master_UUID: 1ade9710-4042-11e5-9b1c-ee8cf1128871
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3
Executed_Gtid_Set: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3,
1f5bc560-4042-11e5-90a6-d011e342a05a:1-197,
2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3,
27089360-4042-11e5-8b8b-51136eee5e0b:1-2
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name: node1
*************************** 2. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 8380
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1369702
Relay_Log_File: mysql-relay-node2.000002
Relay_Log_Pos: 1369915
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1369702
Relay_Log_Space: 1370124
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 102
Master_UUID: 1f5bc560-4042-11e5-90a6-d011e342a05a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1f5bc560-4042-11e5-90a6-d011e342a05a:1-197
Executed_Gtid_Set: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3,
1f5bc560-4042-11e5-90a6-d011e342a05a:1-197,
2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3,
27089360-4042-11e5-8b8b-51136eee5e0b:1-2
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name: node2
*************************** 3. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 8381
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 891
Relay_Log_File: mysql-relay-node3.000002
Relay_Log_Pos: 1104
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 891
Relay_Log_Space: 1313
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 103
Master_UUID: 2329e9ce-4042-11e5-ae8d-96290ab7793a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3
Executed_Gtid_Set: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3,
1f5bc560-4042-11e5-90a6-d011e342a05a:1-197,
2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3,
27089360-4042-11e5-8b8b-51136eee5e0b:1-2
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name: node3
3 rows in set (0.00 sec)

The first change is that SHOW SLAVE STATUS displays three rows instead of one. We get one row for each master. (Note: we can ask for the status of a single channel, using SHOW SLAVE STATUS FOR CHANNEL 'channel_name'\G, as in the example below.)
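For instance, to look only at the channel coming from node2 (the output is identical to the corresponding row above, so it is omitted here):

node4 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS FOR CHANNEL 'node2'\G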

After that, things are not as clear-cut as what we have seen with a single master and many slaves. There we had one set of Retrieved_Gtid_Set and one set of Executed_Gtid_Set. Here, instead, we have the same group of executed sets in each row. I would have expected to see the corresponding received/executed set for each channel. Instead, they are mixed up.

Moving on, we'll now look at the relay-log info table in MySQL:

node4 [localhost] {msandbox} (mysql) > select * from slave_relay_log_info\G
*************************** 1. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql-relay-node1.000002
Relay_log_pos: 1104
Master_log_name: mysql-bin.000002
Master_log_pos: 891
Sql_delay: 0
Number_of_workers: 0
Id: 1
Channel_name: node1
*************************** 2. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql-relay-node2.000002
Relay_log_pos: 1362290
Master_log_name: mysql-bin.000002
Master_log_pos: 1362077
Sql_delay: 0
Number_of_workers: 0
Id: 1
Channel_name: node2
*************************** 3. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql-relay-node3.000002
Relay_log_pos: 1104
Master_log_name: mysql-bin.000002
Master_log_pos: 891
Sql_delay: 0
Number_of_workers: 0
Id: 1
Channel_name: node3
3 rows in set (0.00 sec)

Here we have something more telling: each row is identified by the node name, and the relay-log name, rather than being a simple mysql-relay.000002, includes the channel in its name: mysql-relay-nodeX.000002.

Unfortunately, this table does not have information about GTID. The opposite happens in the performance_schema:

node4 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: node1
GROUP_NAME:
SOURCE_UUID: 1ade9710-4042-11e5-9b1c-ee8cf1128871
THREAD_ID: 30
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 97
LAST_HEARTBEAT_TIMESTAMP: 2015-08-11 18:50:14
RECEIVED_TRANSACTION_SET: 1ade9710-4042-11e5-9b1c-ee8cf1128871:1-3
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
CHANNEL_NAME: node2
GROUP_NAME:
SOURCE_UUID: 1f5bc560-4042-11e5-90a6-d011e342a05a
THREAD_ID: 34
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 96
LAST_HEARTBEAT_TIMESTAMP: 2015-08-11 18:50:11
RECEIVED_TRANSACTION_SET: 1f5bc560-4042-11e5-90a6-d011e342a05a:1-197
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
CHANNEL_NAME: node3
GROUP_NAME:
SOURCE_UUID: 2329e9ce-4042-11e5-ae8d-96290ab7793a
THREAD_ID: 38
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 97
LAST_HEARTBEAT_TIMESTAMP: 2015-08-11 18:50:15
RECEIVED_TRANSACTION_SET: 2329e9ce-4042-11e5-ae8d-96290ab7793a:1-3
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
3 rows in set (0.00 sec)

Here, we get the received transaction set, but not the executed set. As we have noted for single master replication, the most complete source of information for monitoring is still SHOW SLAVE STATUS.

Fan-in with MariaDB 10

The installation is similar to what we have seen for MySQL 5.7.

$ ./multi_source.sh ma10.0.20 mariadb FAN-IN
installing node 1
installing node 2
installing node 3
installing node 4
group directory installed in $HOME/sandboxes/multi_msb_ma10_0_20
# server: 1:
# server: 2:
# server: 3:
# server: 4:
# server: 1:
# server: 2:
# server: 3:
# server: 4:
# Setting topology FAN-IN
--------------
CHANGE MASTER 'node1' TO MASTER_HOST='127.0.0.1', MASTER_PORT=19021, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_USE_GTID=current_pos
START SLAVE 'node1'
--------------
CHANGE MASTER 'node2' TO MASTER_HOST='127.0.0.1', MASTER_PORT=19022, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_USE_GTID=current_pos
START SLAVE 'node2'
--------------
CHANGE MASTER 'node3' TO MASTER_HOST='127.0.0.1', MASTER_PORT=19023, MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox', MASTER_USE_GTID=current_pos
START SLAVE 'node3'
--------------

The main differences in the installation are that we don't have to enable GTID or crash-safe tables, as they are on by default, and that each node has a different domain ID (which was set by the installation script). The domain ID is arbitrary: it can be any number, provided that each master has a different one. Of course, it's better to choose numbers that can be easily attributed to each server. In this case, it's the server ID multiplied by 10.

$ for N in 1 2 3 4; do ./n$N -e 'select @@server_id, @@gtid_domain_id' ; done
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
| 101 | 1010 |
+-------------+------------------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
| 102 | 1020 |
+-------------+------------------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
| 103 | 1030 |
+-------------+------------------+
+-------------+------------------+
| @@server_id | @@gtid_domain_id |
+-------------+------------------+
| 104 | 1040 |
+-------------+------------------+

Here as well we run the test script, followed by loading the Sakila database in one node, and inserting a few transactions in the slave node. Now we can look at the GTID situation:

$ for N in 1 2 3 4; do ./n$N -e 'select @@server_id; select @@global.gtid_current_pos\G' ; done
+-------------+
| @@server_id |
+-------------+
| 101 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_current_pos: 1010-101-3

+-------------+
| @@server_id |
+-------------+
| 102 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_current_pos: 1020-102-119

+-------------+
| @@server_id |
+-------------+
| 103 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_current_pos: 1030-103-3

+-------------+
| @@server_id |
+-------------+
| 104 |
+-------------+
*************************** 1. row ***************************
@@global.gtid_current_pos: 1010-101-3,1030-103-3,1020-102-119,1040-104-2

Then we look at SHOW SLAVE STATUS, and we find something odd:

node4 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G
Empty set (0.00 sec)

The reason for this oddity is that SHOW SLAVE STATUS only works with the default, nameless channel. For named ones, we need to use a new command:

node4 [localhost] {msandbox} ((none)) > show ALL SLAVES status\G
*************************** 1. row ***************************
Connection_name: node1
Slave_SQL_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 19021
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 882
Relay_Log_File: mysql-relay-node1.000002
Relay_Log_Pos: 1169
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 882
Relay_Log_Space: 1468
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 101
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1010-101-3
Retried_transactions: 0
Max_relay_log_size: 1073741824
Executed_log_entries: 14
Slave_received_heartbeats: 0
Slave_heartbeat_period: 1800.000
Gtid_Slave_Pos: 1010-101-3,1030-103-3,1020-102-119,1040-104-2
*************************** 2. row ***************************
Connection_name: node2
Slave_SQL_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 19022
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 3230973
Relay_Log_File: mysql-relay-node2.000002
Relay_Log_Pos: 3231260
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3230973
Relay_Log_Space: 3231559
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 102
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1020-102-119
Retried_transactions: 0
Max_relay_log_size: 1073741824
Executed_log_entries: 263
Slave_received_heartbeats: 0
Slave_heartbeat_period: 1800.000
Gtid_Slave_Pos: 1010-101-3,1030-103-3,1020-102-119,1040-104-2
*************************** 3. row ***************************
Connection_name: node3
Slave_SQL_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 19023
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 882
Relay_Log_File: mysql-relay-node3.000002
Relay_Log_Pos: 1169
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 882
Relay_Log_Space: 1468
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 103
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1030-103-3
Retried_transactions: 0
Max_relay_log_size: 1073741824
Executed_log_entries: 14
Slave_received_heartbeats: 0
Slave_heartbeat_period: 1800.000
Gtid_Slave_Pos: 1010-101-3,1030-103-3,1020-102-119,1040-104-2
3 rows in set (0.00 sec)

Here too, as in MySQL 5.7, we have multiple rows. As we have seen in the previous article, the GTIDs mentioned here are the ones that were received, while the executed ones are available in @@global.gtid_current_pos. Sadly, I note that SHOW SLAVE STATUS (and SHOW ALL SLAVES STATUS) lists the GTIDs for all channels in each row, making monitoring unnecessarily complex.

Similar to SHOW ALL SLAVES STATUS, MariaDB introduces START/STOP ALL SLAVES, while in MySQL 5.7 the old commands, issued without mentioning a channel, act on all slaves.
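
For instance, to act on every channel at once, or on one named channel only:

node4 [localhost] {msandbox} ((none)) > STOP ALL SLAVES;
node4 [localhost] {msandbox} ((none)) > START ALL SLAVES;
node4 [localhost] {msandbox} ((none)) > STOP SLAVE 'node2';
node4 [localhost] {msandbox} ((none)) > START SLAVE 'node2';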

The only table that stores GTIDs shows a more organised output:

node4 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;
+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
| 1010 | 2 | 101 | 2 |
| 1010 | 3 | 101 | 3 |
| 1020 | 124 | 102 | 118 |
| 1020 | 125 | 102 | 119 |
| 1030 | 8 | 103 | 2 |
| 1030 | 9 | 103 | 3 |
+-----------+--------+-----------+--------+

Here we see two rows for each channel. I am not sure how many are supposed to be recorded, but for the purpose of resuming replication after a crash this is enough: the row with the highest sub_id in each domain is the most recent one, as the query below shows.
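
If you want to extract the effective per-domain position yourself, a correlated subquery picking the highest sub_id per domain does the job. A minimal sketch:

node4 [localhost] {msandbox} (mysql) > SELECT domain_id, server_id, seq_no FROM gtid_slave_pos g WHERE sub_id = (SELECT MAX(sub_id) FROM gtid_slave_pos WHERE domain_id = g.domain_id);
+-----------+-----------+--------+
| domain_id | server_id | seq_no |
+-----------+-----------+--------+
|      1010 |       101 |      3 |
|      1020 |       102 |    119 |
|      1030 |       103 |      3 |
+-----------+-----------+--------+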

Multi-source replication: expectations and reality

Discovering the possibility of multi-source topologies unleashes, for some users, an immediate urge to use the technology to solve a pet problem. Soon, however, one of the following problems arises:

  • If you are trying to scale master writes using multiple masters, it's going to fail. Due to the nature of replication, each write in one master is replicated in every other master. Regardless of the technology used (circular replication, MySQL 5.7 or MariaDB 10 multi-source, Tungsten Replicator), all the writes in one master will go to the other masters. The only way of scaling master writes is sharding, and multi-source replication cannot help you here.
  • If you are using multiple masters to solve the fan-in problem, or a case where you want to insert data from different nodes without changing your application, you may run into conflicts. There are several kinds of them; I have written an article about them, and another explaining that with vanilla asynchronous replication you cannot resolve conflicts: you need to avoid them in your application (one common avoidance technique is sketched after this list). Many years ago (~2006) the MySQL team was looking into possible solutions for conflict resolution, but that project was shelved. I hope it will be resumed and implemented.
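
As an illustration of avoidance rather than resolution: the classic technique for preventing auto-increment primary key collisions, when several masters accept inserts, is to interleave the generated values. A minimal sketch for a four-master setup (run the equivalent on each master, with a different offset):

-- on master N (N = 1..4): keys generated here will be N, N+4, N+8, ...
SET GLOBAL auto_increment_increment = 4;  -- total number of masters
SET GLOBAL auto_increment_offset    = 1;  -- this master's number (N)

Note that this only avoids duplicate-key errors on inserts; conflicting updates to the same row from different masters still have to be prevented by the application.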

Summing up

Multi-source replication with MySQL is now a reality. It can be deployed and used relatively easily.

However, the monitoring tools should be improved to meet the demands introduced by the new features. Using multi-source replication as it is, with either MySQL 5.7 or MariaDB 10, is a risky proposition, as the DBA has limited visibility into the workings of the system.

Looking at the future, I see room for improvement, and I hope that the development teams will look at feedback such as this one to create better tools for DBAs.

Coming next: more topologies

When you hear “multi-source”, you mostly get the idea of what we have described in this article: one slave, many masters. But multi-source is just the main building block that allows us to assemble more complex topologies.
In the next parts of this article we will see topologies where many nodes are masters with the same powers.

  • The point-to-point all-masters replication, which is the most efficient and resilient topology (without SPOF) with multiple masters.
  • The star topology, a lightweight all-masters scenario with one SPOF.
  • Hybrid scenarios, where we mix this and that.


Optimizing Conservative In-order Parallel Replication with MariaDB 10.0

Fri, 2015-08-14 09:34
geoff_montee_g

Conservative in-order parallel replication is a great feature in MariaDB 10.0 that improves replication performance by using knowledge of group commit on the master to commit transactions in parallel on a slave. If slave_parallel_threads is greater than 0, then the SQL thread will instruct multiple worker threads to concurrently apply transactions that were committed in the same group commit on the master.

Conservative in-order parallel replication is a good alternative to out-of-order parallel replication for use cases where explicitly setting domain IDs for independent transactions is impractical or impossible.
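
For contrast, out-of-order parallel replication requires the application to tag its independent write streams itself, roughly like this (a sketch; db2.events is a made-up table):

-- on the master, in a session belonging to an independent application:
SET SESSION gtid_domain_id = 2;   -- this stream gets its own replication domain
INSERT INTO db2.events (msg) VALUES ('applied in parallel with domain 0');
SET SESSION gtid_domain_id = 0;   -- back to the default domain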

Although setting slave_parallel_threads is enough to enable conservative in-order parallel replication, you may have to tweak binlog_commit_wait_usec and binlog_commit_wait_count in order to increase your group commit ratio on the master, which is necessary to enable parallel applying on the slave. In this blog post, I'll show an example where this is the case.

Note: MariaDB 10.1 also adds the slave_parallel_mode configuration variable to enable other modes for in-order parallel replication.
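
On a 10.1 slave, switching modes would look roughly like this (a sketch; the slave must be stopped to change the variable, and 'optimistic' is one of the documented mode names):

STOP SLAVE;
SET GLOBAL slave_parallel_mode = 'optimistic';
START SLAVE;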

Configure the master and slave

For our master, let's configure the following settings:

[mysqld]
log_bin
binlog_format=ROW
server_id=1

For our slave, let's configure the following:

[mysqld]
server_id=2
slave_parallel_threads=2

Set up replication on master

Now, let's set up the master for replication:

MariaDB [(none)]> CREATE USER 'repl'@'172.31.31.73' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'172.31.31.73';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.22 sec)

MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: master-bin.000001
        Position: 322
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT BINLOG_GTID_POS('master-bin.000001', 322);
+-------------------------------------------+
| BINLOG_GTID_POS('master-bin.000001', 322) |
+-------------------------------------------+
|                                           |
+-------------------------------------------+
1 row in set (0.00 sec)

If you've set up GTID replication with MariaDB 10.0 before, you've probably used BINLOG_GTID_POS to convert a binary log position to its corresponding GTID position. On newly installed systems like my example above, this GTID position might be blank.

Now, let's set up replication on the slave:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='';
Query OK, 0 rows affected (0.09 sec)

MariaDB [(none)]> CHANGE MASTER TO master_host='172.31.31.72', master_user='repl', master_password='password', master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.04 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.31.31.72
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 322
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 601
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 322
              Relay_Log_Space: 898
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)

Create a test table on master

Let's set up a test table on the master using mysqlslap. The table will automatically be replicated to the slave:

mysqlslap -u root --create-schema=db1 --no-drop \
    --create="CREATE TABLE test_table (id INT AUTO_INCREMENT PRIMARY KEY,file BLOB);" 

Generate some data on master

Now, in a Linux shell on the master, let's create a random 1 KB file:

[gmontee@master ~]$ dd if=/dev/urandom of=/tmp/file.out bs=1KB count=1
1+0 records in
1+0 records out
1000 bytes (1.0 kB) copied, 0.000218694 s, 4.6 MB/s
[gmontee@master ~]$ chmod 0644 /tmp/file.out

Get group commit status on master (before first test)

Before we insert our data on the master, let's get the starting values of Binlog_commits and Binlog_group_commits.

MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits');
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Binlog_commits       | 20    |
| Binlog_group_commits | 20    |
+----------------------+-------+
2 rows in set (0.00 sec)

Insert some data on master

Now, let's use mysqlslap to insert our random file into the table a bunch of times:

mysqlslap -u root --create-schema=db1 --concurrency=5 --iterations=20 --no-drop \
    --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"

Get group commit status on master (after first test)

After inserting our data, let's get the values of Binlog_commits and Binlog_group_commits again.

MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits');
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Binlog_commits       | 120   |
| Binlog_group_commits | 120   |
+----------------------+-------+
2 rows in set (0.00 sec)

To get the group commit ratio for our batch job, we would subtract the pre-job Binlog_commits and Binlog_group_commits values from the post-job values:

transactions/group commit = (Binlog_commits (after) - Binlog_commits (before))/(Binlog_group_commits (after) - Binlog_group_commits (before))

So here, we have:

transactions/group commit = (120 - 20) / (120 - 20) = 100 / 100 = 1 transaction per group commit

At 1 transaction per group commit, there isn't any potential for the slave to apply transactions in parallel: no two transactions ever shared a group commit on the master.

Insert some more data on master

Now let's change binlog_commit_wait_count and binlog_commit_wait_usec:

MariaDB [(none)]> SET GLOBAL binlog_commit_wait_count=2;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SET GLOBAL binlog_commit_wait_usec=10000;
Query OK, 0 rows affected (0.00 sec)
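
Keep in mind that SET GLOBAL changes are lost on restart; to make these settings permanent, you would also add them to the master's option file:

[mysqld]
binlog_commit_wait_count=2
binlog_commit_wait_usec=10000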

And then let's insert some more data:

mysqlslap -u root --create-schema=db1 --concurrency=5 --iterations=20 --no-drop \
    --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"

Get group commit status on master (after second test)

After changing the values for binlog_commit_wait_count and binlog_commit_wait_usec and inserting more data, let's get the values of Binlog_commits and Binlog_group_commits again.

MariaDB [(none)]> SHOW GLOBAL STATUS WHERE Variable_name IN('Binlog_commits', 'Binlog_group_commits');
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Binlog_commits       | 220   |
| Binlog_group_commits | 145   |
+----------------------+-------+
2 rows in set (0.00 sec)

So here, we have:

transactions/group commit = (220 - 120) / (145 - 120) = 100 / 25 = 4 transactions/group commit

At 4 transactions/group commit, there is much more potential for the slave to apply transactions in parallel.

Check slave concurrency

Now that the values of binlog_commit_wait_count and binlog_commit_wait_usec have been tweaked on the master to allow for parallel applying on the slave, let's execute a bigger job on the master and then see if the slave actually does apply its transactions in parallel.

First, let's run this on the master:

mysqlslap -u root --create-schema=db1 --concurrency=100 --iterations=100 --no-drop \
    --query="INSERT INTO db1.test_table (file) VALUES (LOAD_FILE('/tmp/file.out'));"

And then, let's execute SHOW FULL PROCESSLIST on the slave:

MariaDB [(none)]> SHOW FULL PROCESSLIST;
+----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                         | Info                  | Progress |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+
|  2 | system user |           | NULL | Connect | 2266 | Waiting for master to send event              | NULL                  |    0.000 |
|  3 | system user |           | NULL | Connect |    7 | closing tables                                | NULL                  |    0.000 |
|  4 | system user |           | NULL | Connect |    7 | closing tables                                | NULL                  |    0.000 |
|  5 | system user |           | NULL | Connect | 2266 | Waiting for room in worker thread event queue | NULL                  |    0.000 |
|  8 | root        | localhost | NULL | Query   |    0 | init                                          | SHOW FULL PROCESSLIST |    0.000 |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+-----------------------+----------+
5 rows in set (0.00 sec)

Here, both of our worker threads have the state "closing tables", so both of them are applying transactions in parallel.

Conclusion

If you want to use conservative in-order parallel replication to improve slave performance, but you find that your slave isn't applying transactions in parallel, you may want to adjust binlog_commit_wait_count and binlog_commit_wait_usec.

About the Author

Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.



Parsing in MySQL Workbench: the ANTLR age


Some years ago I posted an article about the code size in the MySQL Workbench project and talked a bit about the different subprojects and modules. At that time the project consisted of ~400K LOC (including third-party code), and already then the parser was the second biggest part, with nearly a fourth of the size of the entire project. This parser project back then used the yacc grammar from the MySQL server codebase and was our base for all parsing tasks in the product. Well, things have changed a lot since those days, and this blog post discusses the current parsing infrastructure in MySQL Workbench.

We started looking into a more flexible way of creating our parser infrastructure. Especially the generation of lexer and parser from the grammar was a long-winded process that included a lot of manual tweaking. The most important advantage of using the MySQL server yacc grammar is that we always stay in sync easily. However, this is true only for the server version that grammar was written for, and MySQL Workbench needs more flexibility, supporting a whole range of server versions (from 5.1 up to the latest 5.7.8). Hence we decided to switch to a different tool: ANTLR. Not so surprisingly, however, the yacc based parser is still part of MySQL Workbench, because it's not possible to switch such an important part in one single step. Over time the ANTLR based parsers will ultimately become our central parsing engine, and one day we can retire the yacc parser entirely.

Files created by ANTLR are the biggest single source files I have ever seen. The MySQLLexer.c file is 40MB in size with almost 590K LOC. No wonder our project metrics have changed remarkably, though not only because of the ANTLR based parser. Here are the current numbers (collected by a script shipped with the source zip):

machine:workbench Mike$ sh tools/count_loc.sh

c (antlr parser): 1484033 loc         6 files
cpp: 418494 loc       704 files
cxx: 28484 loc         2 files
mm: 31926 loc        97 files
m: 9795 loc        37 files
py: 87652 loc       170 files
cs: 43149 loc       150 files
h: 143743 loc       928 files
Total: 2247276 (763243 without ANTLR parser)
Total Files: 2094 (1166 without headers)

The reason for the big size is the support of the full Unicode BMP for identifiers, which requires some really big state tables in the lexer.

ANTLR 3 – The Workhorse

The current version of ANTLR is 4, published almost 2 years ago. However, we are still on version 3, for a good reason. ANTLR can generate parsers for various target languages, like C#, Java and Python. However, still today, there is no C or C++ target for ANTLR 4, while ANTLR 3 supports both languages well. Hence we decided to stay with ANTLR 3, and with every addition we make (e.g. see the code completion engine) we become more tied to it and less likely to upgrade to version 4 any time soon. At least a C target should have been one of the first targets, really.

But why not stay with the server's parser, you might ask. It's thoroughly tested and obviously is as compatible as a parser can be for the MySQL language. Well, a flexible client tool has different needs compared to a server, and that's why. It starts with the ability to support multiple server versions (the server parser only ever supports its current version), continues with different requirements for handling erroneous SQL code, and really goes its own ways when it comes to tooling (like the mentioned code completion engine or the quick syntax checker). ANTLR 3 generates so-called top-down parsers (recursive descent), while YACC creates bottom-up parsers (https://en.wikipedia.org/wiki/Parsing), which use a different approach to parse text. Our ANTLR based parser usually gives better error messages, e.g. for a query like:

select 1, from sakila.actor;

the server shows the dreaded "Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'from sakila' at line 1", while the ANTLR parser says: Syntax error: unexpected comma (with the precise position). You can see this in action in the MySQL Workbench SQL editors (via error tooltips).

Another really useful ability of an ANTLR based parser is that you can use any rule in the grammar as a starting point, which is not easily possible with a YACC based parser. All grammar rules are generated as functions (remember: recursive descent parser), so you can always call any of them with your input; e.g. you can easily parse only expressions, instead of a full query. We use this ability to parse SQL code in our object editors (stored procedures, triggers etc.), which implicitly disallows any SQL code not allowed at that point. Data type parsing also uses the new parser, allowing for maximum conformity of user-specified data types and good feedback to the user in case of an error. Very important for developers is also the fact that you can easily debug the parser code, if needed. Try that with a YACC based parser, which only iterates over states.

Finally, the server grammar has quite a number of problems, like a large number of shift-reduce conflicts, a handwritten lexer that is hard to maintain, trouble with semantic action execution (multiple execution if not carefully placed) and others. However, the server team is constantly working on improving their parser. It's just not a good choice for MySQL Workbench.

We decided to use the C target from ANTLR because the C++ support was not only incomplete (and still is) but led to huge compilation times. Integrating a C module in a C++ environment is trivial and rewards us with a high-performance parser.

A Dynamic Parser

Above I mentioned that a GUI tool like MySQL Workbench needs to support multiple server versions. Additionally, it must be possible to switch behavior based on the SQL mode. All this is possible through so-called semantic predicates. These constructs allow us to switch rules or alternatives on and off based on some condition (e.g. the server version). This ensures a user will always get the right syntax check and proper code completion, regardless of which server they actually connect to. This goes so far that we can easily toggle language parts that were only valid for a certain version range (e.g. the NONBLOCKING keyword, which was valid only between 5.7.0 and 5.7.6).
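
As an illustration, a version-gated alternative in ANTLR 3 grammar syntax looks roughly like this (a simplified sketch, not the actual Workbench grammar; serverVersion is assumed to be a value made available to the generated parser):

// the gated predicate hides the first alternative unless the condition holds
lock_option:
      {serverVersion >= 50700 && serverVersion < 50706}?=> NONBLOCKING_SYMBOL
    | DEFAULT_SYMBOL
    | NONE_SYMBOL
    ;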

We not only use the generated files for standard parsing tasks, but also have a dedicated scanner (based on the lexer) that helps us determine a context for the context-sensitive help. This way we can even handle partially invalid SQL code easily.

In contrast to the server grammar and the parser generation from it, our ANTLR parser is generated and compiled automatically during a build, and only when there was a change in the grammar file. This allows for easy development of the parser grammar. A test tool exists that uses the same parser and allows us to take a deeper look at a query and the parsing process, by printing the AST (abstract syntax tree) in a window. Additionally, it can run a subset of our parser unit tests and even generate test result code we use to feed certain parser tests.

The Parsers in Code

Generating the parsers requires Java to be installed (because ANTLR is a Java tool), which is the reason why we include the generated files in the source package. This way you are not forced to install Java when you want to build MySQL Workbench; the generation step is simply skipped as long as the generated files are up to date with the grammar. However, as soon as you change the grammar you will need Java (and the ANTLR jar) to regenerate the parser files when you build MySQL Workbench yourself.

Starting with Workbench 6.3 we use 2 parser variants: one that generates an AST (abstract syntax tree) and one without. The latter is used for our quick syntax checker as it is twice as fast as the one generating an AST (generation cannot be switched dynamically). The AST however is needed to easily walk the parsed elements, e.g. to find the type of a statement, convert query details into our internal representation, manipulate (rewrite) queries and other things.

The entire parsing engine is organized in 3 layers:

  • The generated C parsers wrapped by small classes to provide a C++ interface (including a tree walker, the mentioned syntax checker and a scanner). You can find all related files in the “library/mysql.parser” subfolder. The generated and wrapper files are:
    •  MySQL.tokens (a list of token names and their values)
    • MySQLLexer.h/c (the generated lexer)
    • mysql-scanner.h/cpp (the C++ wrapper around the generated lexer)
    • MySQLParser.h/c (the generated (full) parser)
    • mysql-parser.h/cpp (the C++ wrapper around the generated (full) parser)
    • MySQLSimpleParser.h/c (the generated parser without AST creation) + its token file
    • mysql-syntax-check.h/cpp (the C++ wrapper for that, it shares the lexer with the main parser)
    • Some support files (mysql-parser-common.h/cpp, mysql-recognition-types.h)
    • The “library/mysql.parser/grammar” folder contains the 2 grammar files (full + simplified parser), build scripts for each platform and the test application (currently for OSX only).
  • A module providing our so-called parsing services, including parsing of individual create statements (e.g. for our table or view editors). The parsing services mostly deal with the conversion of SQL text into our grt tree structure, which is the base for all object editors etc. Currently this is separated into a dynamically loadable module, containing the actual implementation, and an abstract class for direct use of the module within Workbench. The related files are:
    • modules/db.mysql.parser/src/mysql_parser_module.h/cpp (the module with most of the actual code)
    • backend/wbpublic/grtsqlparser/mysql_parser_services.h/cpp (the abstract interface for the module + some support code)
  • The editor backend driving the UI, connecting the parsing services and implementing error checking + markup as well as code completion. This layer is spread over a couple of files, all dealing with a specific aspect of handling SQL code, which includes query determination and data type parsing as well as our object editors and the SQL editor backend. This backend is a good example of the integration of GUI, Workbench backend and parser services, including syntax checks and code completion (backend/wbpublic/sqlide/sql_editor_be.h/cpp).

Conformity

Over the years, as our grammar and parser matured, we reached not only full conformity with the server parser, but could even add language features that aren't released yet. Our grammar is as close to the server parser as one can be, and it is the most complete MySQL grammar you can get for free (as part of the MySQL Workbench package, which ships the grammar not only in the source zip, but also in the binary package, because it is needed for code completion). Once in a while (also before big releases) we scan all changes in the server's sql_yacc.y file and incorporate them, to stay up to date.

Big Thanks

Finally, I’d like to express my respect and thankfulness for the people who stand behind such an extremely useful and well-made tool as ANTLR. These are mainly Prof. Terence Parr (the ANTLR guy) and Sam Harwell, for their dedication to ANTLR over all the years, as well as Jim Idle, for solving the complex task of converting an OOP ANTLR target (Java) to a non-OOP language (C), which is the foundation we build everything on.



The 8 Best Ways To Lose Your DBA


As we all know, good DBAs are a dime a dozen. They’re easy to replace, and the cost of replacing them in terms of lost productivity, downtime, recruiting, training, etc. is minimal. You may even suspect that your DBA(s) aren’t very good, since there is occasional downtime and people complain about the systems running too slowly. Firing people is icky, so we’ve identified 8 great ways to encourage your DBA to leave.

8. Specialize Their Role

Nothing puts more pressure on a DBA to perform than being a specialist. A specialist is the only person who has access or knowledge to do something, which means everyone else is going to be coerced into learned helplessness and apathy. Oh, and the bystander effect will run rampant when something goes wrong. “I’m sure the DBA is working on that.”

Yep. You definitely want the DBA’s role to be specialized so they’re properly isolated and all the blame falls on them when anything goes bad. Certainly don’t want developers and other operations staff to be competent with the database!

7. Institute Change Control

Since you’ve created a specialized DBA role in which all database responsibility rests on the DBA(s), you might as well take the next step and institute strong change control. Since the developers have no responsibility for database performance problems they create, they’ll write code recklessly and figure it’s the DBA’s responsibility to fix it. To solve for that, all code changes must be reviewed by the DBA before shipping to production. No changes can happen during business hours. And there will be no changes during critical times like the Super Bowl ads or the holiday shopping season, period.

Lumbergh

We’re fully confident this’ll solve all the outages, but as a delicious side effect of this, we’ll also rub the DBA’s nose in a bunch of menial, thankless reviews of code and applications they don’t understand, which should incent them to leave right away.

6. Mismatched Control And Responsibility

Nothing punishes a DBA better than being responsible for systems they can’t control. Naturally, item #7 is designed to create the illusion of control, so when they protest, we can point to that and say “what do you mean you have no control over what queries are running in production?” The DBA is not only wholly responsible for database performance, but also for delays in front-end development and feature roll-out.

5. Make Them A SPOF

If you only have one DBA by instituting #8, 7 and 6 above you’ve done a great job of creating a single point of failure. Even with multiple DBAs you’ve created a team of SPOFs. You can add insult to this injury through promotions. The smartest management move I ever saw was when an overworked DBA (let’s call him Atlas, because he held the world on his shoulders) was promoted. I mean, the man just wouldn’t quit. He was in the office at 2am every week doing the things that management insisted couldn’t be done during work hours, he never got to leave or turn off his cellphone from October through January, and this had gone on for years. Clearly a promotion to DBA Manager was the only way to make him quit. Did it work? Sure did, it only took a week.

4. Give Them Great Tools To Do Their Job

As the VP of Technology, it’s clearly your job to tell the DBA what tools they need to do their job. Make sure you do that. Remember, any production MySQL issue can be properly diagnosed by staring at thousands or tens of thousands of time series charts of SHOW STATUS counters in five-minute resolution, so Cacti or Graphite ought to do the job just fine. If they insist on more than that, you can pretend you’re bending over backwards by giving them Nagios or statsd. These create an illusion of database performance monitoring by creating mountains of false alarms tied to ratios that don’t really mean much.

3. Make Sure Developers Can’t Self-Service

Whatever you do, don’t let the developers get their work done by themselves. The DBA can’t truly be a SPOF if the developers can get stuff done without them. You need developers to go to the DBA with every little database-related request. This will impress upon the DBA their essential role in the organization and how they’re failing to live up to it and need to leave. Coincidentally, using Cacti or Graphite for monitoring will help ensure all DB-related questions can only be answered by the DBA.

2. Insist On Root Cause Analysis

There is always a single root cause. Five whys. It’s a human error problem. Who is the human error? The DBA is. The DBA’s very existence is an error. If there are outages, downtime, sluggish performance, delays in code release the root cause has to be database performance and that is the DBA’s responsibility 100%. Creating a revolving door DBA position will guarantee that the people responsible for the database don’t know much about the system because they just got here. Not that that’s an acceptable excuse.

1. Work-Life Balance Is Overrated

You get the most out of your people by driving them hard. No one ever got good results on the battlefield by handing out Kleenex. No matter how many developers you have, 1 DBA is plenty; in fact try to make it a side responsibility for one of your systems admin folks. If they whine about their burdens, tell them to just work harder. Your DBA should be online or in the office after hours, and if they’re not they’re slackers and should be replaced anyway. Stress, guilt, all encompassing responsibility, shame, and failure are powerful motivators, too.

Conclusions

Remember: as an IT Manager/Director/VP you need to have a scapegoat, and your DBA should be that scapegoat. By placing the DBA in an impossible situation, giving them full responsibility for keeping the systems up and running, and keeping them from having collaborative tools that allow developers to self-service and take responsibility for being the first line of defense against bad queries, you’ll always be able to tell your boss that the reason for the problem is poor database administration.

The alternative to using the DBA as your scapegoat is to have that responsibility fall on you! You might have to take responsibility for building or licensing collaboration tools that allow the whole team to function more efficiently. You might have to build a culture of shared responsibility and teamwork. And, while doing so might improve speed, innovation and help attract and keep top drawer developers, it requires change and change is hard.

Much easier to just churn through DBAs.

