
MySQL HA Architecture #1 : InnoDB Cluster & Consul


I have received many requests about MySQL High Availability architectures. There are a lot of tools, but how can we use them to achieve MySQL HA without over-engineering everything?

There is no generic architecture that answers such a demand. Of course there are leading solutions, but putting them together can result in many different designs, so I will start a series of articles on that topic. Feel free, as usual, to comment, and also to recommend other tools that could be used in a future post.

So today's post is related to MySQL InnoDB Cluster with HashiCorp's Consul.

Architecture

This is the overview of the deployed architecture:

As you can see, we have 3 data centers, although in fact only two DCs are on premises and the third one is in the cloud. A large number of requests are related to having only 2 DCs, or at least capacity for only two DCs (storage and CPU). But we already know that to achieve High Availability, we need at least 3 of everything (3 nodes, 3 DCs, …). So in our design, DC1 and DC2 each run an independent MySQL InnoDB Cluster in Single-Primary Mode. We also have asynchronous replication going from one member of dc1 to the primary master of dc2, and the reverse as well.

In this post, I won't cover the asynchronous replication failover that moves the asynchronous slave task to another member of the same group when the Primary role moves to another member.

Then we have one consul member in DC3, which is there only to provide quorum and to answer DNS requests.

mysqlha service

In consul, we create the mysqlha service like this:

{
    "services": [
        {
            "name": "mysqlha",
            "tags": [
                "writes",
                "reads"
            ],
            "port": 3306,
            "checks": [

                {
                    "script": "/etc/consul.d/scripts/check_mysql.sh",
                    "interval": "5s"
                }
            ]
        }
    ]
}

The check_mysql.sh script is just a simple bash script that connects to the read/write port of the local MySQL Router
running on each member of the InnoDB Cluster (Group) and checks whether there is a Primary, non-R/O member.
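
For illustration, here is a minimal sketch of what such a check could look like. The user, password and port are assumptions (6446 is MySQL Router's default read/write port), and consul treats exit code 0 as passing and values above 1 as critical; the real check_mysql.sh may differ:

#!/bin/bash
# Hypothetical sketch of check_mysql.sh: report "passing" only when the local
# router's R/W port reaches a writable (non read-only) Primary member.
ROUTER_RW_PORT=6446        # MySQL Router default R/W port (assumption)
CHECK_USER=consul_check    # assumed monitoring account
CHECK_PWD=secret

RO=$(mysql -h 127.0.0.1 -P ${ROUTER_RW_PORT} -u ${CHECK_USER} -p${CHECK_PWD} \
        -N -B -e "SELECT @@global.super_read_only" 2>/dev/null)

if [ "$RO" = "0" ]; then
    exit 0    # passing: we reached a writable Primary through the router
fi
exit 2        # critical: no writable Primary behind the local router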

This is what we can see in consul’s UI:

Failover rule

Now we will add a rule (query) for the failover:

curl --request POST --data '{
"Name": "mysqlha",
"Service": {
   "Service": "mysqlha",
   "Failover": {
     "Datacenters": ["dc1","dc2"]
   }, "OnlyPassing": true
  }
}' http://127.0.0.1:8500/v1/query

We need to perform this operation on one member of each DC (dc1, dc2 and dc3).
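
Optionally, we can sanity check the prepared query over consul's HTTP API before trying DNS (standard endpoints, default port assumed):

# List the registered prepared queries
curl -s http://127.0.0.1:8500/v1/query

# Execute the "mysqlha" prepared query, which is what the DNS interface
# does behind the scenes for mysqlha.query.mysql
curl -s http://127.0.0.1:8500/v1/query/mysqlha/execute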

When this is performed, we can test the query on dc3 (where no mysqlha service is running):

[root@monitordc3 ~]# dig @127.0.0.1 -p 53 mysqlha.query.mysql

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.4 <<>> @127.0.0.1 -p 53 mysqlha.query.mysql
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28736
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mysqlha.query.mysql.		IN	A

;; ANSWER SECTION:
mysqlha.query.mysql.	0	IN	A	192.168.50.4
mysqlha.query.mysql.	0	IN	A	192.168.50.3
mysqlha.query.mysql.	0	IN	A	192.168.50.2

We can see that DNS returns all 3 nodes in DC1 (192.168.50.x), perfect! We can of course connect to any of them using the router ports (6446 and 6447).

The next step is related to the application servers. We have two options here: either we use a consul client service to perform the DNS resolution, or we simply configure DNS resolution on the application servers. In the latter case, each of them needs to use the node in dc3 plus one node in the other DC (or we can create a fake extra DC used only for DNS resolution). This is because in dc2 the service is running anyway, but we want to be redirected to the primary DC when it is up. We could query mysqlha.query.dc1.mysql, but that only works as long as a consul server in dc1 is reachable, which is not the case if the full DC goes down.

So if we have an app server in dc2, its name resolution setup should be:

search lan localdomain
nameserver 192.168.70.2
nameserver 192.168.50.2

Asynchronous Slave

Because the latency between our two data centers can be high, we opted for asynchronous replication between the two DCs (remember, dc2 is used as a GEO DR site).

So now we need to set up asynchronous replication between both DCs. I opted for the Primary Master of each DC to act as both the async master and the async slave. Being the slave is mandatory for the Primary Master, as it is the only member allowed to write data.

I created the replication grants on each Primary Master (in dc1 and dc2), and I set the purged GTIDs before starting the asynchronous replication:

Traditional replication setup

I started with the user and the requested privileges:

mysql> CREATE USER 'async_repl'@'%' IDENTIFIED BY 'asyncpwd';
Query OK, 0 rows affected (0.50 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'async_repl'@'%';
Query OK, 0 rows affected (0.11 sec)

Don’t forget to set GTID_PURGED according to what you see in the other Primary Master.

Example:

First I check on mysql4dc2 (the current Primary Master in DC2):

mysql4dc2> show global variables like 'GTID_EXECUTED';
+---------------+--------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                |
+---------------+--------------------------------------------------------------------------------------+
| gtid_executed | 0de7c84a-118e-11e8-b11c-08002718d305:1-44,
a6ae95b5-118d-11e8-9ad8-08002718d305:1-27 |
+---------------+--------------------------------------------------------------------------------------+

And then, on the server in DC1, do:

mysql> SET @@GLOBAL.GTID_PURGED='0de7c84a-118e-11e8-b11c-08002718d305:1-44,
                                 a6ae95b5-118d-11e8-9ad8-08002718d305:1-27'

We do the same on all machines in dc1 with the GTID_EXECUTED value from DC2 (and the other way around for dc2).
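
The asynchronous channel itself is then configured the traditional way. A minimal sketch, run on the Primary Master of dc1 and pointing at mysql4dc2 (the mirror command is run on dc2), using the async_repl account created above; adapt the host and credentials to your environment:

# Sketch only: point the dc1 Primary Master at the dc2 Primary Master
mysql -u root -p -e "
  CHANGE MASTER TO
    MASTER_HOST='mysql4dc2',
    MASTER_USER='async_repl',
    MASTER_PASSWORD='asyncpwd',
    MASTER_AUTO_POSITION=1;
  START SLAVE;"

Remember that moving this channel when the Primary role changes is exactly the part this post does not cover; that is the job of the failover handling mentioned earlier.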

Now we will add a new service in consul (on each MySQL node in DC1 and DC2):

        {
            "name": "mysqlasync",
            "tags": [
                "slave"
            ],
            "port": 3306,
            "checks": [
                {
                    "script": "/etc/consul.d/scripts/check_async_mysql.sh",
                    "interval": "5s"
                }
            ]
        }

And don't forget to copy the script check_async_mysql.sh to /etc/consul.d/scripts/ (this script is not made for production and may contain bugs, it's just an example). You also need to edit this script to set SLAVEOFDC to the opposite DC's name.

This script also uses the consul key/value store to record which node is the Primary-Master in each DC:

[root@mysql1dc1 consul.d]# curl -s "http://localhost:8500/v1/kv/primary?raw&dc=dc1"
mysql1dc1
[root@mysql1dc1 consul.d]# curl -s "http://localhost:8500/v1/kv/primary?raw&dc=dc2"
mysql4dc2
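
The real script is more involved, but its logic could be sketched roughly like this (everything below is an assumption about the approach, not the actual script):

#!/bin/bash
# Rough, hypothetical sketch of the idea behind check_async_mysql.sh
SLAVEOFDC=dc2     # set to the opposite DC's name
MYDC=dc1

# Only the Primary of the local group is writable (super_read_only = 0)
RO=$(mysql -N -B -e "SELECT @@global.super_read_only" 2>/dev/null)
if [ "$RO" != "0" ]; then
    exit 2        # not the local Primary: the async slave must not run here
fi

# Advertise this node as the Primary of its DC in the consul K/V store
curl -s -X PUT -d "$(hostname -s)" \
     "http://localhost:8500/v1/kv/primary?dc=${MYDC}" > /dev/null

# Look up the current Primary of the opposite DC (the real script would
# repoint the asynchronous channel at it when needed)
REMOTE_PRIMARY=$(curl -s "http://localhost:8500/v1/kv/primary?raw&dc=${SLAVEOFDC}")

# Report "passing" only if the asynchronous slave is actually running
mysql -N -B -e "SHOW SLAVE STATUS\G" 2>/dev/null \
    | grep -q "Slave_SQL_Running: Yes" && exit 0
exit 2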

After reloading consul, you can see something like this in the UI on the Primary Master that is also the asynchronous slave:

And on another node:


In Action

Let’s see it in action:

In the video we can see that when we kill the non-Primary member remaining in DC1, the Group loses quorum. The size of the group at that moment is 2, and when that member gets killed, even though the Primary is still active, it represents only 1 member out of 2, which is not more than 50%. Therefore it stops serving traffic to avoid any split-brain situation.

Then, when we restart the nodes we killed in DC1, we have to force quorum on the single remaining node to be able to rejoin the other instances after reboot. This is how:
Force Quorum:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.forceQuorumUsingPartitionOf('root@mysql2dc1:3306')
Restoring replicaset 'default' from loss of quorum, by using the partition 
composed of [mysql2dc1:3306]

Please provide the password for 'root@mysql2dc1:3306': 
Restoring the InnoDB cluster ...

The InnoDB cluster was successfully restored using the partition 
from the instance 'root@mysql2dc1:3306'.

WARNING: To avoid a split-brain scenario, ensure that all other members of the replicaset 
         are removed or joined back to the group that was restored.

Rejoin the first instance:

MySQL mysql2dc1:3306 ssl JS > cluster.rejoinInstance('root@mysql3dc1:3306')
...
The instance 'root@mysql3dc1:3306' was successfully rejoined on the cluster.

The instance 'mysql3dc1:3306' was successfully added to the MySQL Cluster.

And rejoin the last one:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.rejoinInstance('root@mysql1dc1:3306')
...
Rejoining instance to the cluster ...

The instance 'root@mysql1dc1:3306' was successfully rejoined on the cluster.

The instance 'mysql1dc1:3306' was successfully added to the MySQL Cluster.

And finally we can see that all members are back online:

 MySQL  mysql2dc1:3306 ssl  JS > cluster.status();
{
    "clusterName": "MyGroupDC1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "mysql2dc1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "mysql1dc1:3306": {
                "address": "mysql1dc1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql2dc1:3306": {
                "address": "mysql2dc1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "mysql3dc1:3306": {
                "address": "mysql3dc1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }
    }, 
    "groupInformationSourceMember": "mysql://root@mysql2dc1:3306"

Conclusion

As you noticed, MySQL Group Replication and MySQL InnoDB Cluster can be the base of a full MySQL High Availability architecture using multiple data centers. MySQL InnoDB Cluster can of course be used with a multitude of other solutions like Orchestrator, ProxySQL, … I'll try to illustrate these solutions when I have some time, stay tuned!


Bitten by MariaDB 10.2 Incompatible Change


Since MariaDB first claimed to be a “drop-in replacement for MySQL”, I knew that claim wouldn’t age well.  It hasn’t – the list of incompatibilities between various MySQL and MariaDB versions grows larger and larger over time, now requiring a dedicated page to compare two specific recent releases.  Despite that, I largely operated under the assumption that basic, general use cases would see no difference.

I was wrong.

This past week, I was “helping” with a Support issue where installation of Cloudera Manager failed on MariaDB 10.2.  Like many products, the schema for Cloudera Manager has evolved over releases, and new installs apply both the original schema and version-specific updates to bring the schema to a current version.  When it failed, I reviewed the scripts and found nothing that should cause problems.  A fantastic Support colleague (thanks Michalis!) dug in deeper, and – contradicting my assumptions – found that the root problem was an incompatible change made in MariaDB 10.2.8.  This causes ALTER TABLE … DROP COLUMN commands to fail in MariaDB 10.2.8, which do not fail in any MySQL version, nor in any previous MariaDB version – including 10.2.7.

When dropping a column that’s part of a composite unique constraint, MariaDB throws error 1072.  Here’s an example:


MariaDB [test]> SHOW CREATE TABLE uc_test\G
*************************** 1. row ***************************
Table: uc_test
Create Table: CREATE TABLE `uc_test` (
`a` int(11) NOT NULL,
`b` int(11) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
`d` int(11) DEFAULT NULL,
PRIMARY KEY (`a`),
UNIQUE KEY `b` (`b`,`c`),
UNIQUE KEY `d` (`d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [test]> ALTER TABLE uc_test DROP COLUMN c;
ERROR 1072 (42000): Key column 'c' doesn't exist in table
MariaDB [test]> ALTER TABLE uc_test DROP COLUMN b;
ERROR 1072 (42000): Key column 'b' doesn't exist in table
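
A possible workaround, untested here, is to drop the composite unique key explicitly in the same statement, so the column is no longer part of a multi-column constraint when it is removed:

# Hypothetical workaround on MariaDB 10.2.8+: drop the unique key b together
# with the column instead of relying on the implicit narrowing behavior
# (recreate a narrowed unique key afterwards if that is the behavior you want).
mysql test -e "ALTER TABLE uc_test DROP INDEX b, DROP COLUMN c;"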

This differs from the behavior of previous versions, as well as from MySQL, which does one of two things:

  1. Narrows the UNIQUE constraint to the remaining columns (surely not ideal, Bug#17098).
  2. Rejects the change if the resulting narrowed UNIQUE constraint is violated.

While narrowing the existing UNIQUE constraint seems inappropriate, at least MySQL provides a useful and appropriate error message as a result:

mysql> INSERT INto uc_test values (1,1,1,1),(2,1,2,2);
Query OK, 2 rows affected (0.17 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> alter table uc_test DROP COLUMN c;
ERROR 1062 (23000): Duplicate entry '1' for key 'b'

I agree with the suggestion in the MySQL bug report that a more appropriate action would be to drop the entire UNIQUE constraint in which the dropped column appeared.

MariaDB elected to go a different route. This behavior change was triggered by the work done in MDEV-11114, which explicitly identified a defect in MariaDB’s CHECK CONSTRAINT feature.  As the CHECK CONSTRAINT feature is not found upstream, neither was this defect, which failed to drop constraints declared independent of the column definition when the related column was dropped.  Rather than restrict the fix to the problematic CHECK CONSTRAINT code, MariaDB developers elected to impose new restrictions on all CONSTRAINTS – despite a very clearly-articulated comment from Mike Bayer documenting the incompatibilities this would trigger.  MariaDB also elected to make this change  in a maintenance release instead of a minor version upgrade.  As a result, SQL scripts – like that used by Cloudera Manager – fail using MariaDB 10.2.8, while they did not in 10.2.7.

Note that I have not tested this behavior in a replication stream – I presume it breaks replication from any MySQL or MariaDB version to 10.2.8+.  That’s an even bigger issue for many deployments.

MariaDB has an opportunity – and I’d argue an obligation – to avoid introducing such incompatibilities in maintenance releases.  As somebody supporting enterprise products built to use relational databases such as MariaDB, we need to have confidence that maintenance release upgrades won’t break products relying on those services.  Ignoring the promise of being a “drop in replacement”, the consequences of this change were known in advance, yet MariaDB chose to pursue it in a maintenance release.  There are two real, underlying bugs – the CHECK CONSTRAINT defect introduced by MariaDB, which should have been fixed without impact to other functionality, and the long-standing MySQL defect that triggers constraint narrowing.  SQL standards compliance is a viable reason for behavior changes, but such should be done at minor releases only.

I also object to MariaDB’s decision to repurpose an existing MySQL error code for this purpose.  I agree with Elena’s comment that more meaningful error messages are required.  This is new error-generating behavior, and should have a unique, new error code and message to accompany it for troubleshooting purposes.  This is particularly important when the behavior deviates from established historical behavior.

I’m all for MariaDB developing their own value-add features for MySQL Server.  Defects found in such features should be fixed without impacting common behavior, though.

 

ProxySQL Series: MySQL Replication Read-write Split up.


At Mydbops we always strive to provide the best MySQL solutions. We are exploring the modern SQL load balancers, and we have planned to write a series of blog posts on ProxySQL.

The first blog in this series shows how to set up ProxySQL for a MySQL replication topology, including read/write split, along with some background on ProxySQL.

What is ProxySQL ?

  • ProxySQL is an open-source, high-performance, SQL-aware proxy. It runs as a daemon watched by a monitoring process.
  • ProxySQL sits between the application and the database servers.
  • The daemon accepts incoming traffic from MySQL clients and forwards it to backend MySQL servers.

A few of the most commonly used features are:

  • Read/write split
  • On-the-fly queries rewrite
  • Query caching
  • Real time statistics on web console
  • Seamless replication switchover
  • Query mirroring
  • Monitoring

The main advantages of using ProxySQL are that it is designed to run continuously without needing to be restarted, most configuration can be done at runtime using queries similar to SQL statements, and it is very lightweight.

Let us explore basic query routing (read/write split) for effective load sharing. We have set up 4 nodes for this architecture.

  • node1 (172.17.0.1) , Master
  • node2 (172.17.0.2) , Slave
  • node3 (172.17.0.3) , Slave
  • node4 (172.17.0.4) , ProxySQL
ProxySQL on Single Master and Two Slaves.

Note: By default, ProxySQL binds to two ports, 6032 and 6033.
6032 is the admin port, and 6033 is the one that accepts incoming connections from clients.

MySQL Replication setup :

Configuring MySQL master-slave replication is outside the scope of this tutorial; we already have nodes with replication running.

Before entering the ProxySQL admin interface, create one application user with all the privileges required by your application, and one monitoring user, on every MySQL DB server.

mysql> CREATE USER 'sysbench'@'172.17.0.%' IDENTIFIED BY 'sysbench';
mysql> GRANT ALL PRIVILEGES on *.* TO 'sysbench'@'172.17.0.%';
 
mysql> CREATE USER  'monitor'@'172.17.0.%' IDENTIFIED BY 'monitor';
mysql> GRANT USAGE,REPLICATION CLIENT on *.* TO 'monitor'@'172.17.0.%';

mysql> FLUSH PRIVILEGES;

ProxySQL Setup :

Install and start ProxySQL :

For Installation kindly refer : https://github.com/sysown/proxysql/wiki

$ service proxysql start
Starting ProxySQL: Main init phase0 completed in 0.000491 secs.
Main init global variables completed in 0.000675 secs.
Main daemonize phase1 completed in 0.00015 secs.
DONE!

Now connect to the ProxySQL admin interface to start the configuration:

$ mysql -u admin -padmin -h 127.0.0.1 -P6032

Configure Backends :

ProxySQL uses the concept of a hostgroup. A hostgroup is a group of hosts sharing a logical role.

In this setup, we just need 2 hostgroups:

hostgroup 0 for the master [Used for Write queries ]
hostgroup 1 for the slaves [Used for Read Queries ]

Apart from this, we could also have an analytics server as another slave of the same master, assign it a new hostgroup id, and redirect all analytics-related (long-running) queries to that host, as sketched below.
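
For example (the IP address, hostgroup id, and table name pattern below are hypothetical), an analytics slave could be registered in its own hostgroup and targeted by a dedicated query rule:

# Hypothetical: hostgroup 2 dedicated to an analytics slave (172.17.0.5),
# plus a rule sending SELECTs against report_* tables to that hostgroup.
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "
  INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (2,'172.17.0.5',3306);
  INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply)
  VALUES (1, '^SELECT .* FROM report_', 2, 1);
  LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
  LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;"

The basic setup for this post only needs the two hostgroups described above: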

Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (0,'172.17.0.1',3306);
Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'172.17.0.2',3306);
Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'172.17.0.3',3306);

Admin> INSERT INTO  mysql_replication_hostgroups VALUES (0,1,'production');
Admin > SELECT * FROM mysql_replication_hostgroups;
+------------------+------------------+------------+
| writer_hostgroup | reader_hostgroup | comment    |
+------------------+------------------+------------+
| 0                | 1                | production |
+------------------+------------------+------------+

Admin> LOAD MYSQL SERVERS TO RUNTIME; 
Admin> SAVE MYSQL SERVERS TO DISK; 

 

Note: When we load the MySQL servers, our writer host also gets configured in the reader hostgroup automatically by ProxySQL, so that it can handle the queries redirected to the reader hostgroup in case no slaves are online.

A bonus point here: we can decrease the weight assigned to the master in the reader hostgroup (in the mysql_servers table), so that most read queries go to the servers with the higher weight, i.e. the slaves.

UPDATE mysql_servers SET weight=200 WHERE hostgroup_id=1 AND hostname='172.17.0.1';     

Admin> SELECT hostgroup_id,hostname,port,status,weight FROM mysql_servers;
+--------------+------------+------+--------+---------+
| hostgroup_id | hostname   | port | status | weight  |
+--------------+------------+------+--------+---------+
| 0            | 172.17.0.1 | 3306 | ONLINE | 1000    |
| 1            | 172.17.0.2 | 3306 | ONLINE | 1000    |
| 1            | 172.17.0.3 | 3306 | ONLINE | 1000    |
| 1            | 172.17.0.1 | 3306 | ONLINE | 200     |
+--------------+------------+------+--------+---------+

Configure User :

The monitor user will continuously check the status of the backends at the specified interval.
The sysbench user is the one created for the application.

Admin> UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_password';
Admin> LOAD MYSQL VARIABLES TO RUNTIME;
Admin> SAVE MYSQL VARIABLES TO DISK;

Admin> INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('sysbench','sysbench',0);
Admin> SELECT username,password,active,default_hostgroup,default_schema,max_connections,max_connections FROM mysql_users;
    +----------+----------+--------+-------------------+----------------+-----------------+-----------------+
    | username | password | active | default_hostgroup | default_schema | max_connections | max_connections |
    +----------+----------+--------+-------------------+----------------+-----------------+-----------------+
    | sysbench | sysbench | 1      | 0                 | NULL           | 10000           | 10000           |
    +----------+----------+--------+-------------------+----------------+-----------------+-----------------+

Admin> LOAD MYSQL USERS TO RUNTIME;
Admin> SAVE MYSQL USERS TO DISK;

Configure monitoring :

ProxySQL constantly monitors the servers it has configured. To do so, it is important to configure some interval and timeout variables (in milliseconds).

Admin> UPDATE global_variables SET variable_value=2000 WHERE variable_name IN ('mysql-monitor_connect_interval','mysql-monitor_ping_interval','mysql-monitor_read_only_interval');

Admin> UPDATE global_variables SET variable_value = 1000 where variable_name = 'mysql-monitor_connect_timeout';
Admin> UPDATE global_variables SET variable_value = 500 where variable_name = 'mysql-monitor_ping_timeout';

Admin> LOAD MYSQL VARIABLES TO RUNTIME;
Admin> SAVE MYSQL VARIABLES TO DISK;

The Monitor module regularly checks the replication lag (using Seconds_Behind_Master) if a server has max_replication_lag set to a non-zero value.

With the configuration below, servers will only be shunned if the replication delay exceeds 60 seconds (1 min) behind the master.

Admin> UPDATE mysql_servers SET max_replication_lag=60;
Query OK, 1 row affected (0.00 sec)

If you want to be more accurate while calculating slave lag, kindly refer: http://proxysql.com/blog/proxysql-and-ptheartbeat

There are also other important things in monitoring module which we can configure as per our need. I will prefer writing separate blog in this series.

Configure Query Rules :

To send all SELECT queries to the slaves (based on a regex):

Admin> INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
Admin> INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
Admin> SELECT active, match_digest, destination_hostgroup, apply FROM mysql_query_rules;

Admin> SELECT rule_id, match_digest,destination_hostgroup hg_id, apply FROM mysql_query_rules WHERE active=1;
+---------+----------------------+-------+-------+
| rule_id | match_digest         | hg_id | apply |
+---------+----------------------+-------+-------+
| 1       | ^SELECT .            | 1     | 0     |
| 2       | ^SELECT.*FOR UPDATE$ | 0     | 1     |
+---------+----------------------+-------+-------+

Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Admin> SAVE MYSQL QUERY RULES TO DISK;

When the Query Processor scans the query rules without finding a match and reaches the end, it applies the default_hostgroup of the specific user according to the mysql_users entry.
In our case, the user sysbench has default_hostgroup=0, therefore any query not matching the above rules (e.g. all writes) will be sent to hostgroup 0 (the master). The stats tables below can be used to validate whether your query rules are being hit by the incoming traffic.

SELECT rule_id, hits, destination_hostgroup hg FROM mysql_query_rules NATURAL JOIN stats_mysql_query_rules;
+---------+-------+----+
| rule_id | hits  | hg |
+---------+-------+----+
| 1       | 17389 | 1  |
| 2       | 234   | 0  |
+---------+-------+----+

We can also redirect queries matching a specific pattern by using the digest found in stats_mysql_query_digest, as in the example below.
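
A hedged example of that approach (the digest value shown is a placeholder; take a real one from your own stats_mysql_query_digest output):

# Find the heaviest digests...
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "
  SELECT digest, digest_text, count_star
  FROM stats_mysql_query_digest
  ORDER BY sum_time DESC LIMIT 5;"

# ...then pin one specific digest to the reader hostgroup
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "
  INSERT INTO mysql_query_rules (active, digest, destination_hostgroup, apply)
  VALUES (1, '0xABCDEF0123456789', 1, 1);
  LOAD MYSQL QUERY RULES TO RUNTIME;
  SAVE MYSQL QUERY RULES TO DISK;"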

Validate the DB Connection :

The application will connect to port 6033 on the ProxySQL host (172.17.0.4) to send its DB traffic.

ProxySQL-Host$ mysql -u sysbench -psysbench -h 127.0.0.1 -P6033 -e "SELECT @@server_id"
+-------------+
| @@server_id |
+-------------+
|           2 |
+-------------+
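
To see the split in action, run the same statement a few times; with the rules above, plain SELECTs are answered by the reader hostgroup, so @@server_id should come back from the slaves most of the time:

# Repeated reads should be spread across the reader hostgroup
for i in $(seq 1 5); do
  mysql -u sysbench -psysbench -h 127.0.0.1 -P6033 -N -e "SELECT @@server_id;"
done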

Check Backend Status :

This shows that ProxySQL is able to successfully connect to all backends.

mysql> select * from monitor.mysql_server_ping_log order by time_start_us desc limit 3;
+------------+------+------------------+----------------------+------------+
| hostname   | port | time_start_us    | ping_success_time_us | ping_error |
+------------+------+------------------+----------------------+------------+
| 172.17.0.1 | 3306 | 1516795814170574 | 220                  | NULL       |
| 172.17.0.2 | 3306 | 1516795814167894 | 255                  | NULL       |
| 172.17.0.3 | 3306 | 1516795804170751 | 259                  | NULL       |
+------------+------+------------------+----------------------+------------+

I have executed some sysbench tests on the cluster to check the query distribution.
The ProxySQL table below shows the number of queries executed per host.

Admin > select hostgroup,srv_host,status,Queries,Bytes_data_sent,Latency_us from stats_mysql_connection_pool where hostgroup in (0,1);
+-----------+------------+----------+---------+-----------------+------------+
| hostgroup | srv_host   | status   | Queries | Bytes_data_sent | Latency_us |
+-----------+------------+----------+---------+-----------------+------------+
| 0         | 172.17.0.1 | ONLINE   | 12349   | 76543232        | 144        |
| 1         | 172.17.0.2 | ONLINE   | 22135   | 87654356        | 190        |
| 1         | 172.17.0.3 | ONLINE   | 22969   | 85344235        | 110        |
| 1         | 172.17.0.1 | ONLINE   | 1672    | 4534332         | 144        |
+-----------+------------+----------+---------+-----------------+------------+

If any of your servers becomes unreachable in any hostgroup, its status changes from ONLINE to SHUNNED.
This means ProxySQL won't send any queries to that host until it comes back ONLINE.

We can also take any server offline for maintenance. To disable a backend server, change its status to OFFLINE_SOFT (gracefully disabling the backend server) or OFFLINE_HARD (immediately disabling the backend server).

In this case no new traffic will be sent to the node.

Admin> UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostname='172.17.0.2';
Query OK, 1 row affected (0.00 sec)

Still worried about reading stale data from a slave?

Then do not worry: ProxySQL is coming up with a new feature to make sure your application gets the latest data. Currently this feature is available only with row-based replication and GTID enabled.

 

For a more detailed description of every module: https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration

Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (i.e., AWS S3) or loaded into a special purpose database that is optimized for storage (with compression) and reporting. The data is then available.

Actually, there are multiple use cases:

  • Sometimes the data just needs to be stored (i.e., for regulatory purposes) but does not have to be readily available (it’s not “customer facing” data)
  • The data might be useful for debugging or investigation (i.e., application or access logs)
  • In some cases, the data needs to be available for the customer (i.e., historical reports or bank transactions for the last six years)

In all of those cases, we can move the older data away from MySQL and load it into a “big data” solution. Even if the data needs to be available, we can still move it from the main MySQL server to another system. In this blog post, I will look at archiving MySQL tables in ClickHouse for long-term storage and real-time queries.

How To Archive?

Let’s say we have a 650G table that stores the history of all transactions, and we want to start archiving it. How can we approach this?

First, we will need to split this table into “old” and “new”. I assume that the table is not partitioned (partitioned tables are much easier to deal with). For example, if we have data from 2008 (ten years worth) but only need to store data from the last two months in the main MySQL environment, then deleting the old data would be challenging. So instead of deleting 99% of the data from a huge table, we can create a new table and load the newer data into that. Then rename (swap) the tables. The process might look like this:

  1. CREATE TABLE transactions_new LIKE transactions
  2. INSERT INTO transactions_new SELECT * FROM transactions WHERE trx_date > NOW() - INTERVAL 2 MONTH
  3. RENAME TABLE transactions TO transactions_old, transactions_new TO transactions

Second, we need to move the transactions_old into ClickHouse. This is straightforward — we can pipe data from MySQL to ClickHouse directly. To demonstrate I will use the Wikipedia:Statistics project (a real log of all requests to Wikipedia pages).

Create a table in ClickHouse:

CREATE TABLE wikistat
(
    id bigint,
    dt DateTime,
    project String,
    subproject String,
    path String,
    hits UInt64,
    size UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(dt)
ORDER BY dt
Ok.
0 rows in set. Elapsed: 0.010 sec.

Please note that I'm using the new ClickHouse custom partitioning. It does not require that you create a separate date column to map the table in MySQL to the same table structure in ClickHouse.

Now I can “pipe” data directly from MySQL to ClickHouse:

mysql --quick -h localhost wikistats -NBe \
"SELECT concat(id,',\"',dt,'\",\"',project,'\",\"',subproject,'\",\"', path,'\",',hits,',',size) FROM wikistats" | \
clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT CSV"

Third, we need to set up a constant archiving process so that the data is removed from MySQL and transferred to ClickHouse. To do that we can use the “pt-archiver” tool (part of Percona Toolkit). In this case, we can first archive to a file and then load that file into ClickHouse. Here is an example:

Remove data from MySQL and load to a file (tsv):

pt-archiver --source h=localhost,D=wikistats,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.txt --bulk-delete --limit 100000 --progress=100000
TIME                ELAPSED   COUNT
2018-01-25T18:19:59       0       0
2018-01-25T18:20:08       8  100000
2018-01-25T18:20:17      18  200000
2018-01-25T18:20:26      27  300000
2018-01-25T18:20:36      36  400000
2018-01-25T18:20:45      45  500000
2018-01-25T18:20:54      54  600000
2018-01-25T18:21:03      64  700000
2018-01-25T18:21:13      73  800000
2018-01-25T18:21:23      83  900000
2018-01-25T18:21:32      93 1000000
2018-01-25T18:21:42     102 1100000
...

Load the file to ClickHouse:

cat load_to_clickhouse.txt | clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT TSV"

The newer version of pt-archiver can use a CSV format as well:

pt-archiver --source h=localhost,D=wikitest,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.csv --output-format csv --bulk-delete --limit 10000 --progress=10000

How Much Faster Is It?

Actually, it is much faster in ClickHouse. Even the queries that are based on index scans can be much slower in MySQL compared to ClickHouse.

For example, in MySQL just counting the number of rows for one year can take 34 seconds (index scan):

mysql> select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00';
+-----------+
| count(*)  |
+-----------+
| 103161991 |
+-----------+
1 row in set (34.82 sec)
mysql> explain select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wikistats
   partitions: NULL
         type: range
possible_keys: dt
          key: dt
      key_len: 6
          ref: NULL
         rows: 227206802
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

In ClickHouse, it only takes 0.062 sec:

:) select count(*) from wikistats where dt between  toDateTime('2017-01-01 00:00:00') and  toDateTime('2017-12-31 00:00:00');
SELECT count(*)
FROM wikistats
WHERE (dt >= toDateTime('2017-01-01 00:00:00')) AND (dt <= toDateTime('2017-12-31 00:00:00'))
┌───count()─┐
│ 103161991 │
└───────────┘
1 rows in set. Elapsed: 0.062 sec. Processed 103.16 million rows, 412.65 MB (1.67 billion rows/s., 6.68 GB/s.)

Size on Disk

In my previous blog post comparing ClickHouse to Apache Spark and MariaDB, I also compared disk size. Usually, we can expect a 5x to 10x decrease in disk size in ClickHouse due to compression. Wikipedia:Statistics, for example, contains actual URIs, which can be quite large due to the article name/search phrase. These compress very well. If we use only integers, or use MD5/SHA1 hashes instead of storing actual URIs, we can expect a much smaller compression ratio (i.e., 3x). Even with a 3x compression ratio, it is still pretty good as long-term storage.
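
One way to check the footprint on the ClickHouse side is to query system.parts; the exact ratio you see will of course depend on your data:

# Compressed size per table for the wikistats database
clickhouse-client --query="
  SELECT table, formatReadableSize(sum(bytes)) AS size_on_disk
  FROM system.parts
  WHERE active AND database = 'wikistats'
  GROUP BY table"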

Conclusion

As the data in MySQL keeps growing, the performance of all queries keeps decreasing. Typically, queries that originally took milliseconds can start taking seconds (or more), and making them fast again requires a lot of changes (application code, MySQL configuration, etc.).

The main goal of archiving the data is to increase performance (“make MySQL fast again”), decrease costs and improve ease of maintenance (backup/restore, cloning the replication slave, etc.). Archiving to ClickHouse allows you to preserve old data and make it available for reports.

Exploring Amazon RDS Aurora: replica writes and cache chilling


Our clients operate on a variety of platforms, and RDS (Amazon Relational Database Service) Aurora has received quite a bit of attention in recent times. On behalf of our clients, we look beyond the marketing, and see what the technical architecture actually delivers.  We will address specific topics in individual posts, this time checking out what the Aurora architecture means for write and caching behaviour (and thus performance).

What is RDS Aurora?

First of all, let’s declare the baseline.  MySQL Aurora is not a completely new RDBMS. It comprises a set of Amazon modifications on top of stock Oracle MySQL 5.6 and 5.7, implementing a different replication mechanism and some other changes/additions.  While we have some information (for instance from the “deep dive” by AWS VP Anurag Gupta), the source code of the Aurora modifications is not published, so unfortunately it is not immediately clear how things are implemented.  Any architecture requires choices to be made, trade-offs, and naturally these have consequences.  Because we don’t get to look inside the “black box” directly, we need to explore indirectly.  We know how stock MySQL is architected, so by observing Aurora’s behaviour we can try to derive how it is different and what it might be doing.  Mind that this is equivalent to looking at a distant star, seeing a wobble, and deducing from the pattern that there must be one or more planets orbiting.  It’s an educated guess.

For the sake of brevity, I have to skip past some aspects that can be regarded as “obvious” to someone with insight into MySQL’s architecture.  I might also defer explaining a particular issue in depth to a dedicated post on that topic.  Nevertheless, please do feel free to ask “so why does this work in this way”, or other similar questions – that’ll help me check my logic trail and tune to the reader audience, as well as help create a clearer picture of the Aurora architecture.

Instead of using the binary log, Aurora replication ties into the storage layer.  It only supports InnoDB, and instead of doing disk reads/writes, the InnoDB I/O system talks to an Amazon storage API which delivers a shared/distributed storage, which can work across multiple availability zones (AZs).  Thus, a write on the master will appear on the storage system (which may or may not really be a filesystem).  Communication between AZs is fairly fast (only 2-3 ms extra overhead, relative to another server in the same AZ) so clustering databases or filesystems across AZs is entirely feasible, depending on the commit mechanism (a two-phase commit architecture would still be relatively slow).  We do multi-AZ clustering with Galera Cluster (Percona XtraDB Cluster or MariaDB Galera Cluster).  Going multi-AZ is a good idea that provides resilience beyond a single data centre.

So, imagine an individual instance in an Aurora setup as an EC2 (Amazon Elastic Computing) instance with MySQL using an SSD EBS (Amazon Elastic Block Storage) volume, where the InnoDB I/O threads interface more directly with the EBS API.  The actual architecture might be slightly different still (more on that in a later post), but this rough description helps set up a basic idea of what a node might look like.

Writes in MySQL

In a regular MySQL, on commit a few things happen:

  • the InnoDB log is written to and flushed,
  • the binary log is written to (and possibly flushed), and
  • the changed pages (data and indexes) in the InnoDB buffer pool are marked dirty, so a background thread knows they need to be written back to disk (this does not need to happen immediately).  When a page is written to disk, normally it uses a “double-write” mechanism where the new page image is first written to a scratch space (the doublewrite buffer) and then written to its original position.  Depending on the filesystem and underlying storage (spinning disk, or other storage with a block size different from the InnoDB page size) this may be required to be able to recover from failed (torn) page writes.

This does not translate into as many IOPS because in practice, transaction commits are grouped together (for instance with MariaDB’s group commit) and thus many commits that happen in a short space of time effectively only use a few IOs for their log writes.  With Galera cluster, the local logs are written but not flushed, because the guaranteed durability is provided by the other nodes in the cluster rather than by local persistence of the logfile.

In Aurora, a commit has to send either the InnoDB log entries or the changed data pages to the storage layer; which one it is doesn’t particularly matter.  The storage layer has a “quorum set” mechanism to ensure that multiple nodes accept the new data.  This is similar to Galera’s “certification” mechanism that provides the “virtual synchrony”.  The Aurora “deep dive” talk claims that it requires many fewer IOPS for a commit; however, it appears they are comparing a worst-case plain MySQL scenario with an optimal Aurora environment.  Very marketing.

Aurora does not use the binary log, which does make one wonder about point-in-time recovery options. Of course, it is possible to recover to any point-in-time from an InnoDB snapshot + InnoDB transaction logs – this would require adding timestamps to the InnoDB transaction log format.

While it is noted that the InnoDB transaction log is also backed up to S3, it doesn’t appear to be used directly (so, only for recovery purposes then).  After all, any changed page needs to be communicated to the other instances, so essentially all pages are always flushed (no dirty pages).  When we look at the InnoDB stats GLOBAL STATUS, we sometimes do see up to a couple of dozen dirty pages with Aurora, but their existence or non-existence doesn’t appear to have any correlation with user-created tables and data.
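
For reference, these are standard InnoDB counters that can be checked on any MySQL-compatible endpoint, Aurora included (the endpoint and credentials below are placeholders):

# Buffer pool counters discussed above; how faithfully they are maintained
# on Aurora is exactly what is in question here
mysql -h <aurora-endpoint> -u admin -p -e "
  SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Innodb_buffer_pool_pages_dirty',
     'Innodb_buffer_pool_pages_data',
     'Innodb_buffer_pool_reads');"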

Where InnoDB gets its Speed

InnoDB rows and indexing

We all know that disk-access is slow.  In order for InnoDB to be fast, it is dependent on most active data being in the buffer pool.  InnoDB does not care for local filesystem buffers – something is either in persistent storage, or in the buffer pool.  In configurations, we prefer direct I/O so the system calls that do the filesystem I/O bypass the filesystem buffers and any related overhead.  When a query is executed, any required page that’s not yet in the buffer pool is requested to be loaded in the background. Naturally, this does slow down queries, which is why we preferably want all necessary pages to already be in memory.  This applies for any type of query.  In InnoDB, all data/indexes are structured in B+trees, so an INSERT has to be merged into a page and possibly causes pages to be split and other items shuffled so as to “re-balance” the tree.  Similarly, a delete may cause page merges and a re-balancing operation.  This way the depth of the tree is controlled, so that even for a billion rows you would generally see a depth of no more than 6-8 pages.  That is, retrieving any row would only require a maximum of 6-8 page reads (potentially from disk).

I’m telling you all this because, while most replication and clustering mechanisms essentially work with the buffer pool, Aurora replication appears to work against it.  As I mentioned: choices have consequences (trade-offs).  So, what happens?

Aurora Replication

When you do a write in MySQL which gets replicated through classic asynchronous replication, the slaves or replica nodes apply the row changes in memory.  This means that all the data (which is stored with the PRIMARY KEY, in InnoDB) as well as any other indexes are updated, the InnoDB log is written, and the pages are marked as dirty.  It’s very similar to what happens on the writer/master system, and thus the end result in memory is virtually identical.  While Galera’s cluster replication operates differently from the asynchronous mechanism shown in the diagram, the resulting caching (which pages are in memory) ends up similar.

MySQL Replication architecture

Not so with Aurora.  Aurora replicates in the storage layer, so all pages are updated in the storage system but not in the in-memory InnoDB buffer pool.  A secondary notification system between the instances ensures that cached InnoDB pages are invalidated.  When you next run a query that needs any of those no-longer-valid cached pages, they will have to be re-read from the storage system.  You can see a representation of this in the diagram below, indicating invalidated cache pages in different indexes; as shown, for INSERT operations, you’re likely to have pages higher up in the tree and one sideways page change as well because of the B+tree rebalancing.

Aurora replicated insert

The Chilling Effect

We can tell the replica is reading from storage, because the same query is much slower than before we did the insert from the master instance.  Note: this wasn’t a matter of timing. Even if we waited slightly longer (to enable a possible background thread to refresh the pages) the post-insert query was just as slow.

Interestingly, the invalidation process does not actually remove them from the buffer pool (that is, the # of pages in the buffer pool does not go down); however, the # of page reads does not go up either when the page is clearly re-read.    Remember though that a status variable is just that, it has to be updated to be visible and it simply means that the new functions Amazon implemented don’t bother updating these status variables.  Accidental omission or purposeful obscurity?  Can’t say.  I will say that it’s very annoying when server statistics don’t reflect what’s actually going on, as it makes the stats (and their analysis) meaningless.  In this case, the picture looks better than it is.

With each Aurora write (insert/update/delete), the in-memory buffer pool on replicas is “chilled”.

Unfortunately, it’s not even just the one query on the replica that gets affected after a write. The primary key as well as the secondary indexes get chilled. If the initial query uses one particular secondary index, that index and the primary key will get warmed up again (at the cost of multiple storage system read operations), however the other secondary indexes are still chattering their teeth.

Being Fast on the Web

In web applications (whether websites or web-services for mobile apps), typically the most recently added data is the most likely to be read again soon.  This is why InnoDB’s buffer pool is normally very effective: frequently accessed pages remain in memory, while lesser used ones “age” and eventually get tossed out to make way for new pages.

Having caches cleared due to a write slows things down.  In the MySQL space, the fairly simple query cache is a good example.  Whenever you write to table A, any cached SELECTs that access table A are cleared out of the cache.  Regardless of whether the application is read-intensive, having regular writes makes the query cache useless and we turn it off in those cases.  Oracle has already deprecated the “good old” query cache (which was introduced in MySQL 4.0 in the early 2000s) and soon its code will be completely removed.

Conclusion

With InnoDB, you’d generally have an AUTO_INCREMENT PRIMARY KEY, and thus newly inserted rows are sequenced to the outer end of the B+Tree.  This also means that the next inserted row often ends up in the same page, again invalidating that recently written page on the replicas and slowing down reads of any of the rows it contained.

For secondary indexes, the effect is obviously scattered although if the indexed column is temporal (time-based), it will be similarly affected to the PRIMARY KEY.

How much all of this slows things down will very much depend on your application's DB access profile.  The read/write ratio matters little; what matters is whether individual tables are written to fairly frequently.  If they are, SELECT queries on those tables made on replicas will suffer from the chill.

Aurora uses SSD EBS so of course the storage access is pretty fast.  However, memory is always faster, and we know that that’s important for web application performance.  And we can use similarly fast SSD storage on EC2 or another hosting provider, with mature scaling technologies such as Galera (or even regular asynchronous multi-threaded replication) that don’t give your caches the chills.

Using dbdeployer in CI tests


I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

The case for gh-ost

A user opened an issue on gh-ost, and the user was using MySQL 5.5. gh-ost is being tested on 5.7 where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

gh-ost tests

gh-ost has three different test types:

  • Unit tests: these are plain golang logic tests which are very easy and quick to run.
  • Integration tests: the topic of this post, see following. Today these do not run as part of an automated CI testing.
  • System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.

Unit tests are already running as part of automated CI (every PR is subjected to those tests). Systems tests are clearly tied to our production servers. What's the deal with the integration tests?

gh-ost integration tests

The gh-ost integration tests are a suite of scenarios which verify gh-ost's operation is sound. These scenarios are mostly concerned with data types, special alter statements etc. Is converting DATETIME to TIMESTAMP working properly? Are latin1 columns being updated correctly? How about renaming a column? Changing a PRIMARY KEY? Column reorder? 5.7 JSON values? And so on. Each test will recreate the table, run migration, stop replication, check the result, resume replication...

The environment for these tests is a master-replica setup, where gh-ost modifies the table on the replica and can then checksum or compare both the original and the altered ghost table.

We develop gh-ost internally at GitHub, but it's also an open source project. We have our own internal CI environment, but then we also wish the public to have visibility into test failures (so that a user can submit a PR and get a reliable automated feedback). We use Travis CI for the public facing tests.

To run gh-ost's integration tests as described above as part of our CI tests we should be able to:

  • Create a master/replica setup in CI.
  • Actually, create a master/replica setup in any CI, and namely in Travis CI.
  • Actually, create multiple master/replica setups, of varying versions and vendors, in any ci, including both our internal CI and Travis CI.

I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, dbdeployer stepped in.

dbdeployer

dbdeployer is a rewrite, a replacement to MySQL Sandbox. I've been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:

  • Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it's fine, we can automate that.
  • Command line flag complexity: I always get lost in the complexity of the flags.
  • Get it right or prepare for battle: if you deployed something, but not the way you wanted, there are sometimes limbo situations where you cannot re-deploy the same sandbox again, or you have to start deleting files everywhere.
  • Deploy, not remove. Adding a sandbox is one thing. How about removing it?

dbdeployer is a golang rewrite, which solves the dependency problem. It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalent of a MySQL Sandbox, it does so with fewer command line flags and less confusion. There's first class handling of the MySQL binaries: you unpack MySQL tarballs, and you can list what's available. You can then create sandbox environments: replication, standalone, etc. You can then delete those.

It's pretty simple and I have not much more to add -- which is the best thing about it.

So, with dbdeployer it is easy to create a master/replica. Something like:

dbdeployer unpack path/to/5.7.21.tar.gz --unpack-version=5.7.21 --sandbox-binary ${PWD}/sandbox/binary
dbdeployer replication 5.7.21 --nodes 2 --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin --my-cnf-options binlog_format=ROW
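
Once deployed, the sandbox ships with per-node wrapper scripts that the CI can drive, and tearing it down again is a single command. The directory and script names below are the defaults I would expect for this deployment; treat them (and the test database name) as assumptions:

# Assumed default naming for a 5.7.21 replication sandbox
SB=${PWD}/sandboxes/rsandbox_5_7_21

$SB/m  -e "CREATE DATABASE IF NOT EXISTS gh_ost_test"   # master
$SB/s1 -e "SHOW SLAVE STATUS\G"                         # first slave

# Remove the sandbox when the CI run is done
dbdeployer delete rsandbox_5_7_21 --sandbox-home ${PWD}/sandboxes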

Where does it all fit in, and what about the MySQL binaries though?

So, should dbdeployer be part of the gh-ost repo? And where does one get those MySQL binaries from? Are they to be part of the gh-ost repo? Aren't they a few GB to extract?

Neither dbdeployer nor MySQL binaries should be added to the gh-ost repo. And fortunately, Giuseppe also solved the MySQL binaries problem.

The scheme I'm looking at right now is as follows:

  • A new public repo, gh-ost-ci-env is created. This repo includes:
    • dbdeployer compiled binaries
    • Minimal MySQL tarballs for selected versions. Those tarballs are reasonably small: between `14MB` and `44MB` at this time.
  • gh-ost's CI to git clone https://github.com/github/gh-ost-ci-env.git (code)
  • gh-ost's CI to setup a master/replica sandbox (one, two).
  • Kick the tests.

The above is a work in progress:

  • At this time only runs a single MySQL version.
  • There is a known issue where after a test, replication may take time to resume. Currently on slower boxes (such as the Travis CI containers) this leads to failures.

Another concern I have at this time is build time. For a single MySQL version, it takes some 5-7 minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably slower on Travis CI, I can expect between 10m - 15m. Add multiple versions and we're looking at a 1hr build. Such long build times will affect our development and delivery times, and so we will split them off the main build. I need to consider what the best approach is.

That's all for now. I'm pretty excited for the potential of dbdeployer and will be looking into incorporating the same for orchestrator CI tests.

 

 

MySQL @ FOSSASIA, 2018

The FOSSASIA Summit is an annual conference on open technologies and society. The event offers lectures and workshops and various events on a multitude of topics.

MySQL, the world's most popular open source database, has been registering its presence at FOSSASIA since 2014. Developers (like me), who play daily with the MySQL source code, go there and share their knowledge about new features and various other topics related to MySQL.

Last year, in 2017, we had a full day dedicated to MySQL talks and workshops, which was attended by very serious users/customers of MySQL. We also had a 'birds-of-a-feather' session in which customers/users got a chance to have in-depth discussions directly with developers on various topics of interest. And it was very much appreciated/enjoyed by both the customers/users and us developers.

This year, in 2018, we are again ready to register our presence at the FOSSASIA 2018 summit, which is scheduled from 22nd to 25th March in Singapore. We have a number of MySQL talks lined up, covering various MySQL topics.


Planned talks, speakers and tentative times:

  1. Improved Error Logging in MySQL 8.0 (Praveenkumar H, 9:59 am)
  2. MySQL: Improving Connection Security (Harin, 10:29 am)
  3. MySQL Replication Performance Tuning (Venkatesh, 11:00 am)
  4. Enhanced XA Support for Replication in MySQL-5.7 (Nisha PG, 1:00 pm)
  5. The State of the Art on MySQL Group Replication (Hemant Dangi, 1:30 pm)
  6. What's new in MySQL Optimizer 8.0 (Chaithra Gopalareddy, 2:00 pm)
  7. Atomic DDL in MySQL 8.0 (Shipra Jain, 2:30 pm)
  8. MySQL for Distributed transaction and Usage of JSON as a fusion between SQL & NOSQL (Ajo Robert, 2:59 pm)
  9. MySQL Performance Schema - A great insight of running MySQL Server (Mayank Prasad, 3:30 pm)
  10. Histograms (Amit, 4:00 pm)


All of the above talks are tentatively planned for 25th March. So be there and get the most out of MySQL @ FOSSASIA, 2018.

See you there. :-)

Conference details : https://2018.fossasia.org/
Complete schedule/Venue : TBD

[PS: Keep watching this space for more details. ]

How to install Apache, PHP 7.2 and MySQL on CentOS 7.4 (LAMP)

This tutorial shows how to install an Apache web server on a CentOS 7 server with PHP (mod_php with PHP 5.4, 7.0, 7.1, or 7.2) and MySQL support. This setup is often referred to as LAMP which stands for Linux - Apache - MySQL - PHP.

ProxySQL server version impersonation


Or: fun using MySQL 8 with ProxySQL and the MySQL Connector/J

 

I have recently been working on testing MySQL 8 and trying the several solutions that can be attached to it, like ProxySQL, but not only that.

After I had set up the servers, and configured ProxySQL to redirect the incoming connections from my user m8_test to my MySQL 8 setup, I turned on my Java test application ... and to my surprise I received an error:

Caused by:

com.mysql.cj.core.exceptions.CJException: Unknown system variable 'query_cache_size'

 

Well, OK, MySQL 8 doesn't have the query cache anymore, but why did I get this error?

I pointed the application to MySQL 8 directly and it worked fine.

 

Just to be sure this was something restricted to the Java connector, I ran a test with a Perl script, and I was able to access and write to my MySQL 8 servers through ProxySQL without problems.

 

So the issue was restricted to Connector/J and ProxySQL. It's not the first time I have had issues with that connector, and not the first time I have seen ProxySQL not being 100% compatible. But this one was weird.

I downloaded the latest Connector/J source and put it in my development environment.

 

Then I started to dig into the issue.

The connector sends a "SHOW VARIABLES" and then parses the result to "configure" the connector accordingly.

The class MysqlaSession.loadServerVariables() contains the method that decides which variables should be included and which not.

The process is a bit crude and basic, with a series of IF conditions checking the server version.

Finally, at line 1044, I found why the connector was failing:

if (versionMeetsMinimum(8, 0, 3)) {

    queryBuf.append(", @@have_query_cache AS have_query_cache");

} else {

    queryBuf.append(", @@query_cache_size AS query_cache_size");

    queryBuf.append(", @@query_cache_type AS query_cache_type");

}


So if the version is at least 8.0.3, the connector checks for the variable have_query_cache; otherwise it reads query_cache_size and query_cache_type.

Here we go: ProxySQL, in version 1.4.6, by default declares itself as:

Server version:	5.5.30 (ProxySQL)
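
If you want to verify what ProxySQL is currently announcing, a quick check against its admin interface works too; this sketch assumes the default admin credentials and admin port 6032, so adjust them to your setup:

mysql -u admin -padmin -h 127.0.0.1 -P 6032 \
  -e "SELECT variable_name, variable_value FROM global_variables WHERE variable_name='mysql-server_version';"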

 

One of the good things in ProxySQL is that most of the important settings can be dynamically changed, including the Server Version.

That's it: ProxySQL can impersonate whichever MySQL version you like, just by modifying the Server Version variable.

 

Given that, I did:

update global_variables set variable_value="8.0.4 (ProxySQL)" where variable_name='mysql-server_version';
load mysql variables to run;save mysql variables to disk;


At this point I ran my Java app again, and everything worked fine.

While I was at it, I tested several different scenarios, and most worked as expected.

 

But then I set ProxySQL to impersonate a MySQL 5.5 (yes, exactly as 5.5, not as 5.5.x), just to see if the connector was reading the version correctly. And, with no big surprise... it was not.

 

Why? Because MySQL Connector/J, once it has opened the channel with the server, reads some parameters directly from the connection; one of them is the server version.

The server version is parsed in the ServerVersion.parseVersion() method, and here the connector expects to find the version in the standard major.sub.subminor format (e.g. 5.5.30). If it is not declared exactly like that, the connector simply sets the server version to 0.0.0, with the side effect that nothing works correctly afterwards.

 

Conclusion

This short blog post shares a simple issue I had, and its resolution using ProxySQL's flexibility to modify the declared MySQL server version.

Still, some attention is required, given that Connector/J is not flexible and the standard major.sub.subminor format must be used.

MariaDB Developer’s unconference & M|18


Been a while since I wrote anything MySQL/MariaDB related here, but there’s the column on the Percona blog, which has weekly updates.

Anyway, I’ll be at the developer’s unconference this weekend in NYC. Even managed to snag a session on the schedule, MySQL features missing in MariaDB Server (Sunday, 12.15–13.00). Signup on meetup?

Due to the prevalence of “VIP tickets”, I too signed up for M|18. If you need a discount code, I’ll happily offer them up to you to see if they still work (though I’m sure a quick Google will solve this problem for you). I’ll publish notes, probably in my weekly column.

If you’re in New York and want to say hi, talk shop, etc. don’t hesitate to drop me a line.

RDS Aurora MySQL Failover


Right now Aurora only allows a single master, with up to 15 read-only replicas.

Master/Replica Failover

We love testing failure scenarios, however our options for such tests with Aurora are limited (we might get back to that later).  Anyhow, we told the system, through the RDS Aurora dashboard, to do a failover. These were our observations:

Role Change Method

Both master and replica instances are actually restarted (the MySQL uptime resets to 0).

This is quite unusual these days, we can do a fully controlled role change in classic asynchronous replication without a restart (CHANGE MASTER TO …), and Galera doesn’t have read/write roles as such (all instances are technically writers) so it doesn’t need role changes at all.

Failover Timing

Failover between running instances takes about 30 seconds.  This is in line with information provided in the Aurora FAQ.

Failover where a new instance needs to be spun up takes 15 minutes according to the FAQ (similar to creating a new instance from the dash).

Instance Availability

During a failover operation, we observed that all connections to the (old) master, and the replica that is going to be promoted, are first dropped, then refused (the connection refusals will be during the period that the mysqld process is restarting).

According to the FAQ, reads to all replicas are interrupted during failover.  Don’t know why.

Aurora can deliver a DNS CNAME for your writer instance. In a controlled environment like Amazon, with guaranteed short TTL, this should work ok and be updated within the 30 seconds that the shortest possible failover scenario takes.  We didn’t test with the CNAME directly as we explicitly wanted to observe the “raw” failover time of the instances themselves, and the behaviour surrounding that process.
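If you do rely on the cluster CNAME, it is worth checking what TTL your clients actually see; a quick way is to query it directly (the endpoint name below is a placeholder, the second column of the answer is the remaining TTL in seconds):

dig +noall +answer mycluster.cluster-xxxxxxxxxxxx.us-east-1.rds.amazonaws.com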

Caching State

On the promoted replica, the buffer pool is saved and loaded (warmed up) on the restart; good!  Note that this is not special, it’s desired and expected to happen: MySQL and MariaDB have had InnoDB buffer pool save/restore for years.  Credit: Jeremy Cole initially came up with the buffer pool save/restore idea.

On the old master (new replica/slave), the buffer pool is left cold (empty).  Don’t know why.  This was a controlled failover from a functional master.

Because of the server restart, other caches are of course cleared also.  I’m not too fussed about the query cache (although, deprecated as it is, it’s currently still commonly used), but losing connections is a nuisance. More detail on that later in this article.

Statistics

Because of the instance restarts, the running statistics (SHOW GLOBAL STATUS) are all reset to 0. This is annoying, but should not affect proper external stats gathering, other than for uptime.

On any replica, SHOW ENGINE INNODB STATUS comes up empty. Always.  This seems like obscurity to me, I don’t see a technical reason to not show it.  I suppose that with a replica being purely read-only, most running info is already available through SHOW GLOBAL STATUS LIKE ‘innodb%’, and you won’t get deadlocks on a read-only slave.

Multi-Master

Aurora MySQL multi-master was announced at Amazon re:Invent 2017, and appears to currently be in restricted beta test.  No date has been announced for general availability.

We’ll have to review it when it’s available, and see how it works in practice.

Conclusion

Requiring 30 seconds or more for a failover is unfortunate, this is much slower than other MySQL replication (writes can failover within a few seconds, and reads are not interrupted) and Galera cluster environments (which essentially delivers continuity across instance failures – clients talking to the failed instance will need to reconnect to the loadbalancer/cluster to continue).

I don’t understand why the old master gets a cold InnoDB buffer pool.

I wouldn’t think a complete server restart should be necessary, but since we don’t have insight in the internals, who knows.

On Killing Connections (through the restart)

Losing connections across an Aurora cluster is a real nuisance that really impacts applications.  Here’s why:

When the MySQL C client library (which most MySQL APIs either use or are modelled on) is disconnected, it passes back a specific error to the application.  When the application makes its next query call, the C client will automatically reconnect first (so the client does not have to explicitly reconnect).  So a client only needs to catch the error and re-issue its last command, and all will generally be fine.  Of course, if it relies on different SESSION settings, or was in the middle of a multi-statement transaction, it will need to do a bit more.

So, this means that the application has to handle disconnects gracefully without chucking hissy-fits at users, and I know for a fact that that’s not how many (most?) applications are written.  Consequently, an Aurora failover will make the frontend of most applications look like a disaster zone for about 30 seconds (provided functional instances are available for the failover, which is the preferred and best case scenario).

I appreciate that this is not directly Aurora’s fault, it’s sloppy application development that causes this, but it’s a real-world fact we have to deal with.  And, perhaps importantly: other cluster and replication options do not trigger this scenario.

Meet dbdeployer: the new sandbox maker


How it happened


A few years ago I started thinking about refactoring MySQL-Sandbox. I got lots of ideas and a name for the project (dbdeployer) but went no further. The initial idea (this was 2013!) was to rewrite the project in Ruby: I had been using Ruby at work and it looked like a decent replacement for Perl. My main problem was the difficulty of installation in an uncontrolled environment. If you have control over your environment (it's your laptop or you are in charge of the server configuration via Puppet or similar) then the task is easy. But if you ever need to deploy somewhere with little or no notice, it becomes a problem: there are servers where Perl is not installed, and it is common that the server also has a policy forbidding all scripting languages from being deployed. Soon I found out that Ruby has the same problem as Perl. In the meantime, my work also required heavy involvement with Python, and I started thinking that maybe it would be a better choice than Ruby.
My adventures with deployment continued. In some places, I would find old versions of Perl, Ruby, Python, and no way of replacing them easily. I also realized that, if I bit the bullet and wrote my tools in C or C++, my distribution problems would not end, as I had to deal with library dependencies and conflicts with existing ones.
At the end of 2017 I finally did what I had postponed for so long: I took a serious look at Go, and I decided that it was the best candidate for solving the distribution problem. I had a few adjustment problems, as the Go philosophy is different from my previously used languages, but the advantages were so immediate that I was hooked. Here's what I found compelling:

  • Shift in responsibility: with all the other languages I have used, the user is responsible for providing the working environment, such as installing libraries, the language itself, solve conflicts, and so on, until the program can work. With Go, the responsibility is on the developers only: they are supposed to know how to collect the necessary packages and produce a sound executable. Users only need to download the executable and run it.
  • Ease of deployment. A Go executable doesn't have dependencies. Binaries can be compiled for several platforms from a single origin (I can build Linux executables in my Mac and vice versa) and they just work.
  • Ease of development. Go is a strongly typed language, and has a different approach to code structure than Perl or Python. But this doesn't slow down my coding: it forces me to write better code, resulting in something that is at the same time more robust and easy to extend.
  • Wealth of packages. Go has an amazingly active community, and there is an enormous amount of packages ready for anything.

What is dbdeployer?


The first goal of dbdeployer is to replace MySQL-Sandbox completely. As such, it has all the main features of MySQL Sandbox, and many more (See the full list of features at the end of this text.)

You can deploy a single sandbox, or multiple unrelated sandboxes, or several servers in replication. You could do that also with MySQL-Sandbox. The first difference is in the command structure:

$ dbdeployer
dbdeployer makes MySQL server installation an easy task.
Runs single, multiple, and replicated sandboxes.

Usage:
dbdeployer [command]

Available Commands:
admin administrative tasks
delete delete an installed sandbox
global Runs a given command in every sandbox
help Help about any command
multiple create multiple sandbox
replication create replication sandbox
sandboxes List installed sandboxes
single deploys a single sandbox
templates Admin operations on templates
unpack unpack a tarball into the binary directory
usage Shows usage of installed sandboxes
versions List available versions

Flags:
--base-port int Overrides default base-port (for multiple sandboxes)
--bind-address string defines the database bind-address (default "127.0.0.1")
--config string configuration file (default "$HOME/.dbdeployer/config.json")
--custom-mysqld string Uses an alternative mysqld (must be in the same directory as regular mysqld)
-p, --db-password string database password (default "msandbox")
-u, --db-user string database user (default "msandbox")
--expose-dd-tables In MySQL 8.0+ shows data dictionary tables
--force If a destination sandbox already exists, it will be overwritten
--gtid enables GTID
-h, --help help for dbdeployer
-i, --init-options strings mysqld options to run during initialization
--keep-auth-plugin in 8.0.4+, does not change the auth plugin
--keep-server-uuid Does not change the server UUID
--my-cnf-file string Alternative source file for my.sandbox.cnf
-c, --my-cnf-options strings mysqld options to add to my.sandbox.cnf
--port int Overrides default port
--post-grants-sql strings SQL queries to run after loading grants
--post-grants-sql-file string SQL file to run after loading grants
--pre-grants-sql strings SQL queries to run before loading grants
--pre-grants-sql-file string SQL file to run before loading grants
--remote-access string defines the database access (default "127.%")
--rpl-password string replication password (default "rsandbox")
--rpl-user string replication user (default "rsandbox")
--sandbox-binary string Binary repository (default "$HOME/opt/mysql")
--sandbox-directory string Changes the default sandbox directory
--sandbox-home string Sandbox deployment direcory (default "$HOME/sandboxes")
--skip-load-grants Does not load the grants
--use-template strings [template_name:file_name] Replace existing template with one from file
--version version for dbdeployer

Use "dbdeployer [command] --help" for more information about a command.

MySQL-Sandbox was created in 2006, and its structure changed as needed, without a real plan. dbdeployer, instead, was designed to have a hierarchical command structure, similar to git or docker, to give users a better feeling. As a result, it has a leaner set of commands, a non-awkward way of using options, and offers a better control of the operations out of the box.

For example, here's how we would start to run sandboxes:

$ dbdeployer --unpack-version=8.0.4 unpack mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz
Unpacking tarball mysql-8.0.4-rc-linux-glibc2.12-x86_64.tar.gz to $HOME/opt/mysql/8.0.4
.........100.........200.........292

The first (mandatory) operation is to expand binaries from a tarball. By default, the files will be expanded to $HOME/opt/mysql. Once this is done, we can create sandboxes at will, with simple commands:

$ dbdeployer single 8.0.4
Database installed in $HOME/sandboxes/msb_8_0_4
run 'dbdeployer usage single' for basic instructions'
. sandbox server started

$ dbdeployer replication 8.0.4
[...]
Replication directory installed in /$HOME/sandboxes/rsandbox_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer multiple 8.0.4
[...]
Multiple directory installed in $HOME/sandboxes/multi_msb_8_0_4
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer sandboxes
msb_8_0_4 : single 8.0.4 [8004]
multi_msb_8_0_4 : multiple 8.0.4 [24406 24407 24408]
rsandbox_8_0_4 : master-slave 8.0.4 [19405 19406 19407]

Three differences between dbdeployer and MySQL-Sandbox:

  • There is only one executable, with different commands;
  • After each deployment, there is a suggestion on how to get help about the sandbox usage.
  • There is a command that displays which sandboxes were installed, the kind of deployment, and the ports in use. This is useful when the number of ports per sandbox increases, as in group replication.

Here's another take, after deploying group replication:

$ dbdeployer sandboxes
group_msb_8_0_4 : group-multi-primary 8.0.4 [20405 20530 20406 20531 20407 20532]
group_sp_msb_8_0_4 : group-single-primary 8.0.4 [21405 21530 21406 21531 21407 21532]
msb_8_0_4 : single 8.0.4 [8004]
multi_msb_8_0_4 : multiple 8.0.4 [24406 24407 24408]
rsandbox_8_0_4 : master-slave 8.0.4 [19405 19406 19407]

A few more differences from MySQL-Sandbox are the "global" and "delete" commands.
The "global" command can broadcast a command to all the sandboxes. You can start, stop, restart all sandboxes at once, or run a query everywhere.

$ dbdeployer global use "select @@server_id, @@port, @@server_uuid"
# Running "use_all" on group_msb_8_0_4
# server: 1
@@server_id @@port @@server_uuid
100 20405 00020405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port @@server_uuid
200 20406 00020406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port @@server_uuid
300 20407 00020407-3333-3333-3333-333333333333

# Running "use_all" on group_sp_msb_8_0_4
# server: 1
@@server_id @@port @@server_uuid
100 21405 00021405-1111-1111-1111-111111111111
# server: 2
@@server_id @@port @@server_uuid
200 21406 00021406-2222-2222-2222-222222222222
# server: 3
@@server_id @@port @@server_uuid
300 21407 00021407-3333-3333-3333-333333333333

# Running "use" on msb_8_0_4
@@server_id @@port @@server_uuid
1 8004 00008004-0000-0000-0000-000000008004
[...]

You can run the commands manually. dbdeployer usage will show which commands are available for every sandbox.

$ dbdeployer usage single

USING A SANDBOX

Change directory to the newly created one (default: $SANDBOX_HOME/msb_VERSION
for single sandboxes)
[ $SANDBOX_HOME = $HOME/sandboxes unless modified with flag --sandbox-home ]

The sandbox directory of the instance you just created contains some handy
scripts to manage your server easily and in isolation.

"./start", "./status", "./restart", and "./stop" do what their name suggests.
start and restart accept parameters that are eventually passed to the server.
e.g.:

./start --server-id=1001

./restart --event-scheduler=disabled

"./use" calls the command line client with the appropriate parameters,
Example:

./use -BN -e "select @@server_id"
./use -u root

"./clear" stops the server and removes everything from the data directory,
letting you ready to start from scratch. (Warning! It's irreversible!)

When you don't need the sandboxes anymore, you can dismiss them with a single command:

$ dbdeployer delete ALL
Deleting the following sandboxes
$HOME/sandboxes/group_msb_8_0_4
$HOME/sandboxes/group_sp_msb_8_0_4
$HOME/sandboxes/msb_8_0_4
$HOME/sandboxes/multi_msb_8_0_4
$HOME/sandboxes/rsandbox_8_0_4
Do you confirm? y/[N]

There is an option to skip the confirmation, which is useful for scripting unattended tests.


Customization


One of the biggest problems with MySQL-Sandbox was that most of its functionality is hard-coded, and the scripts needed to run the sandboxes are generated in different places, so that extending or modifying features became more and more difficult. When I designed dbdeployer, I gave myself the goal of making the tool easy to change, and the code easy to understand and extend.

For this reason, I organized everything related to code generation (the scripts that initialize and run the sandboxes) in a collection of templates and default variables that are publicly visible and modifiable.

$ dbdeployer templates -h
The commands in this section show the templates used
to create and manipulate sandboxes.

Usage:
dbdeployer templates [command]

Aliases:
templates, template, tmpl, templ

Available Commands:
describe Describe a given template
export Exports all templates to a directory
import imports all templates from a directory
list list available templates
reset Removes all template files
show Show a given template

You can list the templates on the screen.

$ dbdeployer templates list single
[single] replication_options : Replication options for my.cnf
[single] load_grants_template : Loads the grants defined for the sandbox
[single] grants_template57 : Grants for sandboxes from 5.7+
[single] grants_template5x : Grants for sandboxes up to 5.6
[single] my_template : Prefix script to run every my* command line tool
[single] show_binlog_template : Shows a binlog for a single sandbox
[single] use_template : Invokes the MySQL client with the appropriate options
[single] clear_template : Remove all data from a single sandbox
[single] restart_template : Restarts the database (with optional mysqld arguments)
[single] start_template : starts the database in a single sandbox (with optional mysqld arguments)
[single] stop_template : Stops a database in a single sandbox
[single] send_kill_template : Sends a kill signal to the database
[single] show_relaylog_template : Show the relaylog for a single sandbox
[single] Copyright : Copyright for every sandbox script
[single] expose_dd_tables : Commands needed to enable data dictionary table usage
[single] init_db_template : Initialization template for the database
[single] grants_template8x : Grants for sandboxes from 8.0+
[single] add_option_template : Adds options to the my.sandbox.cnf file and restarts
[single] test_sb_template : Tests basic sandbox functionality
[single] sb_include_template : TBD
[single] gtid_options : GTID options for my.cnf
[single] my_cnf_template : Default options file for a sandbox
[single] status_template : Shows the status of a single sandbox

Then it's possible to examine template contents:

$ dbdeployer templates describe --with-contents init_db_template
# Collection : single
# Name : init_db_template
# Description : Initialization template for the database
# Notes : This should normally run only once
# Length : 656
##START init_db_template
#!/bin/bash
{{.Copyright}}
# Generated by dbdeployer {{.AppVersion}} using {{.TemplateName}} on {{.DateTime}}
BASEDIR={{.Basedir}}
export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
SBDIR={{.SandboxDir}}
DATADIR=$SBDIR/data
cd $SBDIR
if [ -d $DATADIR/mysql ]
then
echo "Initialization already done."
echo "This script should run only once."
exit 0
fi

{{.InitScript}} \
{{.InitDefaults}} \
--user={{.OsUser}} \
--basedir=$BASEDIR \
--datadir=$DATADIR \
--tmpdir={{.Tmpdir}} {{.ExtraInitFlags}}

##END init_db_template

The one above is the template that generates the initialization script. In MySQL-Sandbox, this was handled in the code, and it was difficult to figure out what went wrong when the initialization failed. The Go language has excellent support for code generation using templates, and with just a fraction of its features I implemented a few dozen scripts which I am able to modify with ease. Here's what the deployed script looks like:

#!/bin/bash

# DBDeployer - The MySQL Sandbox
# Copyright (C) 2006-2018 Giuseppe Maxia
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Generated by dbdeployer 0.1.24 using init_db_template on Tue Feb 20 14:45:29 CET 2018
BASEDIR=$HOME/opt/mysql/8.0.4
export LD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$BASEDIR/lib:$BASEDIR/lib/mysql:$DYLD_LIBRARY_PATH
SBDIR=$HOME/sandboxes/msb_8_0_4
DATADIR=$SBDIR/data
cd $SBDIR
if [ -d $DATADIR/mysql ]
then
echo "Initialization already done."
echo "This script should run only once."
exit 0
fi

$HOME/opt/mysql/8.0.4/bin/mysqld \
--no-defaults \
--user=$USER \
--basedir=$BASEDIR \
--datadir=$DATADIR \
--tmpdir=$HOME/sandboxes/msb_8_0_4/tmp \
--initialize-insecure --default_authentication_plugin=mysql_native_password

Let's see the quick-and-dirty usage. If you want to change a template and use it just once, do the following:

  1. $ dbdeployer templates show init_db_template
  2. Save it to a file init_db.txt and edit it. Be careful, though: removing or altering essential labels may block the sandbox initialization.
  3. Use the template file in the next command:

$ dbdeployer single 8.0.4 --use-template=init_db_template:init_db.txt

For more permanent results, when you'd like to change one or more templates for good, you can use the export/import commands:


  1. List the templates related to replication: dbdeployer templates list replication
  2. Export the templates to the directory "mydir": dbdeployer templates export replication mydir
  3. Edit the templates you want to change inside "mydir/replication"
  4. Import the templates: dbdeployer templates import replication mydir (see the sketch below)
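
Putting those four steps together, a minimal shell session could look like this; the directory name is arbitrary and the template file name is whatever you find in the exported directory:

dbdeployer templates list replication
dbdeployer templates export replication mydir
$EDITOR mydir/replication/<template_name>     # edit whichever template files you need
dbdeployer templates import replication mydir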

The templates will end up inside $HOME/.dbdeployer/templates_$DBDEPLOYER_VERSION and dbdeployer will load them instead of using the ones stored internally. The next time one of those templates is needed, it will be collected from the file. If you run dbdeployer templates list or describe, the ones saved to file will be marked with {F}.
To go back to the built-in behavior, simply run dbdeployer templates reset

In addition to templates, dbdeployer uses a set of values when creating sandboxes. Like templates, this set is read from an internal store, but it can be exported to a configuration file.

$ dbdeployer admin show
# Internal values:
{
"version": "0.1.24",
"sandbox-home": "$HOME/sandboxes",
"sandbox-binary": "$HOME/opt/mysql",
"master-slave-base-port": 11000,
"group-replication-base-port": 12000,
"group-replication-sp-base-port": 13000,
"multiple-base-port": 16000,
"group-port-delta": 125,
"sandbox-prefix": "msb_",
"master-slave-prefix": "rsandbox_",
"group-prefix": "group_msb_",
"group-sp-prefix": "group_sp_msb_",
"multiple-prefix": "multi_msb_"
}

The values named *-base-port are used to calculate the port for each node in a multiple deployment. The calculation goes:

sandbox_port + base_port + (revision_number * 100)

So, for example, when deploying replication for 5.7.21, the sandbox port would be 5721, and the final base port will be calculated as follows:

5721 + 11000 + 21 * 100 = 18821

This number will be incremented for each node in the cluster, so that the master will get 18822, and the first slave 18823.
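
As a quick sanity check, the same arithmetic can be reproduced in the shell; the numbers below match the 5.7.21 example above:

$ sandbox_port=5721; base_port=11000; revision=21
$ echo $(( sandbox_port + base_port + revision * 100 ))
18821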

Using the commands dbdeployer admin export and import you can customize the default values in a way similar to what we saw for the templates.


Thanks


I'd like to thank:


A note about unpacking MySQL tarball

When using MySQL tarballs, we may have some problems due to the enormous size that the tarballs have reached. Look at this:

690M    5.5.52
1.2G 5.6.39
2.5G 5.7.21
3.6G 8.0.0
1.3G 8.0.1
1.5G 8.0.2
1.9G 8.0.3
1.9G 8.0.4

This becomes a serious problem when you want to unpack the tarball inside a low-resource virtual machine or a Docker container. I have asked the MySQL team to provide reduced tarballs, possibly in a fixed location, so that sandbox creation could be fully automated. I was told that something will be done soon. In the meantime, I provide such reduced tarballs, which have a more reasonable size:

 49M    5.5.52
61M 5.6.39
346M 5.7.21
447M 8.0.0
462M 8.0.1
254M 8.0.2
270M 8.0.3
244M 8.0.4

Using these reduced tarballs, which are conveniently packed in a docker container (datacharmer/mysql-sb-full contains all major MySQL versions), I have automated dbdeployer tests with minimal storage involvement, and that improves the test speed as well.

Detailed list of features


Feature                       | MySQL-Sandbox | dbdeployer | dbdeployer planned
Single sandbox deployment     | yes           | yes        |
unpack command                | sort of 1     | yes        |
multiple sandboxes            | yes           | yes        |
master-slave replication      | yes           | yes        |
"force" flag                  | yes           | yes        |
pre-post grants SQL action    | yes           | yes        |
initialization options        | yes           | yes        |
my.cnf options                | yes           | yes        |
custom my.cnf                 | yes           | yes        |
friendly UUID generation      | yes           | yes        |
global commands               | yes           | yes        |
test replication flow         | yes           | yes        |
delete command                | yes 2         | yes        |
group replication SP          | no            | yes        |
group replication MP          | no            | yes        |
prevent port collision        | no            | yes 3      |
visible initialization        | no            | yes 4      |
visible script templates      | no            | yes 5      |
replaceable templates         | no            | yes 6      |
configurable defaults         | no            | yes 7      |
list of source binaries       | no            | yes 8      |
list of installed sandboxes   | no            | yes 9      |
test script per sandbox       | no            | yes 10     |
integrated usage help         | no            | yes 11     |
custom abbreviations          | no            | yes 12     |
version flag                  | no            | yes 13     |
fan-in                        | no            | no         | yes 14
all-masters                   | no            | no         | yes 15
Galera/PXC/NDB                | no            | no         | yes 18
finding free ports            | yes           | no         | yes
pre-post grants shell action  | yes           | no         | maybe
getting remote tarballs       | yes           | no         | yes
circular replication          | yes           | no         | no 16
master-master (circular)      | yes           | no         | no
Windows support               | no            | no         | no 17


  1. It's achieved using --export_binaries and then abandoning the operation. 
  2. Uses the sbtool command 
  3. dbdeployer sandboxes store their ports in a description JSON file, which allows the tool to get a list of used ports and act before a conflict happens. 
  4. The initialization happens with a script that is generated and stored in the sandbox itself. Users can inspect the init_db script and see what was executed. 
  5. All sandbox scripts are generated using templates, which can be examined and eventually changed and re-imported. 
  6. See also note 5. Using the flag --use-template you can replace an existing template on-the-fly. Groups of templates can be exported and imported after editing. 
  7. Defaults can be exported to file, and eventually re-imported after editing.  
  8. This is little more than using an O.S. file listing, with the added awareness of the source directory. 
  9. Using the description files, this command lists the sandboxes with their topology and used ports. 
  10. It's a basic test that checks whether the sandbox is running and is using the expected port. 
  11. The "usage" command will show basic commands for single and multiple sandboxes. 
  12. The abbreviations file allows users to define custom shortcuts for frequently used commands. 
  13. Strangely enough, this simple feature was never implemented for MySQL-Sandbox, while it was one of the first additions to dbdeployer. 
  14. Will use the multi source technology introduced in MySQL 5.7. 
  15. Same as n. 13. 
  16. Circular replication should not be used anymore. There are enough good alternatives (multi-source, group replication) to avoid this old technology. 
  17. I don't do Windows, but you can fork the project if you do. 
  18. For Galera/PXC and MySQL Cluster I have ideas, but I may need help to implement. 

Updated: Become a ClusterControl DBA - SSL Key Management and Encryption of MySQL Data in Transit


Databases usually work in a secure environment. It may be a datacenter with a dedicated VLAN for database traffic. It may be a VPC in EC2. If your network spreads across multiple datacenters in different regions, you’d usually use some kind of Virtual Private Network or SSH tunneling to connect these locations in a secure manner. With data privacy and security being hot topics these days, you might feel better with an additional layer of security.

MySQL supports SSL as a means to encrypt traffic both between MySQL servers (replication) and between MySQL servers and clients. If you use Galera cluster, similar features are available - both intra-cluster communication and connections with clients can be encrypted using SSL.

A common way of implementing SSL encryption is to use self-signed certificates. Most of the time, it is not necessary to purchase an SSL certificate issued by the Certificate Authority. Anybody who’s been through the process of generating a self-signed certificate will probably agree that it is not the most straightforward process - most of the time, you end up searching through the internet to find howto’s and instructions on how to do this. This is especially true if you are a DBA and only go through this process every few months or even years. This is why we added a ClusterControl feature to help you manage SSL keys across your database cluster. In this blog post, we’ll be making use of ClusterControl 1.5.1.

Key Management in ClusterControl

You can enter Key Management by going to Side Menu -> Key Management section.

You will be presented with the following screen:

You can see two certificates generated, one being a CA and the other one a regular certificate. To generate more certificates, switch to the ‘Generate Key’ tab:

A certificate can be generated in two ways - you can first create a self-signed CA and then use it to sign a certificate. Or you can go directly to the ‘Client/Server Certificates and Key’ tab and create a certificate. The required CA will be created for you in the background. Last but not least, you can import an existing certificate (for example a certificate you bought from one of many companies which sell SSL certificates).

To do that, you should upload your certificate, key and CA to your ClusterControl node and store them in /var/lib/cmon/ca directory. Then you fill in the paths to those files and the certificate will be imported.

If you decided to generate a CA or generate a new certificate, there’s another form to fill - you need to pass details about your organization, common name, email, pick the key length and expiration date.

Once you have everything in place, you can start using your new certificates. ClusterControl currently supports deployment of SSL encryption between clients and MySQL databases and SSL encryption of intra-cluster traffic in Galera Cluster. We plan to extend the variety of supported deployments in future releases of ClusterControl.

Full SSL encryption for Galera Cluster

Now let’s assume we have our SSL keys ready and we have a Galera Cluster, which needs SSL encryption, deployed through our ClusterControl instance. We can easily secure it in two steps.

First - encrypt Galera traffic using SSL. From your cluster view, one of the cluster actions is 'Enable SSL Galera Encryption'. You’ll be presented with the following options:

If you do not have a certificate, you can generate it here. But if you already generated or imported an SSL certificate, you should be able to see it in the list and use it to encrypt Galera replication traffic. Please keep in mind that this operation requires a cluster restart - all nodes will have to stop at the same time, apply config changes and then restart. Before you proceed here, make sure you are prepared for some downtime while the cluster restarts.

Once intra-cluster traffic has been secured, we want to cover client-server connections. To do that, pick ‘Enable SSL Encryption’ job and you’ll see following dialog:

It’s pretty similar - you can either pick an existing certificate or generate new one. The main difference is that to apply client-server encryption, downtime is not required - a rolling restart will suffice. Once restarted, you will find a lock icon right under the encrypted host on the Overview page:

The label 'Galera' means Galera encryption is enabled, while 'SSL' means client-server encryption is enabled for that particular host.

Of course, enabling SSL on the database is not enough - you have to copy certificates to clients which are supposed to use SSL to connect to the database. All certificates can be found in /var/lib/cmon/ca directory on the ClusterControl node. You also have to remember to change grants for users and make sure you’ve added REQUIRE SSL to them if you want to enforce only secure connections.
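
For example, to force an existing application user to connect over SSL only, something like the following could be used; the user name and host are placeholders, the ALTER USER syntax applies to MySQL 5.7+, and on older versions the equivalent is GRANT USAGE ... REQUIRE SSL:

mysql -u root -p -e "ALTER USER 'app_user'@'10.0.0.%' REQUIRE SSL;"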

We hope you’ll find those options easy to use and help you secure your MySQL environment. If you have any questions or suggestions regarding this feature, we’d love to hear from you.

Understand Your Prometheus Exporters with Percona Monitoring and Management (PMM)


In this blog post, I will look at the new dashboards in Percona Monitoring and Management (PMM) for Prometheus exporters.

Percona Monitoring and Management (PMM) uses Prometheus exporters to capture metrics data from the system it monitors. Those Prometheus exporters are an important part of your monitoring infrastructure, and understanding their performance and other operational details is critical for well-implemented monitoring.    

To help you with this we’ve added a number of new dashboards to Percona Monitoring and Management.

The Prometheus Exporters Overview dashboard provides a high-level overview of your installed Prometheus exporter infrastructure:

Prometheus Exporters

The summary shows you how many hosts are monitored and how many exporters you have running, as well as how much CPU and memory they are using.

Note that the CPU usage shown in this graph is only the CPU usage of the exporter itself. It does not include the additional resource usage that is required to produce metrics by the application or operating system.

Next, we have an overview of resource usage by the host:  

Prometheus Exporters 2

Prometheus Exporters 3

These graphs allow us to analyze the resource usage for different hosts, allowing us to clearly see if any of the hosts have unusually high CPU or memory usage by exporters.

You may notice some of the CPU usage reported on these graphs is very high. This is due to the fact that we use very high-resolution sampling and very underpowered instances for this demonstration environment. CPU usage numbers like this are not typical.

The next graphs show resource usage by the type of exporter:

Prometheus Exporters 4

Prometheus Exporters 5

In this case, we measure CPU usage in “CPU Cores” rather than as a percent – it is more meaningful. Otherwise, the same amount of actual resource usage by the exporter will look very different on a system with one core versus a system with 64 cores. Core usage numbers have a pretty stable baseline, though.

Then there is a list of your monitored hosts and the exporters they are running:

Prometheus Exporters 6

This shows your CPU usage and memory usage per host, as well as the number of exporters running and system details.

You can click on a host to get to the System Overview, or jump to Prometheus Exporter Status dashboard.

Prometheus Exporter Status dashboard allows you to investigate how specific exporters are performing for the given host. Each of the well-known exporters has its own row in this dashboard.

Node Exporter Status shows us the resource usage, uptime and performance of Node Exporter (the exporter responsible for capturing OS-level metrics):   

Prometheus Exporters 7

Prometheus Exporters 8

The “Collector Scrape Successful” graph shows which node_exporter collector categories (modules that collect specific information) have returned data reliably. If you see anything but a flat line at “1” here, you need to check for problems.

“Collector Execution Time” shows how long on average it takes to execute your enabled collectors. This shows which collectors are generally more expensive to run (or if some of them are experiencing performance problems).
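
These panels are presumably built on node_exporter's own self-metrics (node_scrape_collector_success and node_scrape_collector_duration_seconds), which you can also inspect directly on the exporter's metrics endpoint; a rough check could look like this (port 9100 is the standalone node_exporter default, PMM may use a different port, which pmm-admin list will show):

curl -s http://localhost:9100/metrics | grep '^node_scrape_collector_'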

MySQL Exporter Status shows us how MySQL exporter is performing:

Prometheus Exporters 9

Additionally, in resource usage we see the rate of scrapes for High, Medium and Low resolution data.

Generally, you should see three flat lines here if everything is working well. This is not the case for this host, and we can see some scrapes are not successful – either failing to complete, or not triggered by Prometheus Server altogether (due to overload or connectivity issues).

Prometheus Exporters 10

These graphs provide information about MySQL Exporter Errors – permission errors and other issues. It also shows if MySQL Server was up during this time. There are also similar details reported for MongoDB and ProxySQL exporters if they are running on the host.

I hope these new dashboards help you to understand your Prometheus exporter performance better!

Percona Live 2018 Open Source Database Conference Full Schedule Now Available


The conference session schedule for the seventh annual Percona Live 2018 Open Source Database Conference, taking place April 23-25 at the Santa Clara Convention Center in Santa Clara, CA, is now live and available for review! Advance Registration Discounts can be purchased through March 4, 2018, 11:30 p.m. PST.

Percona Live Open Source Database Conference 2018 is the premier open source database event. With a theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to start learning how to use and operate open source databases.

Major areas of focus at the conference include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB Landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

Conference Session Schedule

Conference sessions take place April 24-25 and will feature 90+ in-depth talks by industry experts related to each of the key areas. Several sessions from Oracle and Percona will focus on how the new features and enhancements in the upcoming release of MySQL 8.0 will impact businesses. Conference session examples include:

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2018 are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Platinum – Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, SolarWinds, Timescale, TwinDB, Yelp
  • Other Sponsors – cPanel
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference is held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!


Percona XtraDB Cluster and SELinux: Getting It To Work


In this blog post, I’ll look at how to make Percona XtraDB Cluster and SELinux work when used together.

Recently, I encountered an issue with Percona XtraDB Cluster startup. We tried to set up a three-node cluster using Percona XtraDB Cluster with a Vagrant CentOS box, but somehow node2 was not starting. I did not get enough information to debug the issue in the donor/joiner error log. I got only the following error message:

2018-02-08 16:58:48 7910 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.100.20' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin' '
2018-02-08 16:58:48 7910 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.100.20' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin'
 Read: '(null)'
2018-02-08 16:58:48 7910 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.100.20' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin' : 2 (No such file or directory)
2018-02-08 16:58:48 7910 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-02-08 16:58:48 7910 [ERROR] Aborting
2018-02-08 16:58:50 7910 [Note] WSREP: Closing send monitor...

The donor node error log also failed to give any information to debug the issue. After spending a few hours on the problem, one of our developers (Krunal) found that the error is due to SELinux. By default, SELinux is enabled in Vagrant CentOS boxes.

We have already documented how to disable SELinux when installing Percona XtraDB Cluster. Since we did not find any SELinux-related error in the error log, we had to spend a few hours finding the root cause.

You should also disable SELinux on the donor node to start the joiner node. Otherwise, the SST script starts but startup will fail with this error:

2018-02-09T06:55:06.099021Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.100.20:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1')
2018-02-09T06:55:06.099556Z 2 [Note] WSREP: DONOR thread signaled with 0
2018-02-09T06:55:06.099722Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.100.20:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1': 2 (No such file or directory)
2018-02-09T06:55:06.099781Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.100.20:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1'

Disable SELinux on all nodes to start Percona XtraDB Cluster.
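
A quick way to confirm and temporarily lift SELinux enforcement on a node is shown below; to make the change permanent, set SELINUX=permissive (or disabled) in /etc/selinux/config and reboot:

getenforce          # prints Enforcing, Permissive or Disabled
setenforce 0        # switch to permissive mode until the next reboot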

The Percona XtraDB Cluster development team is working on providing the proper error message for SELinux issues.

MariaDB Connector/J 2.2.2 and 1.7.2 now available

dbart | Wed, 02/21/2018 - 11:40

The MariaDB project is pleased to announce the immediate availability of MariaDB Connector/J 2.2.2 and MariaDB Connector/J 1.7.2. See the release notes and changelogs for details and visit mariadb.com/downloads/connector to download.

Download MariaDB Connector/J 2.2.2

Release Notes Changelog About MariaDB Connector/J


Download MariaDB Connector/J 1.7.2

Release Notes Changelog About MariaDB Connector/J


Welcome to Slack!


Open source is at the foundation of MySQL, and the biggest and best part of open source is the legion of developers and users who use and contribute to our great product.  It has always been of incredible importance to us to interact with our friends in the MySQL space, and one of the great ways of doing that is via IRC (Internet Relay Chat) on #freenode.  While that still remains a great option, many other systems have been developed that offer other advantages.  One of those is Slack.

We wanted to create a slack space where our users could hang out, help each other, and interact with MySQL developers.  We’re in no way wanting to replace IRC but just wanting to make it even easier to solve your MySQL problems and learn about the many great things we are working on.

Head on over to http://mysqlcommunity.slack.com to join in on the fun!

Scale with Maxscale part-4 (Amazon Aurora)


This is part-4 of the Maxscale Blog series

  1. Maxscale and Galera
  2. Maxscale Basic Administration
  3. Maxscale for Replication

MaxScale started supporting Amazon Aurora from version 2.1, which comes with a BSL license; we are fine as long as we use only 3 nodes. Amazon Aurora (see our previous blog) is a brilliant technology built by the AWS team which imitates the features of MySQL. Aurora keeps getting better with regard to scaling and features with each release of its engine; the current version is 1.16 (at the time of writing). Aurora architecture and features can be seen here. In this blog I will be explaining a MaxScale deployment for Aurora.

Maxscale version : maxscale-2.1.13-1.x86_64
OS version              : Amazon Linux AMI 2016.09
Cores                        : 4
RAM                         : 8GB

Note: Make sure the EC2 machine is within the same Availability Zone (AZ) where Aurora resides, which greatly helps in reducing network latency.

For the purpose of this blog I have used 3 instances of Aurora (1 master + 2 read replicas).

Now let's speak about the endpoints that Aurora provides.

Aurora Endpoints:

Endpoints are the connection URIs provided by AWS to connect to the Aurora database. Listed below are the endpoints provided by Aurora.

  •  Cluster Endpoint
  •  Reader Endpoint
  •  Instance Endpoint

Cluster Endpoint:

An endpoint for an Aurora DB cluster that connects to the current primary instance for that DB cluster. The cluster endpoint provides failover support for read/write connections to the DB cluster. If the current primary instance of a DB cluster fails, Aurora automatically fails over to a new primary instance.

Reader Endpoint:

An endpoint for an Aurora DB cluster that connects to one of the available Aurora Replicas for that DB cluster. Each Aurora DB cluster has a reader endpoint. The reader endpoint provides load balancing support for read-only connections to the DB cluster.

Instance Endpoint:

An endpoint for a DB instance in an Aurora DB cluster that connects to that specific DB instance. Each DB instance in a DB cluster, regardless of instance type, has its own unique instance endpoint.

Among these different endpoints, we will be using the "Instance Endpoint", i.e., the individual endpoints, in the MaxScale config.

The problem here is that the application should have reads and writes split at the application layer, so that it can use the reader and writer endpoints efficiently. But if a user migrates to Aurora for scalability, then we need an intelligent proxy like MaxScale / ProxySQL. Currently MaxScale has inbuilt support for Aurora.

How is Aurora Monitored by Maxscale?

MaxScale uses a special monitor module called 'auroramon', since Aurora does not follow the standard MySQL replication protocol for replicating data to its replicas.

How does 'auroramon' identify master and replicas from the 'Instance Endpoint'?

Each node inside the Aurora cluster (in our use case 1 master + 2 replicas) has an aurora_server_id (@@aurora_server_id), which is a unique identifier for each instance/node.

Aurora also stores all the relevant information about replication, including the aurora_server_id, inside the table information_schema.replica_host_status. Below is the structure of the table.

+----------------------------------------+---------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------------------+---------------------+------+-----+---------------------+-------+
| SERVER_ID | varchar(100) | NO | | | |
| SESSION_ID | varchar(100) | NO | | | |
| IOPS | int(10) unsigned | NO | | 0 | |
| READ_IOS | bigint(10) unsigned | NO | | 0 | |
| PENDING_READ_IOS | int(10) unsigned | NO | | 0 | |
| CPU | double | NO | | 0 | |
| DURABLE_LSN | bigint(20) unsigned | NO | | 0 | |
| ACTIVE_LSN | bigint(20) unsigned | NO | | 0 | |
| LAST_TRANSPORT_ERROR | int(10) | NO | | 0 | |
| LAST_ERROR_TIMESTAMP | datetime | NO | | 0000-00-00 00:00:00 | |
| LAST_UPDATE_TIMESTAMP | datetime | NO | | 0000-00-00 00:00:00 | |
| MASTER_SLAVE_LATENCY_IN_MICROSECONDS | bigint(10) unsigned | NO | | 0 | |
| REPLICA_LAG_IN_MILLISECONDS | double | NO | | 0 | |
| LOG_STREAM_SPEED_IN_KiB_PER_SECOND | double | NO | | 0 | |
| LOG_BUFFER_SEQUENCE_NUMBER | bigint(10) unsigned | NO | | 0 | |
| IS_CURRENT | tinyint(1) unsigned | NO | | 0 | |
| OLDEST_READ_VIEW_TRX_ID | bigint(10) unsigned | NO | | 0 | |
| OLDEST_READ_VIEW_LSN | bigint(10) unsigned | NO | | 0 | |
| HIGHEST_LSN_RECEIVED | bigint(1) unsigned | NO | | 0 | |
| CURRENT_READ_POINT | bigint(1) unsigned | NO | | 0 | |
| CURRENT_REPLAY_LATENCY_IN_MICROSECONDS | bigint(1) unsigned | NO | | 0 | |
| AVERAGE_REPLAY_LATENCY_IN_MICROSECONDS | bigint(1) unsigned | NO | | 0 | |
| MAX_REPLAY_LATENCY_IN_MICROSECONDS | bigint(1) unsigned | NO | | 0 | |
+----------------------------------------+---------------------+------+-----+---------------------+-------+

The above table structure is subject to change depending on the Aurora version.

Another important variable to check is 'SESSION_ID', which is a unique identifier value for replica nodes, but for the master server it will be defined as 'MASTER_SESSION_ID'.

Based on these two variables, the master and replicas are segregated by the MaxScale monitor, which sets the status flags based on which the router sends traffic to the nodes.
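
You can run the same check manually against any instance endpoint; on the current writer the SESSION_ID column reads MASTER_SESSION_ID, while replicas show their own session identifier. The endpoint and credentials below are placeholders:

mysql -h <instance-endpoint> -u USERXXXX -p \
  -e "SELECT server_id, session_id FROM information_schema.replica_host_status;"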

Now let's get into the configuration part of MaxScale for Aurora.

Installation and administration have been covered in the previous blogs, part 1 and part 2.

Below is the Aurora monitor module configuration.

[Aurora-Monitor]
type=monitor
module=auroramon
servers=nodeA,nodeB,nodeC
user=USERXXXX
passwd=9BE2F1F3B182F061CEA59799AA758D1DAE6B8ADF32845517C13EA0122A5BA7F5
monitor_interval=2500

Below are the server definitions; I have named the nodes nodeA, nodeB, and nodeC and have provided the instance endpoints.

[nodeA]
type=server
address=prodtest-rr-tier1XX.xxxxxx.us-east-1.rds.amazonaws.com
port=3306
protocol=MySQLBackend
persistpoolmax=200
persistmaxtime=3600

[nodeB]
type=server
address=proddtest-rr-tier1YY.xxxxxxx.us-east-1.rds.amazonaws.com
port=3306
protocol=MySQLBackend
persistpoolmax=200
persistmaxtime=3600

[nodeC]
type=server
address=proddtest-rr-tier1ZZ.XXXXXXX.us-east-1.rds.amazonaws.com
port=3306
protocol=MySQLBackend
persistpoolmax=200
persistmaxtime=3600

In the above server definitions I have enabled connection pooling by defining persistpoolmax and persistmaxtime; this greatly helps in coping with instance restarts due to memory handling with Aurora.

Once all the configuration is done, you can reload the MaxScale config or restart the MaxScale service.

-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status 
-------------------+-----------------+-------+-------------+--------------------
nodeA | proddtest-rr-tier1XX.xxxxxx.us-east-1.rds.amazonaws.com| 3306 | 0 | Slave, Running
nodeB | proddtest-rr-tier1YY.xxxxxxx.us-east-1.rds.amazonaws.com | 3306 | 0 | Slave, Running
nodeC | proddtest-rr-tier1ZZ.XXXXXXX.us-east-1.rds.amazonaws.com | 3306 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------

 

[Architecture diagram]

 

Below is how to check the R/W split from the command line. By default, a query inside 'start transaction;' goes to the master node. Since I am connecting directly from the MaxScale node, I have used the local socket; you can also use IP and port instead.

[root@ip-XXXXXX mydbops]# mysql -u mydbops -pXXXXXX -S /tmp/ClusterMaster -e "show global variables like '%aurora_server_id%';"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------------+----------------------------+
| Variable_name    | Value                      |
+------------------+----------------------------+
| aurora_server_id | proddtest-rr-tier1ZZ       |
+------------------+----------------------------+

[root@ip-XXXXXXX mydbops]# mysql -u mydbops -pXXXXX -S /tmp/ClusterMaster -e "start transaction;show global variables like '%aurora_server_id%';commit;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------------+----------------------------+
| Variable_name    | Value                      |
+------------------+----------------------------+
| aurora_server_id | proddtest-rr-tier1XX       |
+------------------+----------------------------+

To monitor the percentage of read/write split between the master and replicas, along with the service configuration and stats, check the service from maxadmin as below.

MaxScale> show service "Splitter Service"
Service:                             Splitter Service
Router:                              readwritesplit
State:                               Started
use_sql_variables_in:      all
slave_selection_criteria:  LEAST_BEHIND_MASTER
master_failure_mode:       fail_instantly
max_slave_replication_lag: 30
retry_failed_reads:        true
strict_multi_stmt:         true
strict_sp_calls:           false
disable_sescmd_history:    true
max_sescmd_history:        0
master_accept_reads:       true

Number of router sessions:           5
Current no. of router sessions:      2
Number of queries forwarded:          20
Number of queries forwarded to master:8 (40.00%)
Number of queries forwarded to slave: 12 (60.00%)
Number of queries forwarded to all:   4 (20.00%)
Started:                             Mon Feb 19 15:53:48 2018
Root user access:                    Disabled
Backend databases:

[prodtest-rr-tier1XX.xxxxxx.us-east-1.rds.amazonaws.com]:3306    Protocol: MySQLBackend    Name: nodeA
[proddtest-rr-tier1YY.xxxxxxx.us-east-1.rds.amazonaws.com]:3306    Protocol: MySQLBackend    Name: nodeB
[proddtest-rr-tier1ZZ.XXXXXXX.us-east-1.rds.amazonaws.com]:3306    Protocol: MySQLBackend    Name: nodeC

Total connections:                   7
Currently connected:                 2

 

Now you can start using MaxScale to scale your queries across the Aurora cluster. It is no longer mandatory to segregate read and write queries in the application when using Aurora.

 

RDS Aurora MySQL and Service Interruptions


In the Amazon space, any EC2 or service instance can “disappear” at any time.  Depending on which service is affected, it will be automatically restarted.  In EC2 you can choose whether an interrupted instance will be restarted, or left shut down.

For an Aurora instance, an interrupted instance is always restarted. Makes sense.

The restart timing, and other consequences during the process, are noted in our post on Aurora Failovers.

Aurora Testing Limitations

As mentioned earlier, we love testing “uncontrolled” failovers.  That is, we want to be able to pull any plug on any service, and see that the environment as a whole continues to do its job.  We can’t do that with Aurora, because we can’t control the essentials:

  • power button;
  • reset switch;
  • ability to kill processes on a server;
  • and the ability to change firewall settings.

In Aurora, an instance is either running, or will (again) be running shortly.  That much we know.  Aurora MySQL also offers some commands that simulate various failure scenarios, but since they are built-in we can presume that those scenarios are both very well tested and covered by the automation around the environment.  Those clearly defined cases are exactly the situations we’re not interested in.
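
For reference, those built-in fault-injection queries look roughly like the following (a sketch based on the AWS documentation; the exact syntax and available options depend on the Aurora MySQL version, and the endpoint and credentials are placeholders):

# crash the current instance; Aurora restarts it in place
mysql -h mycluster.cluster-xxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "ALTER SYSTEM CRASH INSTANCE;"

# simulate a failure of all read replicas for one minute
mysql -h mycluster.cluster-xxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "ALTER SYSTEM SIMULATE 100 PERCENT READ REPLICA FAILURE TO ALL FOR INTERVAL 1 MINUTE;"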

What if, for instance, a server accepts new connections but is otherwise unresponsive?  We’ve seen MySQL do this on occasion.  Does Aurora catch this?  We don’t know and  we have no way of testing that, or many other possible problem scenarios.  That irks.

The Need to Know

If an automated system is able to catch a situation, that’s great.  But if your environment can end up in a state such as described above and the automated systems don’t catch and handle it, you could be dead in the water for an undefined amount of time.  If you have scripts to catch cases such as these, but the automated systems catch them as well, you want to be sure that you don’t trigger “double failovers” or otherwise interfere with a failover-in-progress.  So either way, you need to know whether a situation is caught and handled, and be able to test specific scenarios.

In summary: when you know the facts, then you can assess the risk in relation to your particular needs, and mitigate where and as desired.

A corporate guarantee of “everything is handled and it’ll be fine” (or as we say in Australia “She’ll be right, mate!“) is wholly unsatisfactory for this type of risk analysis and mitigation exercise.  Guarantees and promises, and even legal documents, don’t keep environments online.  Consequently, promises and legalities don’t keep a company alive.

So what does?  In this case, engineers.  But to be able to do their job, engineers need to know what parameters they’re working with, and have the ability to test any unknowns.  Unfortunately Aurora is, also in this respect, a black box.  You have to trust, and can’t comprehensively verify.  Sigh.
