
Performance Evaluation of SST Data Transfer: Without Encryption (Part 1)


In this blog, we’ll look at evaluating the performance of an SST data transfer without encryption.

A State Snapshot Transfer (SST) operation is an important part of Percona XtraDB Cluster. It’s used to provision the joining node with all the necessary data. There are three methods of SST operation available: mysqldump, rsync, xtrabackup. The most advanced one – xtrabackup – is the default method for SST in Percona XtraDB Cluster.

We decided to evaluate the current state of xtrabackup, focusing on the process of transferring data between the donor and joiner nodes, to find out if there is any room for improvement or optimization.

Taking into account that the security of the network connections used for Percona XtraDB Cluster deployment is one of the most important factors that affects SST performance, we will evaluate SST operations in two setups: without network encryption, and in a secure environment.

In this post, we will take a look at the setup without network encryption.

Setup:

  • database server: Percona XtraDB Cluster 5.7 on the donor node
  • database: sysbench database – 100 tables, 4M rows each (total ~122GB)
  • network: donor/joiner hosts are connected with dedicated 10Gbit LAN
  • hardware: donor/joiner hosts – boxes with 28 Cores+HT/RAM 256GB/Samsung SSD 850/Ubuntu 16.04

In our test, we will measure the amount of time it takes to stream all necessary data from the donor to the joiner with the help of one of SST’s methods.

Before testing, I measured read/write bandwidth limits of the attached SSD drives (with the help of sysbench/fileio): they are ~530-540MB/sec. That means that the best theoretical time to transfer all of our database files (122GB) is ~230sec.
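As a reference point, the storage limits can be obtained with a sysbench fileio run roughly like the following (a hedged sketch; the file count, total size and duration are example values, not necessarily the exact ones used here):

# prepare test files, measure sequential read and write bandwidth, then clean up
sysbench fileio --file-total-size=120G --file-num=64 prepare
sysbench fileio --file-total-size=120G --file-num=64 --file-test-mode=seqrd --time=60 run
sysbench fileio --file-total-size=120G --file-num=64 --file-test-mode=seqwr --time=60 run
sysbench fileio --file-total-size=120G --file-num=64 cleanup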

Schematic view of SST methods:

  • Streaming DB files from the donor to joiner with tar
    (donor) tar | socat                         socat | tar (joiner)
    • tar is not really an SST method. It’s used here just to get some baseline numbers to understand how long it takes to transfer data without extra overhead.
  • Streaming DB files from the donor to joiner with rsync protocol
    (donor) rsync                               rsync(daemon mode) (joiner)
    • While working on the testing of the rsync SST method, I found that the current way of data streaming is quite inefficient: rsync parallelization is directory-based, not file-based. So if you have three directories – for instance sbtest (100 files/100GB), mysql (75 files/10MB), performance_schema (88 files/1MB) – the rsync SST script will start three rsync processes, where each process will handle its own directory. As a result, instead of a parallel transfer we end up with one stream that only streams the largest directory (sbtest). Replacing that approach with one that iterates over all files in the datadir and queues them to rsync workers allows us to speed up the transfer of data 2-3 times. On the charts, ‘rsync’ is the current approach and ‘rsync_improved’ is the improved one.
  • Backup data on the donor side and stream it to the joiner in xbstream format
    (donor) xtrabackup | socat  socat | xbstream (joiner)

At the end of this post, you will find the command lines used for testing each SST method.

[Chart: SST data transfer times by method – tar, rsync, rsync_improved, xtrabackup]

Streaming of our database files with tar took a minimal amount of time, and it’s very close to the best possible time (~230sec). xtrabackup is slower (~2x), as is rsync (~3x).

From profiling xtrabackup, we can clearly see two things:

  1. IO utilization is quite low
  2. A notable amount of time was spent in crc32 computation

Issue 1
xtrabackup can process data in parallel; however, by default it does so with a single thread only. Our tests showed that increasing the number of parallel threads to 2/4 with the --parallel option allows us to improve IO utilization and reduce streaming time. One can pass this option to xtrabackup by adding the following to the [sst] section of my.cnf:

[sst]
inno-backup-opts="--parallel=4"

Issue 2
By default xtrabackup uses software-based crc32 functions from the libz library. Replacing this function with a hardware-optimized one allows a notable reduction in CPU usage and a speedup in data transfer. This fix will be included in the next release of xtrabackup.

[Chart: SST data transfer times for xtrabackup with the parallel option and hardware-optimized crc32]

We ran more tests for xtrabackup with the parallel option and hardware optimized crc32, and got results that confirm our analysis. Streaming time for xtrabackup is now very close to baseline and storage limits.

Testing details

For the purposes of testing, I’ve created a script “sst-bench.sh” that covers all the methods used in this post. You can use it to measure all the above SST methods in your environment. In order to run the script, you have to adjust several environment variables at the beginning, such as joiner ip, datadirs location on the joiner and donor hosts, etc. After that, put the script on the “donor” and “joiner” hosts and run it as follows:
#joiner_host> sst_bench.sh --mode=joiner --sst-mode=<tar|xbackup|rsync>
#donor_host>  sst_bench.sh --mode=donor --sst-mode=<tar|xbackup|rsync|rsync_improved>


How to setup MaxScale with MariaDB Galera Cluster


This post follows up my previous blog post, which describes how to set up a 3-node MariaDB Galera Cluster with MySQL-Sandbox on a single server.

 

Today, I’ll try to explain how we can set up MariaDB MaxScale over the Galera Cluster. Before I move ahead, I would like to explain a little bit about MaxScale.

MariaDB MaxScale is a database proxy that enables horizontal database scaling while maintaining a fast response to client applications. You can implement MaxScale on either MySQL Replication or a Galera cluster. With MySQL Replication, you can use either Read/Write Splitting or Connection Routing, and the same applies to Galera Cluster. You can get more information about this product here:

https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale-14/setting-up-maxscale/

https://mariadb.com/products/mariadb-maxscale

So here, I’m going to set up MaxScale with Read/Write Splitting on MariaDB Galera Cluster. I’m using Ubuntu and a 3-node Galera Cluster setup running on a single server.

MariaDB [(none)]> show global status like 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)

Download and Install MaxScale  :  https://mariadb.com/downloads/maxscale

nilnandan@ubuntu:~/MariaDB$ wget https://downloads.mariadb.com/MaxScale/2.0.5/ubuntu/dists/xenial/main/binary-amd64/maxscale-2.0.5-1.ubuntu.xenial.x86_64.deb
... 
2017-03-27 11:01:29 (1.86 MB/s) - ‘maxscale-2.0.5-1.ubuntu.xenial.x86_64.deb’ saved [3739198/3739198]

nilnandan@ubuntu:~/MariaDB$ sudo dpkg -i maxscale-2.0.5-1.ubuntu.xenial.x86_64.deb 
Selecting previously unselected package maxscale.
(Reading database ... 216604 files and directories currently installed.)
Preparing to unpack maxscale-2.0.5-1.ubuntu.xenial.x86_64.deb ...
Unpacking maxscale (2.0.5) ...
Setting up maxscale (2.0.5) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for libc-bin (2.23-0ubuntu7) ...
nilnandan@ubuntu:~/MariaDB$
Configure MaxScale:
To make MaxScale work, first we have to configure the maxscale.cnf file. There are five sections in it:
1. Global Parameters
2. Service
3. Listener
4. MySQL Monitor
5. Maxadmin Configuration
Here, you can set all the MySQL-related global parameters in the first section.
“A service represents the database service that MaxScale offers to the clients”
“A listener defines a port and protocol pair that is used to listen for connections to a service.”
“MySQL Monitor modules are used by MaxScale to internally monitor the state of the backend databases in order to set the server flags for each of those servers”
“Maxadmin is a client utility which connects to MaxScale, runs commands and checks the status of the cluster”
You can get more information about these sections here. The overall flow should look like this:
Client  <-> Listener <-> Service <-> Galera Nodes
For the Service and MySQL Monitor sections, we have to create separate MySQL users with the required permissions. But here, I’m using one MySQL user, maxscale, with all permissions, as this is a testing server. If you want to check which permissions are actually needed, you can visit this page: Setting up MaxScale. A sketch of the user creation is shown below.
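For reference, creating such a test user on one of the sandbox nodes could look roughly like this (a sketch only; in production you would grant narrower privileges, and the socket path is the one from the sandbox configuration below):

# Galera replicates DDL, so creating the user on one node is enough
mysql -u root -p --socket=/tmp/mysql_sandbox19222.sock -e "
CREATE USER 'maxscale'@'%' IDENTIFIED BY 'msandbox';
GRANT ALL PRIVILEGES ON *.* TO 'maxscale'@'%';
FLUSH PRIVILEGES;"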
Here is my configuration file:
root@ubuntu:~# cat /etc/maxscale.cnf
# Global parameters
[maxscale]
threads=4

# Service definitions
[Read-Write Service]
type=service
router=readwritesplit
servers=dbnode1, dbnode2, dbnode3
user=maxscale
passwd=msandbox

# Listener definitions for the services
[Read-Write Listener]
type=listener
service=Read-Write Service
protocol=MySQLClient
port=4006
socket=/tmp/galeramaster.sock

# Server definitions
[dbnode1]
type=server
address=127.0.0.1
port=19222
socket=/tmp/mysql_sandbox19222.sock
protocol=MySQLBackend
priority=1

[dbnode2]
type=server
address=127.0.0.1
port=19223
socket=/tmp/mysql_sandbox19223.sock
protocol=MySQLBackend
priority=2

[dbnode3]
type=server
address=127.0.0.1
port=19224
socket=/tmp/mysql_sandbox19224.sock
protocol=MySQLBackend
priority=3

# Monitor for the servers
[Galera Monitor]
type=monitor
module=galeramon
servers=dbnode1, dbnode2, dbnode3
user=maxscale
passwd=msandbox
monitor_interval=10000
use_priority=true
# This service enables the use of the MaxAdmin interface
# MaxScale administration guide:
# https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Reference/MaxAdmin.md

[MaxAdmin Service]
type=service
router=cli

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
port=6603
socket=default
root@ubuntu:~#
For more details about these settings, you can visit this page: MaxScale Read/Write Splitting With Galera Cluster
Start MaxScale as root with the command: service maxscale start
root@ubuntu:~# ps -ef | grep maxscale
maxscale 10532 1 0 14:57 ? 00:00:00 /usr/bin/maxscale --user=maxscale
root 10547 10405 0 14:57 pts/21 00:00:00 grep --color=auto maxscale
root@ubuntu:~#
If any error occurs, you can check here:  /var/log/maxscale 
When you start MaxScale with the service command, by default it will use --user=maxscale. This is a Linux user which does not have a home directory. Due to that, the maxadmin command will not work and will give an error like:
root@ubuntu:~# maxadmin
Unable to connect to MaxScale at /tmp/maxadmin.sock: Connection refused
root@ubuntu:~#
To keep it simple, I would suggest changing the maxscale.service file and updating the user to root rather than maxscale:
root@ubuntu:~# cat /lib/systemd/system/maxscale.service
...
[Service]
...
ExecStartPre=/usr/bin/install -d /var/run/maxscale -o maxscale -g maxscale
#ExecStart=/usr/bin/maxscale --user=maxscale
ExecStart=/usr/bin/maxscale --user=root
...
root@ubuntu:~#
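After editing the unit file, systemd has to re-read it before the change takes effect; a minimal sketch (assuming a systemd-based setup such as Ubuntu 16.04):

# reload unit definitions and restart MaxScale so the new --user takes effect
sudo systemctl daemon-reload
sudo service maxscale restart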

After that, you’ll be able to log in and run commands with maxadmin to check the health/status of the cluster nodes.
root@ubuntu:~# maxadmin
MaxScale>
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
dbnode1 | 127.0.0.1 | 19222 | 0 | Master, Synced, Running
dbnode2 | 127.0.0.1 | 19223 | 0 | Slave, Synced, Running
dbnode3 | 127.0.0.1 | 19224 | 0 | Slave, Synced, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale>
MaxScale> list services
Services.
--------------------------+----------------------+--------+---------------
Service Name | Router Module | #Users | Total Sessions
--------------------------+----------------------+--------+---------------
Read-Write Service | readwritesplit | 2 | 2
MaxAdmin Service | cli | 3 | 4
--------------------------+----------------------+--------+---------------

MaxScale> list listeners
Listeners.
---------------------+--------------------+-----------------+-------+--------
Service Name | Protocol Module | Address | Port | State
---------------------+--------------------+-----------------+-------+--------
Read-Write Service | MySQLClient | * | 4006 | Running
Read-Write Service | MySQLClient | /tmp/galeramaster.sock | 0 | Running
MaxAdmin Service | maxscaled | * | 6603 | Running
MaxAdmin Service | maxscaled | default | 0 | Running
---------------------+--------------------+-----------------+-------+--------

In the “list servers” output, you can see that MaxAdmin shows Master or Slave for each server in the status column. You can manage this via priority: in the configuration file I’ve set “use_priority=true” in the [Galera Monitor] section and “priority=1, 2 or 3” in the [dbnode] sections. The node with priority=1 will be the master here.
How to connect to Galera through MaxScale:
root@ubuntu:# mysql -umaxscale -p --socket=/tmp/galeramaster.sock
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 12642
Server version: 10.0.0 2.0.5-maxscale MariaDB Server
..
MySQL [(none)]>
MySQL [(none)]> show global status like 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.01 sec)
So this is how you can set up MaxScale over Galera Cluster. There are many more things related to MaxScale that need to be explained, like how to monitor the status, how to put a node into maintenance mode, etc., but as this has already become a very long post, I’ll try to explain them in my next blog post.

Experiments with MySQL 5.7’s Online Buffer Pool Resize


One of the interesting features introduced in MySQL 5.7 is that innodb_buffer_pool_size is a dynamic variable (since 5.7.5, to be more specific). Yet, past experience tells us that just because a variable is dynamic, that does not mean it is safe to change it on the fly.

To find out how safe this new feature is, I measured throughput on a synthetic workload (sysbench 1.0 running the oltp script) as I made changes to this variable. In this post, I will show the results that came through.

 

The Environment

For my tests, I used a Google Cloud Compute instance of type n1-standard-4 (that is 4 vCPUs and 15 GB of memory) with 25 GB of persistent ssd. The dataset was about 9.2 GB (on disk, Innodb, no compression, 40M rows), with a smaller version of almost 1 GB (160k rows) for a specific test.

As mentioned earlier, the workload was sysbench 1.0’s oltp script.

 

The Experiments

The goal of the experiment was to measure what impact (if any) changing innodb_buffer_pool_size dynamically has on the workload, measured in terms of throughput (transactions per second).

After some tests to find a suitable run time, I decided to do the following for each test:

  • restore the data directory from a backup, so all runs had the same data and a cold buffer,
  • run sysbench for 240 seconds, reporting stats every second, and
  • change innodb_buffer_pool_size after 120 seconds.

Here’s how the variable was modified:

  • The ‘normal’ configuration is 4GB
  • For the ‘increased’ tests, it was modified to 8GB
  • For the ‘decreased’ tests, to 2GB
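For reference, a change such as the ‘decreased’ test can be applied at runtime with a single statement, and its progress can be followed through a status variable (a minimal sketch; credentials are assumed):

# shrink the buffer pool to 2GB on the fly (MySQL >= 5.7.5)
mysql -u root -p -e "SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;"
# the resize happens in the background; this shows how far along it is
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';"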

Innodb’s log file size was set to 1 GB, and no other changes were made to the default configuration. I was not going for benchmark results here; I simply wanted to find out how safe it would be to do this in production.


The Results

Let me start by showing what happens when I just left sysbench to run its oltp script for 240 seconds, with no changes made to the variable:

[Scatter plot: throughput (tps) vs. time (s) for sysbench’s oltp workload with the buffer pool size unchanged – periodic drops that shrink over time as the cache warms up.]

We can see some periodic drops in tps that improve with time, and which go away if sysbench is left to run for about 600 seconds, but, again, I just wanted to get an idea of the safety of changing the Buffer Pool’s size on a live system.  In the end, this baseline was good enough to let me run several tests in a short amount of time.

Let’s see what happens now when, at second 120, the BP’s size is reduced from 4 to 2 GBs:

[Scatter plot: tps vs. time when the buffer pool is reduced from 4 to 2 GB at second 120 – a noticeable drop at the change, then lower but stable tps, as expected.]

We see a very clear drop around the time of the change, and then an expected drop in tps. You may be wondering why I tested a change that I knew would result in poor performance, and that’s a valid question. In my experience, people make mistakes when tuning Innodb. I’ve witnessed this several times and know this to be an incredibly realistic scenario. I think it is interesting to know, besides the expected result of less tps, what happens when the change is actually made. Looking at sysbench’s output (you can find all the files, along with the scripts I used, here) we see that the drop started at second 121 and lasted until about 130, where tps started to stabilize again. I think that’s pretty good. Remember, we are talking about a variable that required a service restart in previous versions, and nothing is worse for throughput than mysqld not running at all. With that in mind, a few seconds of reduced performance seems like an improvement to me.

Here is what happens when, given the same start, the BP size is increased to 8GB at second 120:

[Scatter plot: tps vs. time when the buffer pool is increased from 4 to 8 GB at second 120 – only a very short drop at the change.]

There is another drop, but it seems shorter, and honestly, I probably wouldn’t have noticed it in the graph if it wasn’t for that lonely dot near the bottom. Looking at the raw output, we can see the impact is seen only on second 121. I think this is very good news. Again, compared with what we had before, this means that, at least on this controlled experiment, we were able to increase the BP’s size with very little production impact.

Another increase example, in this case, from 2 to 8 GB, which I have labelled as ‘increase needed’ in my scripts because titles are a kind of name, and naming things is one of the hardest problems in computing:

[Scatter plot: tps vs. time when the buffer pool is increased from 2 to 8 GB at second 120 – a very short drop at the change, then improved tps.]

The drop is also measured just on second 121, and tps improves significantly starting on second 122, so this makes me even more optimistic about this feature.

Let’s now see what happens when we decrease the BP while running on the small dataset:

[Scatter plot: tps vs. time when the buffer pool is decreased from 4 to 2 GB at second 120 on the small data set – only a very short drop at the change.]

My goal here was to try and simulate what may happen when we dynamically reduce an oversized BP because, for example, someone copied the configuration from a standalone production MySQL to a shared hosted test instance with small datasets (which is something I have also seen done). In this case, the drop is right at second 120, and then throughput goes back to normal.

Finally, what happens when the BP size is reduced, and then this change is rolled back after 20 seconds?

[Scatter plot: tps vs. time when the buffer pool is decreased from 4 to 2 GB at second 120 and restored to 4 GB after 20 seconds – a drop at the change, reduced tps while it is in place, then recovery to the values seen before the change.]

A Quick Look at the Drops in Throughput

We have seen that, in all cases, there is a drop in throughput when this variable is changed, with varying length depending on the circumstances. I took a quick look at what happens then via pt-pmp, and found out that, during the drops, most threads are waiting on the trx_commit_complete_for_mysql function. This function is found on file trx0trx.cc of Innodb and is described by a comment as flushing the log to disk (if required). So what happens if we change this on a read-only workload? It turns out there is still a short drop in throughput, and this time most threads are waiting at the buf_page_make_young function, which moves a page to the start of the BP’s LRU list. In both cases, the ‘root’ wait is for internal mutexes: one protecting writes to the log in the oltp workload’s case, and one protecting the buffer pool in the read only workload’s case.
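For anyone who wants to repeat this kind of check, pt-pmp can be pointed at the running mysqld roughly like this (a sketch; it assumes Percona Toolkit is installed and that mysqld is the only instance running):

# sample mysqld stacks ten times, one second apart, and aggregate them
pt-pmp --pid $(pidof mysqld) --iterations 10 --interval 1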

Conclusion

I think this new feature was much needed. It’s a welcome addition to MySQL’s capabilities and while, yes, it can have some impact on your workload (even if it is read-only), it must be compared to what happened before – a service restart was needed.

Performance Evaluation of SST Data Transfer: With Encryption (Part 2)


In this blog post, we’ll look at the performance of SST data transfer using encryption.

In my previous post, we reviewed SST data transfer in an unsecured environment. Now let’s take a closer look at a setup with encrypted network connections between the donor and joiner nodes.

The base setup is the same as the previous time:

  • Database server: Percona XtraDB Cluster 5.7 on donor node
  • Database: sysbench database – 100 tables 4M rows each (total ~122GB)
  • Network: donor/joiner hosts are connected with dedicated 10Gbit LAN
  • Hardware: donor/joiner hosts – boxes with 28 Cores+HT/RAM 256GB/Samsung SSD 850/Ubuntu 16.04

The setup details for the encryption aspects in our testing:

  • Cryptography libraries: openssl-1.0.2, openssl-1.1.0, libgcrypt-1.6.5(for xbstream encryption)
  • CPU hardware acceleration for AES – AES-NI: enabled/disabled
  • Ciphers suites: aes(default), aes128, aes256, chacha20(openssl-1.1.0)

Several notes regarding the above aspects:

  • Cryptography libraries. Almost every Linux distribution is currently based on openssl-1.0.2. This is the previous stable version of the OpenSSL library. The latest stable version (1.1.0) has various performance/scalability fixes and also supports new ciphers that may notably improve throughput. However, it’s problematic to upgrade from 1.0.2 to 1.1.0, or even just to find packages for openssl-1.1.0 for existing distributions. This is due to the fact that replacing OpenSSL triggers an update/upgrade of a significant number of packages. So in order to use openssl-1.1.0, most likely you will need to build it from source. The same applies to socat – it will require some effort to build socat with openssl-1.1.0.
  • AES-NI. The Advanced Encryption Standard Instruction Set (AES-NI) is an extension to x86 CPUs from Intel and AMD. The purpose of AES-NI is to improve the performance of encryption and decryption operations using the Advanced Encryption Standard (AES), like the AES128/AES256 ciphers. If your CPU supports AES-NI, there should be an option in the BIOS that allows you to enable/disable that feature. In Linux, you can check /proc/cpuinfo for the existence of an “aes” flag. If it’s present, then AES-NI is available and exposed to the OS. There is a way to check what acceleration ratio you can expect from it:
    # AES_NI disabled with OPENSSL_ia32cap
    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-gcm      57535.13k    65924.18k   164094.81k   175759.36k   178757.63k
    # AES_NI enabled
    openssl speed -elapsed -evp aes-128-gcm
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-gcm     254276.67k   620945.00k   826301.78k   906044.07k   923740.84k

    Our interest is in the very last column: 178MB/s (without AES-NI) vs 923MB/s (with AES-NI)
  • Ciphers. In our testing for network encryption with socat+openssl 1.0.2/1.1.0, we used the following cipher suites:
    DEFAULT – if you don’t specify a cipher/cipher string for the OpenSSL connection, this suite will be used
    AES128 – suite with aes128 ciphers only
    AES256 – suite with aes256 ciphers only
    Additionally, for openssl-1.1.0, there is an extra cipher suite:
    CHACHA20 – cipher suites using the ChaCha20 algorithm
    In the case of xtrabackup, where internal encryption is based on libgcrypt, we use the AES128/AES256 ciphers from this library.
  • SST methods. Streaming database files from the donor to the joiner with the rsync protocol over an OpenSSL-encrypted connection:
    (donor) rsync | socat+ssl       socat+ssl| rsync(daemon mode) (joiner)

    The current approach of wsrep_sst_rsync.sh doesn’t allow you to use the rsync SST method with SSL. However, there is a project that tries to address the lack of SSL support for rsync method. The idea is to create a secure connection with socat and then use that connection as a tunnel to connect rsync between the joiner and donor hosts. In my testing, I used a similar approach.

    Also note that in the chart below there are results for two variants of rsync: “rsync” (the current approach) and “rsync_improved” (the improved one). I’ve explained the difference between them in my previous post.

  • Backup data on the donor side and stream it to the joiner in xbstream format over an OpenSSL encrypted connection
    (donor) xtrabackup| socat+ssl  socat+ssl | xbstream (joiner)

    In my testing for streaming over encrypted connections, I used the --parallel=4 option for xtrabackup. In my previous post, I showed that this is an important factor for getting the best time. There is also a way to pass the name of the cipher that will be used by socat for the OpenSSL connection to the wsrep_sst_xtrabackup-v2.sh script with the sockopt option. For instance:
    [sst]
    inno-backup-opts="--parallel=4"
    sockopt=",cipher=AES128"

  • Backup data on the donor side/encrypt it internally (with libgcrypt) and stream the data to the joiner in xbstream format, and afterwards decrypt files on the joiner
    (donor) xtrabackup | socat   socat | xbstream ; xtrabackup decrypt (joiner)

    The xtrabackup tool has a feature to encrypt data when performing a backup. That encryption is based on the libgcrypt library, and it’s possible to use AES128 or AES256 ciphers. For encryption, it’s necessary to generate a key and then provide it to xtrabackup to perform encryption on the fly. There is a way to specify the number of threads that will encrypt data, along with the chunk size, to tune the encryption process (see the sketch after this list).

    The current version of xtrabackup supports an efficient way to read, compress and encrypt data in parallel, and then write/stream it. On the other side, when we accept a stream, we can’t decompress/decrypt it on the fly. First, the stream has to be received/written to disk with the xbstream tool, and only after that can you use xtrabackup with the --decrypt/--decompress modes to unpack the data. The inability to process data on the fly and the need to save the stream to disk for later processing have a notable impact on the streaming time from the donor to the joiner. We have a plan to fix that issue, so that encryption+compression+streaming of data with xtrabackup happens without the necessity to write the stream to disk on the receiver side.

    For my testing, in the case of xtrabackup with internal encryption, I didn’t use SSL encryption for socat.
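To make the internal-encryption path above more concrete, here is a minimal sketch of how such an encrypted, parallel backup could be taken and streamed (the key, host name, port, thread count and chunk size are illustrative assumptions, not the values used in these tests):

# donor: generate an example AES256 key and stream an encrypted backup in xbstream format
KEY=$(openssl rand -base64 24)
xtrabackup --backup --stream=xbstream --target-dir=/tmp \
  --parallel=4 --encrypt=AES256 --encrypt-key="$KEY" \
  --encrypt-threads=4 --encrypt-chunk-size=256K | socat - TCP:joiner_host:4444

# joiner: receive and unpack the stream, then decrypt it in a separate pass
socat TCP-LISTEN:4444,reuseaddr - | xbstream -x -C /data/joiner
xtrabackup --decrypt=AES256 --encrypt-key="$KEY" --target-dir=/data/joiner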

Results (click on the image for an enlarged view):

[Chart: SST data transfer times with encryption, by method, cipher and AES-NI mode.]

Observations:

  • Transferring data with rsync is very inefficient, and the improved version is 2-2.5 times faster. Also, you may note that in the case of “no-aes-ni”, the rsync_improved method has the best time for the default/aes128/aes256 ciphers. The reason is that we perform both the data transfer in parallel (we spawn an rsync process for each file) and the encryption/decryption in parallel (socat forks extra processes for each stream). This approach allows us to compensate for the absence of hardware acceleration by using several CPU cores. In all other cases, we only use one CPU for streaming of data and encryption/decryption.
  • xtrabackup (with hardware-optimized crc32) shows the best time in all cases, except for the default/aes128/aes256 ciphers in “no-aes-ni” mode (where rsync_improved showed the best time). However, I would like to remind you that SST with rsync is a blocking operation, and during the data transfer the donor node becomes READ-ONLY. xtrabackup, on the other hand, uses backup locks and allows any operations on the donor node during SST.
  • On the boxes without hardware acceleration (no-aes-ni mode), the chacha20 cipher allows you to perform data transfer 2-3 times faster. It’s a very good replacement for “aes” ciphers on such boxes. However, the problem with that cipher is that it is available only in openssl-1.1.0. In order to use it, you will need a custom build of OpenSSL and socat for many distros.
  • Regarding xtrabackup with internal encryption (xtrabackup_enc): reading/encrypting and streaming data is quite fast, especially with the latest libgcrypt library (1.7.x). The problem is decryption. As I’ve explained above, right now we need to get the stream and save the encrypted data to storage first, and then perform the extra step of reading/decrypting and saving the data back. That extra part consumes 2/3 of the total time. Improving the xbstream tool to perform stream decryption/decompression on the fly would allow you to get very good results.

Testing Details

For the purposes of the testing, I’ve created a script “sst-bench.sh” that covers all the methods used in this post. You can use it to measure all the above SST methods in your environment. In order to run the script, you have to adjust several environment variables at the beginning of the script: joiner ip, datadirs location on the joiner and donor hosts, etc. After that, put the script on the “donor” and “joiner” hosts and run it as follows:
#joiner_host>
sst_bench.sh --mode=joiner --sst-mode=<tar|xbackup|rsync> --cipher=<DEFAULT|AES128|AES256|CHACHA20> --ssl=<0|1> --aesni=<0|1>
#donor_host>
sst_bench.sh --mode=donor --sst-mode=<tar|xbackup|rsync|rsync_improved> --cipher=<DEFAULT|AES128|AES256|CHACHA20> --ssl=<0|1> --aesni=<0|1>

Webinar Replay and Q&A: Load balancing MySQL & MariaDB with ProxySQL & ClusterControl


Thanks to everyone who participated in our recent webinar on how to load balance MySQL and MariaDB with ClusterControl and ProxySQL!

This joint webinar with ProxySQL creator René Cannaò generated a lot of interest … and a lot of questions!

We covered topics such as ProxySQL concepts (with hostgroups, query rules, connection multiplexing and configuration management), went through a live demo of a ProxySQL setup in ClusterControl (try it free) and discussed upcoming ClusterControl features for ProxySQL.

These topics triggered a lot of related questions, to which you can find our answers below.

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

Webinar Questions & Answers

Q. Thank you for your presentation. I have a question about connection multiplexing: does ProxySQL ensure that all statements from start transaction to commit are sent through the same backend connection?

A. This is configurable.

A small preface first: at any time, each client’s session can have one or more backend connections associated with it. A backend connection is associated to a client when a query needs to be executed, and normally it returns immediately back to the connection pool. “Normally” means that there are circumstances when this doesn’t happen. For example, when a transaction starts, the connection is not returned anymore to the connection pool until the transaction completes (either commits or rollbacks). This means that all the queries that should be routed to the same hostgroup where the transaction is running, are guaranteed to run in the same connection.

Nonetheless, by default, a transaction doesn’t disable query routing. That means that while a transaction is running on one connection to a specific hostgroup and this connection is associated with only that client, if the client sends a query destinated to another hostgroup, that query could be sent to a different connection.

Whether the query can be sent to a different connection or not, based on query rules, is configurable by the value of mysql_users.transaction_persistent:

  • 0 = queries for different hostgroup can be routed to different connections while a transaction is running;
  • 1 = query routing will be disabled while the transaction is running.

The behaviour is configurable because it depends on the application. Some applications require that all the queries are part of the same transaction, other applications don’t.
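As an illustration (a hedged sketch against ProxySQL’s admin interface with its default port and credentials; the username is hypothetical), the flag is just a column in the mysql_users table:

# enable transaction persistence for one user and activate the change
mysql -h127.0.0.1 -P6032 -uadmin -padmin -e "
UPDATE mysql_users SET transaction_persistent=1 WHERE username='app_user';
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;"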

Q. What is the best way to set up a ProxySQL cluster? The main concern here is configuration of the ProxySQL cascading throughout the cluster.

A. ProxySQL can be deployed in numerous ways.

One typical deployment pattern is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy using very low latency connection via Unix socket. If the number of application hosts increase, you can deploy a middle-layer of 3-5 ProxySQL instances and configure all ProxySQL instances from application servers to connect via this middle-layer. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts as ProxySQL’s admin interface is accessible via MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How would you recommend to make the ProxySQL layer itself highly available?

A. There are numerous methods to achieve this.

One common method is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy using very low latency connection via Unix socket. In such a deployment there is no single point of failure as every application host connects to the ProxySQL installed locally.

When you implement a middle-layer, you will also maintain HA as 3-5 ProxySQL nodes would be enough to make sure that at least some of them are available for local proxies from application hosts.

Another common method of deploying a highly available ProxySQL setup is to use tools like keepalived along with virtual IP. The application will connect to VIP and this IP will be moved from one ProxySQL instance to another if keepalived detects that something happened to the “main” ProxySQL.

Q. How can ProxySQL use the right hostgroup for each query?

A. ProxySQL routes queries to hostgroups based on query rules - it is up to the user to build a set of rules which make sense in their environment.
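As a purely illustrative sketch (the hostgroup numbers and admin credentials below are assumptions, not part of the webinar setup), a classic read/write split can be expressed with two rules:

# send SELECT ... FOR UPDATE to the writer hostgroup (10) and all other SELECTs to the readers (20)
mysql -h127.0.0.1 -P6032 -uadmin -padmin -e "
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE\$', 10, 1),
       (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;"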

Q. Can you tell us more about query mirroring?

A. In general, the implementation of query mirroring in ProxySQL allows you to send traffic to two hostgroups.

Traffic sent to the “main” hostgroup is guaranteed to reach it (unless there are no hosts in that hostgroup); on the other hand, the mirror hostgroup will receive traffic on a “best effort” basis - it should, but it is not guaranteed that the query will indeed reach the mirrored hostgroup.

This limits the usefulness of mirroring as a method to replicate data. It is still an amazing way to do load testing of new hardware or redesigned schema. Of course, mirroring reduces the maximal throughput of the proxy - queries have to be executed twice so the load is also twice as high. The load is not split between the two, but duplicated.

Q. And what about query caching?

A. The query cache in ProxySQL is implemented as a simple key->value memory store with a Time To Live for every entry. What will be cached and for how long is decided at the query rules level. The user can define a query rule matching a particular query or a wider spectrum of them. To identify a query result set in the cache, ProxySQL uses the query hash along with information about the user and schema.

How do you set the TTL for a query? The simplest answer is: to the maximum replication lag which is acceptable for this query. If you are OK with reading stale data from a slave which is lagging 10 seconds, you should be fine reading stale data from the cache when the TTL is set to 10000 milliseconds.
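Continuing the hypothetical rule from the earlier sketch (cache_ttl is expressed in milliseconds), caching could be switched on like this:

# cache the result sets matched by rule 2 for 10 seconds
mysql -h127.0.0.1 -P6032 -uadmin -padmin -e "
UPDATE mysql_query_rules SET cache_ttl=10000 WHERE rule_id=2;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;"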

Q. Connection limit to backends?

A. ProxySQL indeed implements a connection limit to backend servers. The maximum number of connections to any backend instance is defined in mysql_servers table.

Because the same backend server can be present in multiple hostgroups, it is possible to define the maximum number of connections per server per hostgroup.

This is useful, for example, in the case of a small set of connections where specific long-running queries are queued without affecting the rest of the traffic destined for the same server.

Q. Regarding the connection limit from the APP: are connections QUEUED?

A. If you reach the mysql-max_connections, further connections will be rejected with the error “Too many connections”.

It is important to remember that there is not a one-to-one mapping between application connections and backend connections.

That means that:

  • Access to the backends can be queued, but connections from the application are either accepted or rejected.
  • A large number of application connections can use a small number of backend connections.

Q. I haven’t heard of SHUN before: what does it mean?

A. SHUN means that the backend is temporarily marked as non-available, but ProxySQL will attempt to connect to it again after mysql-shun_recovery_time_sec seconds.

Q. Is query sharding available across slaves?

A. Depending on the meaning of sharding, ProxySQL can be used to perform sharding across slaves. For example, it is possible to send all traffic for a specific set of tables to a set of slaves (in a hostgroup). Splitting the slaves into multiple hostgroups and performing query sharding accordingly can improve performance, as each slave won’t have to read from disk the data of tables for which it doesn’t process any queries.

Q. How do you sync the configuration of ProxySQL when you have many instances for H.A ?

A. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts as ProxySQL’s admin interface is accessible via MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How flexible or feasible is it to change the ProxySQL config online, e.g. if one database slave is down, how is that handled in such a scenario?

A. ProxySQL configuration can be changed at any time; it’s been designed with such level of flexibility in mind.

‘Database down’ can be handled differently; it depends on how ProxySQL is configured. If you happen to rely on replication hostgroups to define writer and reader hostgroups (this is how ClusterControl deploys ProxySQL), ProxySQL will monitor the state of the read_only variable on both reader and writer hostgroups and it will move hosts as needed.

If a master is promoted by external tools (like ClusterControl, for example), read_only values will change and ProxySQL will detect the topology change and act accordingly. For a standard “slave down” scenario there is no action required from the management system standpoint - without any change in the read_only value, ProxySQL will just detect that the host is not available and stop sending queries to it, re-executing on other members of the hostgroup those queries which didn’t complete on the dead slave.

If we are talking about a setup not using replication hostgroups then it is up to the user and their scripts/tools to implement some sort of logic and reconfigure ProxySQL on runtime using admin interface. Slave down, though, most likely wouldn’t require any changes.

Q. Is it somehow possible to SELECT data from one host group into another host group?

A. No, at this point it is not possible to execute cross-hostgroup queries.

Q. What would be RAM/Disk requirements for logs , etc?

A. It basically depends on the number of log entries and on how verbose the ProxySQL log is in your environment. Typically it’s negligible.

Q. Instead of installing ProxySQL on all application servers, could you put a ProxySQL cluster behind a standard load balancer?

A. We see no reason why not. You can put whatever you like in front of ProxySQL - an F5, another layer of software proxies - it is up to you. Please keep in mind, though, that every layer of proxies or load balancers adds latency to your network and, as a result, to your queries.

Q. Can you please comment on Reverse Proxy, whether it can be used in SQL or not?

A. ProxySQL is a Reverse Proxy. Contrary to a Forward Proxy (that acts as an intermediary that simply forwards requests), a Reverse Proxy processes clients’ requests and retrieves data from servers. ProxySQL is a Reverse Proxy: clients send requests to ProxySQL, that will understand the request, analyze it, and decide what to do: rewrite, cache, block, re-execute on failure, etc.

Q. Does the user authentication layer work with non-local database accounts, e.g. with the pam modules available for proxying LDAP users to local users?

A. There is no direct support for LDAP integration but, as configuration management in ProxySQL is child’s play, it is really simple to put together a script which will pull the user details from LDAP and load them into ProxySQL. You can use cron to sync it often. All ProxySQL needs is a username and a password hash in MySQL format - this is enough to add a user to ProxySQL.

Q. It seems like the prescribed production deployment includes many proxies - are there any suggestions or upcoming work to address how to make configuration changes across all proxies in a consistent manner?

A. At this point it is recommended to leverage configuration management tools like Chef/Ansible/Puppet to manage ProxySQL’s configuration.


MySQL DevOps First Step: Revision Control


MySQL environments are notorious for being understaffed – MySQL is everywhere, and an organization is lucky if they have one full-time DBA, as opposed to a developer or sysadmin/SRE responsible for it.

That being said, MySQL is a complex program and it’s useful to have a record of configuration changes made. Not just for compliance and auditing, but sometimes – even if you’re the only person who works on the system – you want to know “when was that variable changed?” In the past, I’ve relied on the timestamp on the file when I was the lone DBA, but that is a terrible idea.

I am going to talk about configuration changes in this post, mostly because change control for configuration (usually /etc/my.cnf) is sorely lacking in many organizations. Having a record of data changes falls under backups and binary logging, and having a record of schema changes is something many organizations integrate with their ORM, so they are out of scope for this blog post.

Back to configuration – it is also helpful for disaster recovery purposes to have a record of what the configuration was. You can restore your backup, but unless you set your configuration properly, there will be problems (for example, an incompatible innodb_log_file_size will cause MySQL not to start).

So, how do you do this? Especially if you have no time?

While configuration management systems like chef, puppet and cfengine are awesome, they take setup time. If you have them, they are gold – use them! If you do not have them, you can still do a little bit at a time and improve incrementally.

If you really are at the basics, get your configurations into a repository system. Whether you use rcs, cvs, subversion or git (or anything else), make a repository and check in your configuration. The configuration management systems give you bells and whistles like being able to make templates and deploying to machines.
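As a minimal sketch of that first step (the repository location is just an example; any path or hosted repository works):

# start tracking the MySQL configuration in a git repository
mkdir -p ~/mysql-config-repo && cd ~/mysql-config-repo
git init
cp /etc/my.cnf .
git add my.cnf
git commit -m "Initial import of production my.cnf"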

It is up to you what your deployment process is - something like “check in the change, then copy the file to production” might be good enough for a start. Remember, we’re taking small steps here. It’s not a great system, but it’s certainly better than not having any revision control at all!

A great system will use some kind of automated deployment, as well as monitoring to make sure that your running configuration is the same as your configuration file (using pt-config-diff, https://www.percona.com/doc/percona-toolkit/3.0/pt-config-diff.html). That way, there are no surprises if MySQL restarts.
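A hedged example of such a check (the user name is an assumption; pt-config-diff ships with Percona Toolkit):

# compare the on-disk configuration file with the variables of the running server
pt-config-diff /etc/my.cnf h=localhost --user=monitor --ask-pass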

But having a great system is a blog post for another time.

Bonanza Cuts Load in Half with VividCortex


Working with our users at Bonanza earlier this week, we saw their team demonstrate a great example of how monitoring insights can lead to a relatively simple — but impactful —  MySQL system tweak. In this case, the adjustment Bonanza made resulted in huge improvements to their total query time.

By looking at the mysql.innodb.queued_queries metric in VividCortex, it became clear to Bonanza's team there was an issue within InnoDB that was preventing otherwise runnable threads from executing. Often, when queries begin to queue, it's indicative of a problem; it's a good idea to regularly look for states like queuing, pending, or waiting as signs of potential issues. In this case, the innodb_thread_concurrency parameter had been configured to 8. Once VividCortex revealed the mysql.innodb.queued_queries metric, the parameter was changed to 0 (self governing).
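For reference, this parameter is dynamic, so the change Bonanza made can be applied at runtime; a minimal sketch (the value of 0 is the one mentioned in the post, credentials are assumed):

# 0 removes the InnoDB concurrency limit and lets the server self-govern
mysql -u root -p -e "SET GLOBAL innodb_thread_concurrency = 0;"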

The fix was implemented at about 5:35 pm on 3/28/2017. In the VividCortex chart below, you can see where queries cease queuing (because they've started executing faster and more efficiently). Note how the orange line drops off almost immediately.

[Chart: Bonanza InnoDB operations – the queued queries (orange) line drops off right after the change.]

This second chart shows how in the hour after the fix, SELECT total time dropped by over 50% compared to the hour previous. That is 9.59 fewer hours that the system spent executing queries — or 9.59 hours of extra CPU time available. Average latency went from 1.36 ms all the way down to 664.6 μs.

[Chart: hour-over-hour comparison of total SELECT time before and after the fix.]

This is a view of the query itself, from 4:35 pm to 6:35 pm. Note the overall decrease at 5:35 pm.

[Chart: total time of the affected query between 4:35 pm and 6:35 pm, with the drop at 5:35 pm.]

Great work by the Bonanza team, and thank you for sharing!


Basics of MySQL Administration and best practices


Following are a few best practices and basic commands for MySQL administration.

MySQL Access and credential security

shell> mysql -u testuser -pMyP@ss0rd
mysql: [Warning] Using a password on the command line interface can be insecure.

By looking at the OS command history (using the history command), other OS users can easily see or recover the MySQL user’s password. It is always good not to put a password on the command-line interface. Another option for securing passwords while automating MySQL scripts is the use of mysql_config_editor. For more info on this, check out my blog post about credential security.
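A minimal sketch of the mysql_config_editor approach (the login-path name and user are examples):

# store credentials in the obfuscated ~/.mylogin.cnf instead of typing them on the command line
mysql_config_editor set --login-path=local --host=localhost --user=testuser --password
# connect later without exposing the password in the shell history or process list
mysql --login-path=local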

Consider implementing the following for a strong access policy.

  • Use the validate_password plugin for a strong password policy.
  • Limit user access by specifying an IP or IP range in the hostname.
  • Do not grant excessive privileges to users.
  • Have separate users for different operations, e.g., a backup user with the required backup privileges only.
  • Avoid giving FILE and SUPER privileges to remote users.
  • For communication between client and server over a public network, use the SSL connection method.

Replication

  • Use IF EXISTS and IF NOT EXISTS while creating DB objects.

The most common cause of replication breakage or errors is that an object already exists on the SLAVE. By using IF EXISTS and IF NOT EXISTS while creating database objects, we can avoid this.

  • Use GTID and crash-safe replication.
  • Keep your slave in read-only mode (see the sketch after this list).
  • Run your backups and query optimization on the SLAVE. This will avoid unnecessary load on the MASTER.
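A minimal sketch of the read-only point above (on MySQL 5.7; super_read_only additionally blocks SUPER users):

# make the slave reject writes from regular clients (and SUPER users)
mysql -u root -p -e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;"
# persist the settings in my.cnf as well, so they survive a restart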

Logging

Logs are of great significance for an admin. You can enable the following types of logs for MySQL servers.

  • Binary log: An extra copy of your database transactions.
  • Relay log: Enabled by default and created when you set up replication.
  • General log: Logs MySQL client tool commands.
  • Slow query log: Logs slow queries that take more than a threshold amount of time to execute (see the sketch after this list).
  • Error / MySQL server log: Records NOTES, WARNINGS and ERRORS for the MySQL server.
  • Audit log: Logs user info and activities, such as queries executed by users, along with source IP, timestamp, target database, etc.
  • To maintain these logs (e.g., purging old logs), use logrotate. Check MYSQL SERVER LOG MAINTENANCE for more info.
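For instance, the slow query log mentioned above can be enabled at runtime; a minimal sketch (the threshold and file path are example values):

# log statements slower than 1 second to a dedicated file
mysql -u root -p -e "SET GLOBAL slow_query_log = ON;
                     SET GLOBAL long_query_time = 1;
                     SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';"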

MySQL STARTUP - SHUTDOWN

Always check the MySQL error log during startup and shutdown, and make sure the startup/shutdown was clean.

Basic commands for MySQL Administration

MySQL database and table creation

CREATE DATABASE IF NOT EXISTS:

CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;

CREATE TABLE:

CREATE TABLE IF NOT EXISTS t1 (id int(11) PRIMARY KEY AUTO_INCREMENT, uname varchar(50), comments text);

INSERT INTO TABLE:

INSERT INTO t1 (id, uname, comments) VALUES (101, 'lalit', 'mysql DBA');

MySQL Database Users

CREATE USER:

  1. The CREATE USER statement creates new MySQL accounts.
CREATE USER IF NOT EXISTS 'local_user1'@'localhost' IDENTIFIED BY 'mypass'; (Remote connection restricted for this user)

If you specify only a username part of the account name, a host name part of ‘%’ is used.

CREATE USER IF NOT EXISTS 'remote_user1'@'%' IDENTIFIED BY 'mypass';

(Remote connection enabled for this user)

  2. User details are stored in the mysql.user table.
SELECT user,host FROM mysql.user;

 

RENAME USER:

RENAME USER 'abc'@'localhost' TO 'xyz'@'%';

DROP USER:

 DROP USER IF EXISTS 'remote_user1'@'%';

User password management:

  1. Change/update a user's password.
ALTER USER IF EXISTS 'remote_user1'@'%' IDENTIFIED BY 'mypass';
  2. Expire a user account's password.
ALTER USER IF EXISTS 'remote_user1'@'%' PASSWORD EXPIRE;
  3. Lock a user account.
ALTER USER IF EXISTS 'remote_user1'@'%' ACCOUNT LOCK;

 

MySQL Database Users Access Restrictions using privileges.

Grant privileges to a user:

Privileges can be granted on databases, tables and related objects.

Example.

Case1:  Grant all privileges on ‘db1’ database to user ‘remote_user1’@’%’

GRANT ALL PRIVILEGES ON db1.* TO 'remote_user1'@'%';

Case2: Grant selected privileges on ‘db1’ database to user ‘remote_user1’@’%’

GRANT SELECT, INSERT, UPDATE, DELETE ON db1.* TO 'remote_user1'@'%';

Case3. Grant SELECT privilege single table access to user ‘remote_user1’@’%’

GRANT SELECT ON db1.table1 TO 'remote_user1'@'%';

Ref: http://dev.mysql.com/doc/refman/5.7/en/grant.html

Revoking privileges from user:

Example:

REVOKE SELECT, INSERT, UPDATE, DELETE ON db1.* FROM 'remote_user1'@'%';

Ref: http://dev.mysql.com/doc/refman/5.7/en/revoke.html

Check User Privileges using SHOW GRANTS command:

Example:

SHOW GRANTS FOR 'mysqldba'@'localhost';

SHOW GRANTS; (It will display the privileges granted to the current account)

SHOW GRANTS FOR 'remote_user1'@'%';

Ref: http://dev.mysql.com/doc/refman/5.7/en/show-grants.html

MySQL monitoring

Check database size:

Information_schema (Metadata)

SELECT table_schema AS "Data Base Name",
       SUM(data_length + index_length) / 1024 / 1024 AS "Data Base Size in MB",
       SUM(data_free) / 1024 / 1024 AS "Free Space in MB"
FROM information_schema.TABLES
GROUP BY table_schema;

Check Active users:

show processlist ;

InnoDB Engine Status:

SHOW STATUS;

SHOW ENGINE INNODB STATUS;
  1. Performance schema: Live statistics

Example:

– Enable locking-related instruments (if not already enabled):

UPDATE performance_schema.setup_instruments SET ENABLED='YES', TIMED='YES' WHERE NAME='wait/lock/metadata/sql/mdl';

SELECT * FROM performance_schema.metadata_locks WHERE OBJECT_SCHEMA='test' AND OBJECT_NAME LIKE 't_';

  2. MySQL Enterprise Monitor
  3. Customized scripts

Check Database objects info:

Databases:

SHOW DATABASES;

Select Database:

Use db_name;

Tables in Database:

SHOW TABLES;

SELECT TABLE_NAME from information_schema.TABLES where TABLE_SCHEMA = 'test_db';

ROUTINES:

SELECT * FROM information_schema.ROUTINES WHERE ROUTINE_SCHEMA='db_name';

INDEX:

select TABLE_NAME,INDEX_NAME,COLUMN_NAME,INDEX_TYPE  from information_schema.STATISTICS where TABLE_SCHEMA = 'db_name';

View:

select * from information_schema.VIEWS where TABLE_SCHEMA = 'db_name';

Mysqldump  Backup-Restore:

Required privileges: mysqldump requires at least the SELECT privilege for dumped tables, SHOW VIEW for dumped views, TRIGGER for dumped triggers, and LOCK TABLES if the --single-transaction option is not used. Certain options might require other privileges as noted in the option descriptions.

Backup:

Full Database backup:

mysqldump -u root  -p --single-transaction --databases db1  --routines > db1_fullbkp.sql

 OR

mysqldump -u root  -p --single-transaction  --databases db1 --routines | gzip >  db1_fullbkp.sql.gz

 

Single table backup:

mysqldump -u <user> -h <host> -p --single-transaction db_name table_name --routines > db1_table1.sql

Ref : http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html

Restore:

To reload a dump file, you must have the privileges required to execute the statements that it contains, such as the appropriate CREATE privileges for objects created by those statements.

mysql -u username -p db_name < db1_fullbkp.sql

OR

gunzip < db1_fullbkp.sql.gz | mysql -u username -p db_name

MySQL Replication:

  1. Create a replication user on the MASTER with replication privileges.
CREATE USER IF NOT EXISTS 'rpluser'@'%' IDENTIFIED BY 'rpluser1234';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'rpluser'@'%';
  2. On the SLAVE, set up replication as follows:
CHANGE MASTER TO MASTER_HOST='<MASTER_IP>', MASTER_USER='rpluser', MASTER_PASSWORD='rpluser1234', MASTER_PORT=3306, MASTER_AUTO_POSITION=1;
  3. Start the slave:
START SLAVE;
  4. Check the slave status:
SHOW SLAVE STATUS;

The Slave_IO_Running and Slave_SQL_Running column values should be ‘YES’.

MYSQL service [Linux]

MySQL SHUTDOWN steps:

shell> sudo service mysqld stop

MySQL STARTUP steps:

shell> sudo service mysqld start

 

All Set !!



Log Buffer #508: A Carnival of the Vanities for DBAs


This Log Buffer Edition covers Oracle, SQL Server and MySQL.

Oracle:

Compiling views: When the FORCE Fails You

Goldengate 12c Troubleshooting XAGENABLE

A performance Deep Dive into Tablespace Encryption

EBS Release 12 Certified with Safari 10 and MacOS Sierra 10.12

Oracle Database 12c (12.2.0.1.0) on VirtualBox

SQL Server:

A Single-Parameter Date Range in SQL Server Reporting Services

Generating Plots Automatically From PowerShell and SQL Server Using Gnuplot

Justify the Remote Access Requirement for Biztalk Log Shipping

Building Better Entity Framework Applications

Performing a Right and Comprehensive Age Calculation

MySQL:

Basics of MySQL Administration and Best Practices

Bonanza Cuts Load in Half with VividCortex

Experiments with MySQL 5.7’s Online Buffer Pool Resize

MySQL 8.0 Collations: The Devil is in the Details.

How to Encrypt MySQL Backups on S3

Making the life prettier with gdb PrettyPrinting API


Anyone who has peeked inside a gdb manual knows that gdb has some kind of Python API. And anyone who has skimmed through it has seen something called “Pretty Printing” that supposedly tells gdb how to print complex data structures in a nice and readable way. Well, at least I have seen that, but I’ve never given it […]

The post Making the life prettier with gdb PrettyPrinting API appeared first on MariaDB.org.

A practical explanation: problems during unicode collation conversion


Introduction

Recently I have been involved in an effort to convert MySQL databases from a utf8 character set to utf8mb4. As a part of this effort, my team evaluated which collations would be best for facilitating a broad range of multi-lingual support.

There have been many recent posts in the MySQL community about better unicode collation support in MySQL 8, such as from the MySQL Server Team’s blog at Oracle, which has also done a good job of showing us how newer collations based on Unicode 9.0.0 will properly group and sort characters according to their case and inheritance. As the title of the latter post suggests, the “devil is” indeed “in the details”.

There is also the matter of the “sushi-beer” problem, which shows that utf8mb4_unicode_520_ci will treat beer and sushi emoji as equal.

Rather than focusing on the particular deficiencies and enhancements in collations, this post focuses on practical solutions to converting existing data sets.  Every DBA, at some point, faces the daunting task of character set conversion or differing collation usage.  However, before we go on, a note.

Green field projects

When you are considering a new MySQL database to support a new project, common wisdom derived from articles linked above and from the many articles one may find on Stack Overflow and other resources suggests that if you have multi-lingual and emoji storage requirements: Just use utf8mb4 character set and utf8mb4_unicode_520_ci collation in MySQL.  That is, until MySQL 8.0 goes GA.  If you really know what you are doing, or if you already know a lot about this subject, your choices may vary.  The salient point here is that using an appropriate unicode collation (rather than defaults) will save the DBA from several future headaches, particularly regarding unique keys.
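
For a green field schema, that advice translates into something like the following sketch (db_name and the table are placeholders):

CREATE DATABASE db_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

CREATE TABLE db_name.t1 (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;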

Existing data sets

Most DBAs work in shops with existing data sets with growing requirements.  Many DBAs will have already worked on converting latin1 databases to utf8.  Most MySQL installations that use utf8 will have utf8_general_ci as the collation.  For utf8mb4, the current default is utf8mb4_general_ci.

As illustrated in many documents and talks you’ll find that the general_ci collations in MySQL are sub-optimal when it comes to sorting and enforcing uniqueness.  They are not so bad with latin characters.  “APPLE = ApPlE = apple”, but when you’re talking about multi-lingual support, the general collations are generally bad at case insensitivity and allow variations such as the “apple” example in other languages to be distinguished as different.
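
The case-insensitive behavior is easy to verify from the client (a sketch run over a utf8mb4 connection):

SET NAMES utf8mb4;
SELECT 'APPLE' = 'apple' COLLATE utf8mb4_general_ci AS equal_general_ci;
-- returns 1: for plain latin strings, the general_ci collation folds case just fine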

To help illustrate a practical approach, I will provide an illustration of potential issues that one may encounter, and an example of how to potentially find and fix all unique key issues in a data set.

Creating an Example Data Set

To facilitate this example, I created a sample table in a MySQL 5.7 database:

CREATE TABLE `utf8_test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `singlechar` varchar(4) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unk1` (`singlechar`)
) ENGINE=InnoDB AUTO_INCREMENT=189331 DEFAULT CHARSET=utf8mb4

I chose varchar(4) because a lot of people use varchars, and I chose length 4 because some “emoji” and other 4-byte strings generated on the console give a MySQL error 1300 “Invalid utf8 character string: ‘\the\Code'”. With a column that can hold 4-byte characters, MySQL lets you insert such strings anyway, with a warning rather than an error.

Then, I populated this with fake data, spanning all sorts of characters using a fairly “ugly” but very “quick” solution.

jscott@jscott-57-1:~$ cat /dev/urandom > myfile
jscott@jscott-57-1:~$ for x in `cat myfile | tr -d "[:cntrl:]" | grep -a -o .`; do mysql -v js -e "set collation_connection=utf8mb4_general_ci; insert into utf8_test (singlechar) values ('$x');"; done

Concatenating /dev/urandom to a file for a few seconds yielded about 35MB of “garbage”, containing many conceivable utf8 characters.  We loop through the file output with bash.  On Linux, tr does a bad job of actually breaking up the characters because it does not yet support multi-byte characters, but it does an excellent job of removing “control characters“, hence “-d [:cntrl:]”. grep is doing our heavy lifting for breaking up the strings (-a option to treat binary as text and the -o with a period as the argument to break up the string character by character).  Finally, we feed those characters into single insert statements to our table.  As you might imagine, this is quite nasty in terms of error messages, but it gets the job done.  In a few minutes time, we have a table with an integer id, and a couple thousand “supposedly unique” rows.

Unique key violations

For most DBAs, finding problems with collations starts with your first unique key error.  Let’s try converting our newly created table to use the utf8mb4_unicode_520_ci collation (default is utf8mb4_general_ci):

mysql> alter table utf8_test convert to character set utf8mb4 collate utf8mb4_unicode_520_ci;
ERROR 1062 (23000): Duplicate entry '9' for key 'unk1'

This tells us we have a violation.  But what then?  Inevitably, you solve one, then you find another, then another, then another.

Using concat() to find problems

You may knock MySQL for having inadequate collation support, but it has outstanding support for querying data in various ways.  For one thing, “collate” can be supplied to various parts of a query.  I like to use concat for these cases for reasons which will become clear later, but here’s a query that works well to find all the collisions in our sample table:

select count(1) as cnt, group_concat(id), concat(singlechar collate utf8mb4_unicode_520_ci) as unk from utf8_test group by unk having cnt > 1;

Notice that inside the concat() we are adding collate utf8mb4_unicode_520_ci. This seeks to emulate what MySQL is doing when trying to alter table (supplying a new table collation), but giving us all the information ahead of time.

Sample output:

+-----+-------------------------------+------+
| cnt | group_concat(id)              | unk  |
+-----+-------------------------------+------+
|   2 | 642,8804                      | ΄    |
|   2 | 1242,20448                    | ΅    |
|   2 | 194,11764                     | ;    |
|   2 | 16145,29152                   | ·    |
|   2 | 114105,33                     | ︵   |
|   2 | 63,186608                     | }    |
|   2 | 4963,44554                    | ʹ    |
|   2 | 84,87616                      | >    |
|   4 | 120845,292,2759,38412         | ୦    |
|   4 | 2,21162,25295,47504           | 1    |
|   5 | 46179,81143,231,7766,36158    | ²    |
|   4 | 66,2339,19777,26796           | 3    |
|   5 | 102802,158554,150,16224,21282 | ፬    |
|   3 | 35,14643,19433                | 5    |
|   3 | 107,377,9234                  | 6    |
|   4 | 12,585,12643,28853            | ٧    |
|   3 | 60,12070,25619                | 8    |
|   3 | 17,70,27677                   | ٩    |
|   3 | 32,4370,12498                 | A    |

Looking at one of these by itself:

mysql> select * from utf8_test where id in (102802,158554,150,16224,21282);
+--------+------------+
| id     | singlechar |
+--------+------------+
|    150 | 4          |
|  16224 | ٤          |
|  21282 | ۴          |
| 102802 | ፬          |
| 158554 | ៤          |
+--------+------------+
5 rows in set (0.00 sec)

All of the above characters are resolving to the one with ID 102802.

Possible actions:

In my current working environment, we decided to continue to use utf8mb4_general_ci collation, because we were unable to determine (in all cases) whether the duplicates in our data set were simply “character-case” related or whether we would actually experience data loss.

In the future, we expect to be able to “trust” the new collations in MySQL 8 to be correct.

We went the extra mile to create a utility based on the following query.  The utility finds possible unique key violations using queries like the one I used above.

information_schema is your friend:
select
	tc.TABLE_NAME,
	tc.CONSTRAINT_NAME,
	group_concat(kc.COLUMN_NAME),
	case when group_concat(c.DATA_TYPE) like '%char%' then 1 else 0 end as contains_char
FROM
	information_schema.TABLE_CONSTRAINTS tc
INNER JOIN
	information_schema.KEY_COLUMN_USAGE kc
		on tc.CONSTRAINT_SCHEMA = kc.CONSTRAINT_SCHEMA
		and tc.CONSTRAINT_NAME = kc.CONSTRAINT_NAME
		and tc.TABLE_NAME = kc.TABLE_NAME
INNER JOIN
	information_schema.COLUMNS c
		on kc.CONSTRAINT_SCHEMA = c.TABLE_SCHEMA
		and kc.TABLE_NAME = c.TABLE_NAME
		and kc.COLUMN_NAME = c.COLUMN_NAME
WHERE
	tc.CONSTRAINT_SCHEMA = 'your_schema'
	and tc.CONSTRAINT_TYPE = 'UNIQUE'
GROUP BY
	tc.TABLE_NAME,tc.CONSTRAINT_NAME
HAVING
	contains_char=1;

Using the query above and iterating through its results, you can build SQL statements similar to the one we used in our test scenario above to discover duplicates, supplying the “collate” clause inside concat() for the char columns. concat() is a great fit because it allows you to run the same query (

select count(1) as cnt, group_concat(id), concat(col1,'-',col2 collate <test collation>,'-'......) as unk from utf8_test group by unk having cnt > 1;

) for a unique constraint having any number of columns. An additional query to information_schema inside the loop is required to find which columns have a char type.

You can then use the group concatenated IDs in the results to choose “winners and losers”, find dependent rows and update them, delete the losing rows, etc.
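
Using the first duplicate pair from the sample output above purely as an illustration (which id “wins” is a hypothetical choice, and the dependent table is a placeholder):

-- keep id 642 as the winner; repoint dependent rows first, if such tables exist
-- UPDATE some_child_table SET utf8_test_id = 642 WHERE utf8_test_id = 8804;
DELETE FROM utf8_test WHERE id = 8804;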


New MariaDB Dashboard in Percona Monitoring and Management Metrics Monitor

MariaDB

In honor of the upcoming MariaDB M17 conference in New York City on April 11-12, we have enhanced Percona Monitoring and Management (PMM) Metrics Monitor with a new MariaDB Dashboard and multiple new graphs!

The Percona Monitoring and Management MariaDB Dashboard builds on the efforts of the MariaDB development team to instrument the Aria Storage Engine Status Variables related to Aria Pagecache and Aria Transaction Log activity, the tracking of Index Condition Pushdown (ICP), InnoDB Online DDL when using ALTER TABLE ... ALGORITHM=INPLACE, InnoDB Deadlocks Detected, and finally InnoDB Defragmentation. This new dashboard is available in Percona Monitoring and Management release 1.1.2. Download it now using our docker, VirtualBox or Amazon AMI installation options!

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL®, MariaDB® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL, MariaDB® and MongoDB servers to ensure that your data works as efficiently as possible.

Aria Pagecache Reads/Writes

MariaDB 5.1 introduced the Aria Storage Engine, which is MariaDB’s MyISAM replacement. Originally known as the Maria storage engine, it was renamed in late 2010 in order to avoid confusion with the overall MariaDB project name. The Aria Pagecache Status Variables graph plots the count of disk block reads and writes, which occur when the data isn’t already in the Aria Pagecache. We also plot the reads and writes from the Aria Page Cache, which count the reads/writes that did not incur a disk lookup (as the data was previously fetched and available from the Aria pagecache):

MariaDB - Aria Pagecache Reads/Writes

Aria Pagecache Blocks

Aria reads and writes to the pagecache in order to cache data in RAM and avoid or delay activity related to disk. Overall, this translates into faster database query response times:

  • Aria_pagecache_blocks_not_flushed: The number of dirty blocks in the Aria pagecache.
  • Aria_pagecache_blocks_unused: Free blocks in the Aria pagecache.
  • Aria_pagecache_blocks_used: Blocks used in the Aria pagecache.

Aria Pagecache Total Blocks is calculated from the Aria System Variables using the following formula: aria_pagecache_buffer_size / aria_block_size:

MariaDB - Aria Pagecache Blocks
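
The same number can be computed directly on a MariaDB server (a sketch; both are global Aria system variables):

SELECT @@global.aria_pagecache_buffer_size DIV @@global.aria_block_size
       AS aria_pagecache_total_blocks;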

Aria Transaction Log Syncs

As Aria strives to be a fully ACID- and MVCC-compliant storage engine, an important factor is support for transactions. A transaction is the unit of work in a database that defines how to implement the four properties of Atomicity, Consistency, Isolation, and Durability (ACID). This graph tracks the rate at which Aria fsyncs the Aria Transaction Log to disk. You can think of this as the “write penalty” for running a transactional storage engine:

MariaDB - Aria Transaction Log Syncs

InnoDB Online DDL

MySQL 5.6 introduced the concept of an in-place DDL operation via ALTER TABLE ... ALGORITHM=INPLACE, which in some cases avoids performing a table copy and thus doesn’t block INSERT/UPDATE/DELETE. MariaDB implemented three measures to track ongoing InnoDB Online DDL operations, which we plot via the following three status variables:

  • Innodb_onlineddl_pct_progress: Shows the progress of the in-place alter table. It might not be accurate, as in-place alter is highly dependent on the disk and buffer pool status
  • Innodb_onlineddl_rowlog_pct_used: Shows row log buffer usage in 5-digit integers (10000 means 100.00%)
  • Innodb_onlineddl_rowlog_rows: Number of rows stored in the row log buffer

MariaDB - InnoDB Online DDL

For more information, please see the MariaDB blog post Monitoring progress and temporal memory usage of Online DDL in InnoDB.

InnoDB Defragmentation

MariaDB merged the Facebook/Kakao defragmentation patch for defragmenting InnoDB tablespaces into their 10.1 release. Your MariaDB instance needs to have started with innodb_defragment=1 and your tables need to be in innodb_file_per_table=1 for this to work. We plot the following three status variables:

  • Innodb_defragment_compression_failures: Number of defragment re-compression failures
  • Innodb_defragment_failures: Number of defragment failures
  • Innodb_defragment_count: Number of defragment operations

MariaDB - InnoDB Defragmentation
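
To confirm the prerequisites mentioned above and exercise defragmentation by hand, something along these lines should work on MariaDB 10.1+ (a sketch; sbtest.t1 is a placeholder table name):

SHOW GLOBAL VARIABLES LIKE 'innodb_defragment';       -- needs to be ON
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';   -- needs to be ON
OPTIMIZE TABLE sbtest.t1;   -- with innodb_defragment enabled this defragments rather than rebuilds
SHOW GLOBAL STATUS LIKE 'Innodb_defragment%';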

Index Condition Pushdown

Oracle introduced this in MySQL 5.6. From the manual:

Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry and only if this is satisfied is the row read from the table. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine.

Essentially, the closer that ICP Attempts are to ICP Matches, the better!

MariaDB - Index Condition Pushdown (ICP)
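
The counters behind this graph are ordinary MariaDB status variables, so the same comparison can be made from the client (a sketch):

SHOW GLOBAL STATUS LIKE 'Handler_icp%';
-- compare Handler_icp_attempts with Handler_icp_match: the closer they are, the better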

InnoDB Deadlocks Detected (MariaDB 10.1 Only)

Ever since MySQL implemented a transactional storage engine there have been deadlocks. Deadlocks are conditions where different transactions are unable to proceed because each holds a lock that the other needs. In MariaDB 10.1, there is a status variable that counts the occurrences of deadlocks since server startup. Previously, you had to instrument your application to get an accurate count of deadlocks, because otherwise you could miss occurrences if your polling interval wasn’t configured frequently enough (even using pt-deadlock-logger). Unfortunately, this status variable doesn’t appear to be present in the MariaDB 10.2.4 build I tested:

MariaDB - InnoDB Deadlocks Detected
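
On MariaDB 10.1 the counter can also be read directly (a sketch, assuming the Innodb_deadlocks status variable this graph is based on):

SHOW GLOBAL STATUS LIKE 'Innodb_deadlocks';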

Again, please download Percona Monitoring and Management 1.1.2 to take advantage of the new MariaDB Dashboard and new graphs!  For installation instructions, see the Deployment Guide.

You can see the MariaDB Dashboard and new graphs in action at the PMM Demo site. If you feel the graphs need any tweaking or if I’ve missed anything, leave a note on the blog. You can also write me directly (I look forward to your comments): michael.coburn@percona.com.

To start: on the ICP graph, should we have a line that defines the percentage of successful ICP matches vs. attempts?

Busy April 2017: MariaDB Dev Meeting (no-slave-left-behind, MyRocks, ...) and Percona Live

In a few days, I will start my yearly travel to North America, which will bring me to Percona Live at the end of the month. But I will first stop in New York to attend the MariaDB Developer Meeting. Let's see what will happen there. Percona Live: Booking.com is sponsoring the conference, and we will be present at the Monday Evening Reception. You do not need a tutorial pass to attend: any

lefred.be is part of the TOP 10 MySQL Blogs

Percona Live Featured Session: Using SelectStar to Monitor and Tune Your Databases


Percona Live Featured Session

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet the folks at SelectStar, a database monitoring and management tool company. SelectStar will be a sponsor at Percona Live this year.

I recently came across the SelectStar database monitoring product. There are a number of monitoring products on the market (with the evolution of various SaaS and on-premises solutions), but SelectStar piqued my interest for a few reasons. I had a chance to speak with Cameron Jones, Principal Product Manager at SelectStar about their tool:

Percona: What are the challenges that lead to developing SelectStar?

Cameron: One of the challenges that we’ve found in the database monitoring and management sector comes from the dilution of the database market – and not in a bad way. Traditional, closed source database solutions continue to be used across the board (especially by large enterprises), but open source options like MySQL, MongoDB, PostgreSQL and Elasticsearch continue to gain traction as organizations seek solutions that meet their demand for agility and flexibility.

From a database monitoring perspective, this adds some challenges. Traditional solutions are focused on monitoring RDBMS and are really great at it, while newer solutions may only focus on one piece of the puzzle (NoSQL or cloud only, for example).

Percona: How does SelectStar compare to other monitoring and management tools?

Cameron: SelectStar covers a wide array of open and closed source database solutions and is easy to setup. This makes it ideal for enterprises that have a lot going on. Here is the matrix of supported products from our website:

Database Types and Key Metrics Monitored by SelectStar:

Big Data (Hadoop, Cassandra):
  • Ops Counters – Inserts, Queries, etc.
  • Network Traffic
  • Asserts
  • Locks
  • Memory Usage

Cloud (Amazon Aurora, Amazon Dynamo, Amazon RDS, Microsoft Azure):
  • Queries
  • Memory Usage
  • Network
  • CPU Balance
  • IOPS

NoSQL (MongoDB):
  • Ops Counters – Inserts, Queries, etc.
  • Network Traffic
  • Asserts
  • Locks
  • Memory Usage

Open Source (PostgreSQL, MongoDB, MySQL, MariaDB):
  • Average Query Execution Time
  • Query Executions
  • Memory Usage
  • Wait Time

Traditional RDBMS (IBM DB2, MS SQL Server, Oracle):
  • Average Query Execution Time
  • Query Executions
  • Memory Usage
  • Wait Time


In addition to monitoring key metrics for different database types, one of the key differences with SelectStar came from its comprehensive alerts and recommendations system.

The alerts and recommendations are designed to ensure you have an immediate understanding of key issues – and where they are coming from. MonYOG is great at this for MySQL, but falls short in other areas. With SelectStar, you can pinpoint the exact database instance that may be causing the issue; or go further up the chain and see if it’s an issue impacting several database instances at the host level.

Recommendations are often tied to alerts – if you have a red alert, there’s going to be a recommendation tied to it on how you can improve. However, the recommendations pop up even if your database is completely healthy – ensuring that you have visibility into how you can improve your configuration before you actually have an issue impacting performance.

With insight into key metrics, alerts and recommendations, you can fine tune your database performance. In addition, it gives you the opportunity to become more proactive with your database monitoring.

Percona: Is configuring SelectStar difficult?

Cameron: SelectStar is easy to set up – in fact, most customers are up and running in 20 minutes.

Simply head over to the website – selectstar.io – and log in. From there, you’ll be greeted by a welcome screen where you can easily click through and configure a database.

Percona Live Featured Session SelectStar

To configure a database, you select your type:

Percona Live Featured Session SelectStar

And from there, set up your collector by inputting some key information.

Percona Live Featured Session SelectStar

And that’s it! As soon as it’s configured, the collector will start gathering information and data is populated within 20 minutes.

Percona: How does SelectStar work?

Cameron: Using agentless collectors, SelectStar gathers data from both your on-premises and AWS platforms so that you can have insight into all of your database instances.

Percona Live Featured Session SelectStar

The collector is basically an independent machine within your infrastructure that pulls data from your database. It is low impact so that it doesn’t impact performance. This is a different approach from all of the other monitoring tools.

Percona Live Featured Session

Router Metrics (Shown Above)

Percona Live Featured Session

Mongo relationship tree displaying router, databases, replica set, shards and nodes. (Shown Above)

Percona: Any final thoughts? What are you looking forward to at Percona Live?

Cameron: If you’re in the market for a new database monitoring solution, SelectStar is worth looking at because it covers a breadth of databases with the depth into key metrics, alerts and notifications that optimize performance across your databases. We have a free trial, so you have an easy option to try it. We’re looking forward to meeting with as much of the community as possible, getting feedback and hearing about people’s monitoring needs.

Register for Percona Live Data Performance Conference 2017, and meet the creators of SelectStar. You can find them at selectstar.io. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.


Network attacks on MySQL, Part 5: Attack on SHA256 based passwords


The sha256_password plugin doesn't use the nonce system that mysql_native_password uses; instead it forces the use of RSA or SSL.

This is how that works:

  1. The client connects
  2. The server changes authentication to sha256 password (or default?)
  3. The server sends the RSA public key.
  4. The client encrypts the password with the RSA public key and sends it to the server.
  5. The server decrypts the password with the private key and validates it.

The problem is that the client trusts the public key of the server. It is possible to use --server-public-key-path=file_name, but then you need to take care of secure public key distribution yourself.

So if we put a proxy between the client and the server and have the proxy send its own public key, we can decrypt the password, re-encrypt it with the real public key and send it on to the server. Note that the decrypted value is the actual password, not a hash, so we then know the real password.

And if SSL is used, the RSA encryption is not done at all... but that can be a connection with an invalid certificate. Anything will do, as long as the connection is SSL.
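
For reference, an account using this plugin looks like the following; this is the kind of setup the attack above targets (a sketch against MySQL 5.7, with a placeholder user and password):

CREATE USER 'app'@'%' IDENTIFIED WITH sha256_password BY 'app_password';
-- on the client side, the server's RSA key can be pinned with the option mentioned above:
--   mysql --server-public-key-path=server_public_key.pem ...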

Caching your application data with MySQL and TokuDB @ Percona Live 2017

The great Percona Live 2017 database conference is approaching, and I am very excited to be there this year - not only as an attendee, but also as a speaker!!
By the way, this will be my first public talk, so please bear with me if I'm a bit nervous :-)

If you are attending, come see me and my colleague Andrea speaking about how we leverage the TokuDB engine to cache our application data using MySQL! We will show you some interesting benchmarks, and will describe our design, setup, and configuration in detail.

See you there!!

MySQL High Availability tools - Comparing MHA, MRM and ClusterControl


We previously compared two high availability solutions for MySQL - MHA and MariaDB Replication Manager and looked into how they performed fail-over. In this blog post, we’ll see how ClusterControl stacks up against these solutions. Since MariaDB Replication Manager is under active development, we decided to take a look at the not yet released version 1.1.

Flapping

All solutions provide flapping detection. MHA, by default, executes a failover once. Even after you restart masterha_manager, it will still check if the last failover didn’t happen too recently. If yes (by default, if it happened in the last 8 hours), no new failover will happen. You need to explicitly change the timeout or set --ignore_last_failover flag.

MariaDB Replication Manager has less strict defaults - it will allow up to three failovers, as long as each of them happens more than 10 seconds after the previous one. In our opinion this is a bit too flexible - if the first failover didn’t solve the problem, it is unlikely that another attempt will give better results. Still, default settings are there to be changed, so you can configure MRM however you like.

ClusterControl uses a similar approach to MHA - only one failover is attempted. The next one can happen only after the master has been successfully detected as online (for example, ClusterControl recovery or manual intervention by the admin managed to promote one of the slaves to a master) or after a restart of the cmon process.

Lost transactions

MHA can work in two modes - GTID or non-GTID. Those modes differ in how missing transactions are handled. Traditional replication, actually, is handled in a better way - as long as the old master is reachable, MHA connects to it and attempts to recover missing transactions from its binary logs. If you use GTID mode, this does not happen, which may lead to more significant data loss if your slaves didn’t manage to receive all relay logs - another very good reason to use semi-synchronous replication, which has you covered in this scenario.

MRM does not connect to the old master to get the logs. By default, it elects the most advanced slave and promotes it to master. Remaining slaves are slaved off this new master, making them as up to date as the new master. There is a potential for a data loss, on par with MHA’s GTID mode.

ClusterControl behaves similarly to MRM - it picks the most advanced slave as a master candidate and then, as long as it is safe (for example, there are no errant transactions), promote it to become a new master. Remaining slaves get slaved off this new master. If ClusterControl detects errant transactions, it will stop the failover and alert the administrator that manual intervention is needed. It is also possible to configure ClusterControl to skip errant transaction check and force the failover.

Network partitioning

For MHA, this has been taken care of by adding a second MHA Manager node, preferably in another section of your network. You can query it using secondary_check_script, which can be used to connect to another MHA node and execute masterha_check_repl to see how the cluster looks from that node. This gives MHA a better view of the situation and topology, and it may decide not to fail over if that is unnecessary.

MRM implements another approach. It can be configured to use slaves, external MaxScale proxy or scripts executed through HTTP protocol on a custom port (like the scripts which governs HAProxy behavior) to build a full view of the topology and then make an informed decision based on this.

ClusterControl, at this moment, does not perform any advanced checks regarding the availability of the master - it uses only its own view of the system, therefore it can take an action if there are network issues between the master and the ClusterControl host. Having said that, we are aware this can be a serious limitation and there is work in progress to improve how ClusterControl detects a failed master - using slaves and proxies like MaxScale or ProxySQL to get a broader picture of the topology.

Roles

Within MHA you are able to apply roles to a specific host, so for instance ‘candidate_master’ and ‘no_master’ will help you determine which hosts are preferred to become master. A good example could be the data center topology: spread the candidate master nodes over multiple racks to ensure HA. Or perhaps you have a delayed slave that may never become the new master even if it is the last node remaining.

This last scenario is likely to happen with MariaDB Replication Manager as it can’t see the other nodes anymore and thus can’t determine that this node is actually, for instance, 24 hours behind. MariaDB does not support the Delayed Slave command but it is possible to use pt-slave-delay instead. There is a way to set the maximum slave delay allowed for MRM, however MRM reads the Seconds_Behind_Master from the slave status output. Since MRM is executed after the master is dead, this value will obviously be null.

At the beginning of the failover procedure, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replication_failover_whitelist

and

replication_failover_blacklist

The whitelist contains a list of IP’s or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered. The second variable may contain a list of hosts which will never be considered as master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

replication_failover_whitelist takes precedence, meaning that replication_failover_blacklist is ignored if replication_failover_whitelist is set.

Integration

MHA is a standalone tool, it doesn’t integrate well with other external software. It does however provide hooks (pre/post failover scripts) which can be used to do some integration - for instance, execute scripts to make changes in the configuration of an external tool. MHA also uses read_only value to differentiate between master and slaves - this can also be used by external tools to drive topology changes. One example would be ProxySQL - MHA can work with this proxy using both pre/post failover scripts and with read_only values, depending on the ProxySQL configuration. It’s worth mentioning that, in GTID mode, MHA doesn’t support MariaDB GTID - it only supports Oracle MySQL or Percona Server.

MRM integrates nicely with MaxScale - it can be used along MaxScale in a couple of ways. It could be set so MaxScale will do the work to monitor the health of the nodes and execute MRM as needed, to perform failovers. Another option is that MRM drives MaxScale - monitoring is done on MRM’s side and MaxScale’s configuration is updated as needed. MRM also sets read_only variables so it makes it compatible with other tools which understand those settings (like ProxySQL, for example). A direct integration with HAProxy is also available - MRM, if collocated, may modify the HAProxy configuration whenever the topology changes. On the cons side, MRM works only with MariaDB installations - it is not possible to use it with Oracle MySQL’s version of GTID.

ClusterControl uses the read_only variable to differentiate between master and slave nodes. This is enough to integrate with every kind of proxy which can be deployed from ClusterControl: ProxySQL, MaxScale and HAProxy. A failover executed by ClusterControl will be detected and handled by any of those proxies. ClusterControl also integrates with external tools regarding management. It provides access to the management console for MaxScale and, to some extent, to HAProxy. Advanced support for ProxySQL will be added shortly. Metrics are provided for HAProxy and ProxySQL. ClusterControl supports both Oracle GTID and MariaDB GTID.

Conclusion

If you are interested in details how MHA, MRM or ClusterControl handle failover, we’d like to encourage you to take a look at the blog posts listed below:

Below is a summary of the differences between the different HA solutions:

MHA / MRM / ClusterControl at a glance:

Replication support:
  • MHA: non-GTID, Oracle GTID
  • MRM: MariaDB GTID
  • ClusterControl: Oracle GTID and MariaDB GTID

Flapping:
  • MHA: one failover allowed
  • MRM: defaults are less restrictive but can be modified
  • ClusterControl: one failover allowed unless it brings the master online

Lost transactions:
  • MHA: very good handling for non-GTID; no checking for transactions on the master for GTID setups
  • MRM: no checking for transactions on the master
  • ClusterControl: no checking for transactions on the master

Network partitioning:
  • MHA: no built-in support, can be added through user-created scripts
  • MRM: very good false positive detection using slaves, a proxy or external scripts
  • ClusterControl: no support at this moment, work in progress to build false positive detection using proxies and slaves

Roles:
  • MHA: support for whitelist and blacklist of hosts to promote to master
  • MRM: no support
  • ClusterControl: support for whitelist and blacklist of hosts to promote to master

Integration:
  • MHA: can be integrated with external tools using hooks; uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern
  • MRM: close integration with MaxScale, integration with HAProxy also available; uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern
  • ClusterControl: can be integrated with external tools using hooks; uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern

If we are talking about handling master failure, each of the solutions does its job well and feature-wise they are mostly on par. There are some differences in almost every aspect that we compared but, in the end, each of them should handle most master failures pretty well. ClusterControl lacks more advanced network partitioning detection, but this will change soon. What is important to keep in mind is that these tools support different replication methods, and this alone can limit your options. If you use non-GTID replication, MHA is the only option for you. If you use GTID, MHA and MRM are restricted to, respectively, Oracle MySQL and MariaDB GTID setups. Only ClusterControl (you can test it for free) is flexible enough to handle both types of GTID under one tool - this could be very useful if you have a mixed environment while you would still like to use one single tool to ensure high availability of your replication setup.

Dockerizing Wordpress with Nginx and PHP-FPM on Ubuntu 16.04

In this tutorial, I will guide you step-by-step to use docker-compose. We will deploy 'Wordpress' with Nginx, MySQL, and PHP-FPM. Each service has its own container, and we will use images from the docker hub registry. I will show you how to create containers from docker images and manage all containers with docker-compose.

My First Steps with MariaDB 10.2 and RocksDB Storage Engine

Last year I started to explore MyRocks, that is, RocksDB used as a storage engine with MySQL. So far I had to use Facebook's MySQL 5.6 to do this. I could try to use some specific development branches of MariaDB (or maybe even Percona Server) for this, but I preferred to wait until the engine is included into a main branch by the company I work for. Recently this happened, and now you can get RocksDB loaded and working in main MariaDB 10.2 branch. In this blog post I am going to explain how to build it from source and do some basic checks.

I was updating my MariaDB local repository on my Ubuntu 14.04 netbook, with the 10.2 branch already checked out (do git checkout 10.2 if you see another branch marked with *; 10.1 is used by default):
openxs@ao756:~/git/server$ git branch
  10.0
  10.1
* 10.2
  bb-10.2-marko
and noted the following at the end of output produced by git pull:
...
 create mode 100644 storage/rocksdb/unittest/test_properties_collector.cc
 create mode 100644 storage/rocksdb/ut0counter.h
So I realized that what I had considered just talk, plans and rumors had really happened - we have RocksDB in the main branch of MariaDB! I immediately proceeded with the following commands to get the submodule for the engine up to date:
openxs@ao756:~/git/server$ git submodule init
Submodule 'storage/rocksdb/rocksdb' (https://github.com/facebook/rocksdb.git) registered for path 'storage/rocksdb/rocksdb'
openxs@ao756:~/git/server$ git submodule update
Submodule path 'libmariadb': checked out 'd1387356292fb840c7736aeb8f449310c3139087'
Cloning into 'storage/rocksdb/rocksdb'...
remote: Counting objects: 49559, done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 49559 (delta 31), reused 1 (delta 1), pack-reused 49485
Receiving objects: 100% (49559/49559), 97.77 MiB | 3.45 MiB/s, done.
Resolving deltas: 100% (36642/36642), done.
Checking connectivity... done.
Submodule path 'storage/rocksdb/rocksdb': checked out 'ba4c77bd6b16ea493c555561ed2e59bdc4c15fc0'
openxs@ao756:~/git/server$ git log -1
commit 0d34dd7cfb700b91f11c59d189d70142ed652615
...
Then I applied my usual cmake command line and build commands:
openxs@ao756:~/dbs/maria10.2$ fc -l
2001     cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DWITH_JEMALLOC=system -DWITH_INNODB_DISALLOW_WRITES=ON -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/maria10.2
...
2003     time make -j 2
2004     make install && make clean
2005     cd
2006     cd dbs/maria10.2
2007     bin/mysqld_safe --no-defaults --port=3307 --socket=/tmp/mariadb.sock --rocksdb &
The last command above was my lame attempt to add RocksDB support in the same way it is done in MySQL 5.6 from Facebook. The option is not recognized; instead, you just have to start as usual and install the plugin:
install soname 'ha_rocksdb.so';
Then you'll see a lot of new rows in the output of SHOW PLUGINS:
...
| ROCKSDB                       | ACTIVE   | STORAGE ENGINE     | ha_rocksdb.so | GPL     |
| ROCKSDB_CFSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DBSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT          | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT_GLOBAL   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_CF_OPTIONS            | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_COMPACTION_STATS      | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_GLOBAL_INFO           | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DDL                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_INDEX_FILE_MAP        | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_LOCKS                 | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_TRX                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
+-------------------------------+----------+--------------------+---------------+---------+
64 rows in set (0.00 sec)

MariaDB [test]> show engines;
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                           | YES          | YES  | YES        |
| CSV                | YES     | CSV storage engine                                                               | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                                            | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                        | NO           | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                            | NO           | NO   | NO         |
| CONNECT            | YES     | Management of External Data (SQL/MED), including many file formats               | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                   | YES          | NO   | YES        |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                               | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (0.00 sec)
Now I can create ROCKSDB tables and work with them:
MariaDB [test]> create table tmariarocks(id int primary key, c1 int) engine=rocksdb;
Query OK, 0 rows affected (0.14 sec)

MariaDB [test]> insert into tmariarocks values(1,1);
Query OK, 1 row affected (0.04 sec)

MariaDB [test]> select version(), t.* from tmariarocks t;
+----------------+----+------+
| version()      | id | c1   |
+----------------+----+------+
| 10.2.5-MariaDB |  1 |    1 |
+----------------+----+------+
1 row in set (0.00 sec)

MariaDB [test]> show create table tmariarocks\G
*************************** 1. row ***************************
       Table: tmariarocks
Create Table: CREATE TABLE `tmariarocks` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ROCKSDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [test]> select engine, count(*) from information_schema.tables group by engine;
+--------------------+----------+
| engine             | count(*) |
+--------------------+----------+
| Aria               |       11 |
| CSV                |        2 |
| InnoDB             |        8 |
| MEMORY             |       74 |
| MyISAM             |       25 |
| PERFORMANCE_SCHEMA |       52 |
| ROCKSDB            |        1 |
+--------------------+----------+
7 rows in set (0.03 sec)
Moreover, I can still work with InnoDB tables and even mix them with ROCKSDB ones in the same transaction (at least to some extent):
MariaDB [test]> create table t1i(id int primary key) engine=InnoDB;
Query OK, 0 rows affected (0.24 sec)

MariaDB [test]> create table t1r(id int primary key) engine=ROCKSDB;
Query OK, 0 rows affected (0.13 sec)

MariaDB [test]> insert into t1i values (1), (2), (3);
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t1r select * from t1i;
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> insert into t1i values(5);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> insert into t1r values(6);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> commit;
Query OK, 0 rows affected (0.14 sec)

MariaDB [test]> select * from t1i;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  5 |
+----+
4 rows in set (0.00 sec)

MariaDB [test]> select * from t1r;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  6 |
+----+
4 rows in set (0.00 sec)
In the error log I see:
openxs@ao756:~/dbs/maria10.2$ tail data/ao756.err
2017-04-05 13:59:51 140157875193600 [Note]   cf=default
2017-04-05 13:59:51 140157875193600 [Note]     write_buffer_size=67108864
2017-04-05 13:59:51 140157875193600 [Note]     target_file_size_base=67108864
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: creating a column family __system__
2017-04-05 13:59:52 140157875193600 [Note]     write_buffer_size=67108864
2017-04-05 13:59:52 140157875193600 [Note]     target_file_size_base=67108864
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
2017-04-05 13:59:52 140157875193600 [Note] RocksDB instance opened
2017-04-05 14:03:13 140157875193600 [ERROR] Invalid (old?) table or database name '.rocksdb'
So, the integration is not 100% clean and it is probably still at an alpha stage, but now it is easy to build, migrate, mix, test, benchmark and compare InnoDB from MySQL 5.7 with the latest RocksDB, all on the same MariaDB 10.2 code base!

I think a really great job was done by Sergei Petrunia, my colleagues from MariaDB, Facebook engineers and other interested MyRocks community members. I'd like to thank them all for their hard work on MyRocks and making it available in MariaDB. Maria Rocks!