
Become a DBA blog series - Monitoring and Trending


So, you’ve been working with MySQL for a while and now are being asked to manage it. Perhaps your primary job description is not about support and maintenance of the company’s databases (and data!), but now you’re expected to properly maintain one or more MySQL instances. It is not uncommon that developers, or network/system administrators, or DevOps folks with general backgrounds, find themselves in this role at some point in their career. 

So, what does a DBA do? We know that a DBA manages the company’s databases, but what does that actually mean? In this series of posts, we’ll walk you through the daily database operations that a DBA does (or at least ought to!).

We plan on covering the following topics, but do let us know if we’ve missed something:

  • Monitoring tools
  • Trending
  • Periodical healthchecks
  • Backup handling
  • High Availability
  • Common operations (online schema change, rolling upgrades, query review, database migration, performance tuning)
  • Troubleshooting
  • Recovery and repair
  • anything else?

In today’s post, we’ll cover monitoring and trending.

Monitoring and Trending

To manage your databases, you need good visibility into what is going on. Remember that if a database is not available or not performing, you will be the one under pressure, so you want to know what is happening. If there is no monitoring and trending system available, putting one in place should be the highest priority. Why? Let’s start by defining ‘trending’ and ‘monitoring’.

A monitoring system is a tool that keeps an eye on the database servers and alerts you if something is not right, e.g., a database is offline or the number of connections crossed some defined threshold. In such a case, the monitoring system will send a notification in some defined way. Such systems are crucial because, obviously, you want to be the first to be informed if something’s not right with the database.

On the other hand, a trending system will be your window into the database internals. It will provide you with graphs that show you how those cogwheels are working in the system - the number of queries per second, how many read/write operations the database does on different levels, whether table locks are granted immediately or queries have to wait for them, how often a temporary table is created, how often it is created on disk, and so on. If you are familiar with MySQL internals, you’ll be better equipped to analyze the graphs and derive useful information. Otherwise, you may need some time to understand these graphs. Some metrics are pretty self-explanatory, others perhaps not so obvious. But in general, it’s probably better to have more data than none at all when it’s needed.
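
Most of those graphs are built by periodically sampling MySQL status counters and plotting the deltas. As a minimal sketch of the raw data a trending agent collects (the counter names are standard MySQL status variables; credentials and the sampling interval are left out):

mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Questions', 'Com_select', 'Com_insert', 'Com_update',
   'Created_tmp_tables', 'Created_tmp_disk_tables',
   'Table_locks_immediate', 'Table_locks_waited')"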

Data is presented as graphs for better visibility - from graphs, the human mind can easily derive trends and locate anomalies. The trending system also gives you an idea of how things change over time - you need this visibility in both real time and for historical data, as things happen also when people sleep. If you have been on-call in an ops team, it is not unusual for an issue to have disappeared by the time you get paged at 3am, wake up, and log into the system.

Monitoring - best practices

There are many monitoring solutions out there; chances are you already have one of the following options in your infrastructure:

  • Nagios
  • Zabbix
  • MONyog
  • ClusterControl

All of those tools have their pros and cons. Some are only for monitoring, others also provide you with trending. A good monitoring system should allow you to customize the thresholds of alerts, their severity, etc., and fine-tune it to your own needs. You should also be able to integrate with external paging services like PagerDuty.

How you’d like your monitoring setup to look is also a matter of individual preference. What we’d suggest is to focus on the most important aspects of your operations. As a general rule of thumb, you’d be interested to know if your system is up or not, if you can connect to the database, and whether you can execute meaningful read and write queries (ideally something as close to the real workload as possible, for example you could read from a couple of production tables). Next in the order of importance would be to check if there’s an immediate threat to the system’s stability - high CPU/memory/disk utilization, lack of disk space. You want your alerts to be as actionable as possible - being woken up in the middle of the night, only to find that you can’t do anything about the alert, can be frustrating in the long run.
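
As a rough illustration of such an actionable check, a minimal script could look like the sketch below; the table and column names are placeholders, credentials are assumed to live in ~/.my.cnf, and the exit codes follow the common Nagios convention:

#!/bin/bash
# Can we connect at all?
mysqladmin ping > /dev/null 2>&1 || { echo "CRITICAL: cannot connect to MySQL"; exit 2; }
# Can we run a meaningful read query against a production-like table?
mysql -e "SELECT id FROM app.orders LIMIT 1" > /dev/null 2>&1 \
  || { echo "WARNING: read query failed"; exit 1; }
echo "OK"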

Trending - best practices

The next step would be to install some trending software. Again, similar to the monitoring tools, there is a plethora of choices. The best known are Cacti, Munin and Zabbix.

ClusterControl, in addition to the cluster management, can also be used as a trending system. 

There are also SaaS-based tools including Percona Cloud Tools and VividCortex.

Having a trending solution is not enough - you still have to know what kind of graphs you need. MySQL-focused monitoring tools will work great as they are focused on MySQL - they were created to bring the MySQL DBA as much information as possible. Other tools of a more generic nature will probably have to be configured. It would be outside the scope of this blog to go over such configurations, but we’d suggest looking at the Percona Monitoring Plugins. They are prepared for Cacti and Zabbix (when it comes to trending) and you can easily set them up if you have chosen one of those tools. If not, you can still use them as a guide to what MySQL metrics you want to have graphed and how to do that.

Once you have both monitoring and trending tools ready, you can go to the next phase - gathering the rest of the tools you will need in your day-to-day operations.

CLI tools

In this part, we’d like to cover some useful CLI tools that you may want to install on your MySQL server. First of all, you’ll want to install Percona Toolkit. It is a set of tools designed to help DBAs in their work. Percona Toolkit covers tasks like checking data consistency across slaves, fixing data inconsistencies, performing slow query audits, checking for duplicate keys, keeping track of configuration changes, killing queries, checking grants, gathering data during incidents and many others. We will be covering some of those tools in the coming blogs, as we discuss different situations a DBA may end up in.

Another useful tool is sysbench. This is a system benchmarking tool with an OLTP test mode. That test stresses MySQL and allows you to get some understanding of the system’s capacity. You can install it by running apt-get/yum, but you probably want to make sure that you have version 0.5 available - it includes support for multiple tables and the results are more realistic. If you’d like to perform more detailed tests, closer to your “real world” workload, then take a look at Percona Playback - this tool can use “real world” queries in the form of a slow query log or tcpdump output and then replay those queries on a test MySQL instance. While it might sound strange, performing such benchmarks to tune a MySQL configuration is not uncommon, especially at the beginning when a DBA is learning the environment. Please keep in mind that you do not want to perform any kind of benchmarking (especially with Percona Playback) on the production database - you’ll need a separate instance set up for that.
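
For illustration, a hedged sysbench 0.5 invocation might look like the sketch below; the oltp.lua path, credentials, table counts and sizes are assumptions that depend on your installation, and it should only ever be pointed at a test instance:

TEST=/usr/share/doc/sysbench/tests/db/oltp.lua
# create the test tables
sysbench --test=$TEST --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
         --oltp-tables-count=8 --oltp-table-size=1000000 prepare
# run the OLTP workload for 5 minutes with 16 threads
sysbench --test=$TEST --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
         --oltp-tables-count=8 --num-threads=16 --max-time=300 --max-requests=0 run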

Jay Janssen’s myq_gadgets is another tool you may find useful. It is designed to provide information about the status of the database - statistics about com_* counters, handlers, temporary tables, InnoDB buffer pool, transactional logs, row locking, replication status. If you are running Galera cluster, you may benefit from ‘myq_status wsrep’ which gives you nice insight into writeset replication status including flow control.

At some point you’ll need to perform a logical dump of your data - it can happen earlier, if you already make logical backups, or later, when you upgrade MySQL to a new major version. For larger datasets mysqldump is not enough - you may want to look into a pair of tools: mydumper and myloader. Those tools work together to create a logical backup of your dataset and then load it back into the database. What’s important is that they can utilize multiple threads, which speeds up the process significantly compared to mysqldump. Mydumper needs to be compiled and it’s sometimes hard to get it to work. Recent versions have become more stable, though, and we’ve been using it successfully.
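
As a rough sketch, a multi-threaded dump and reload could look like this (the database name, output directory and thread counts are assumptions):

mydumper --database=app --threads=8 --compress --outputdir=/backups/app-dump
myloader --directory=/backups/app-dump --threads=8 --overwrite-tables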

Periodical health checks

Once you have all your tools set up, you need to establish a routine to check the health of the databases. How often you’d like to do it is up to you and your environment. For smaller setups daily checks may work; for larger setups you probably have to do it every week or so. The reasoning behind this is that such regular checks should enable you to act proactively and fix issues before they actually become problems. Of course, you will eventually develop your own pattern, but here are some tips on what you may want to look at.

First of all, the graphs! This is one of the reasons a trending system is so useful. While looking at the graphs, you want to ensure no anomalies happened since the last check. If you notice any kind of spikes, drops or, in general, unusual patterns, you probably want to investigate further to understand what exactly happened. It’s especially true if the pattern is not healthy and may be the cause (or result) of a temporary slowdown of the system.

You want to look at the MySQL internals and host stats. The most important graphs would be the ones covering the number of queries per second, handler statistics (which give you information about how MySQL accesses rows), the number of connections, the number of running connections, I/O operations within InnoDB, and data about row- and table-level locking. Additionally, you’re interested in all data on the host level - CPU utilization, disk throughput, memory utilization, network traffic. See the “Related resources” section at the end of this post for a list of relevant blogs around monitoring metrics and their meaning. At first, such a check may take a while but once you get familiar with your workload and its patterns, you won’t need as much time as at the beginning.

Another important part of the health check is going over the health of your backups. Of course, you might have backups scheduled. Still, you need to make sure that the process works correctly, that your backups are actually running and that backup files are created. We are not talking here about recovery tests (such tests should be performed, but it’s not really required to do them daily or weekly - on the other hand, if you can afford to do it, even better). What we are talking about here are simple checks. Was the backup file even created in the first place? If so, does it have a plausible size (if a data set is 100GB, then a 64KB backup file is suspicious)? If you use compression and have some disk space free, you may want to try to decompress the archive to verify it’s correct (as long as it’s feasible in terms of the time needed for decompression). How’s the disk space? Do you have enough free disk on your backup server? If you copy backups to a remote site for DR purposes, the same set of checks applies to the DR site.
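
A minimal sketch of such checks, assuming gzip-compressed backups under /backups and a purely illustrative size threshold:

BACKUP=/backups/$(date +%F)/full.sql.gz
[ -f "$BACKUP" ] || echo "backup file missing: $BACKUP"
[ "$(stat -c%s "$BACKUP")" -lt 1048576 ] && echo "backup suspiciously small: $BACKUP"
gzip -t "$BACKUP" || echo "backup archive fails integrity check"
df -h /backups    # is there enough free space left on the backup server?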

Finally, you probably want to look at the system and MySQL logs - check kernel log for any symptoms of hardware failure (disk or memory sometimes send warning messages before they fail), check MySQL’s error log to ensure nothing wrong is going on.

As we mentioned before, the whole process may take a while, especially at the beginning. The graph review in particular may take time, as it’s not really possible to automate it - the rest of the process is rather straightforward to script. With a growing number of MySQL servers, you will probably have to reduce the frequency of checks due to the time needed to perform them - maybe you’ll need to prioritize, and cover the less important parts of your infrastructure every other week?

Such a health check is a really useful tool for a DBA. We mentioned already that it helps to proactively fix errors before they start to create issues, but there’s one more important reason. You might not always be up to date on new code that has been introduced in production. In an ideal world, all SQL code would be reviewed by a DBA. In the real world, though, that’s rather uncommon. As a result, the DBA may be surprised by new workload patterns that start showing up. The good thing is that, once they are spotted, the DBA can work with developers to fix or optimize schemas and SQL queries. Health checks are one of the best tools to catch up on such changes - without them, a DBA would not be aware of bad code or database design that may eventually lead to a system outage.

We hope this short introduction will give you some information on how you may want to setup your environment and what tools you may want to use. We also hope that the first health checks will give you a good understanding of your system’s performance and help you understand any pain points that may already be there. In our next post, we will cover backups.

Related resources

 


High availability using MySQL in the cloud


Next Wednesday (June 10) I’ll be co-presenting a webinar on using MySQL in the cloud for High Availability (HA). Joining me will be 451 Research analyst Jason Stamper and together we’ll talk about the realities of HA using MySQL in the cloud and how vendors are responding to changing application requirements with new developments that can enhance your deployment.

We’ll also present a comparison of available solutions along with key best practices you can follow for successfully attaining HA in the cloud with MySQL. The webinar is scheduled for June 10 at 10 a.m. Pacific. Register here.

Together we’ll cover:

  • What do HA MySQL deployments in the cloud look like today?
  • What are the developing requirements of applications based on future growth and scalability needs?
  • How are key vendors responding to these needs with new features and solution offerings, including those from OpenStack, Amazon, and others?
  • A high level, more technical comparison of the solutions
  • Keys to a successful HA MySQL deployment, including scaling from a single-node application to a cluster of MySQL instances

At the end of this webinar, you will have a good understanding of the options available for High Availability using MySQL in the cloud and how your current HA MySQL deployment in the cloud compares. You’ll also learn the tradeoffs you face depending on your HA solution and be able to identify which vendors and technologies are best suited for your needs.

This webinar, as usual, is free. Register now to reserve your spot and I hope to see you next Wednesday!

The post High availability using MySQL in the cloud appeared first on MySQL Performance Blog.



Fast Galera Cluster Deployments in the Cloud Using Juju


Introduction

The Galera Cluster Juju Charm was recently released, and it is now possible to start scalable Galera Clusters using the Juju deployment framework on the public or private cloud (OpenStack, Amazon, Azure and bare metal are all supported). All the logic required to fire up Galera is encapsulated in the Charm, which is a small package of scripts and configuration files that is automatically downloaded and added to your environment.

Installing and Configuring Juju

The Juju client is available for Ubuntu, OSX and Windows. Installing it is a matter of adding its dedicated package repository:

$ sudo add-apt-repository ppa:juju/stable
$ sudo apt-get update && sudo apt-get install juju-core

Then run

$ juju generate-config

in order to create the Juju configuration file, ~/.juju/environments.yaml, which you can then edit for your particular cloud environment and provide your cloud authentication credentials.

You can now bootstrap Juju. This will create a single machine instance that is used to control all future instances started using Juju:

$ juju bootstrap

Deploying Galera using Juju

Once you have Juju set up, using it to deploy Galera Cluster becomes truly a one-liner. You only need a configuration file with your desired MySQL root password and SST password, as seen in the following template:

galera-cluster:
    root-password: my-root-password
    sst-password: my-sst-password

The SST password serves as a shared secret when transferring entire snapshots of the database from one node to another.

We are now ready to start our first node:

$ juju deploy --config galera.yaml cs:galera-cluster
Added charm "cs:~charmers/trusty/galera-cluster-4" to the environment.

juju status shows us what this command did:

$ juju status
environment: amazon
machines:
  "0":
    agent-state: started
    agent-version: 1.22.3
    dns-name: 54.91.110.84
    instance-id: i-73516e8e
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-zone=us-east-1a
    state-server-member-status: has-vote
  "1":
    agent-state: pending
    instance-id: i-f2db1d0d
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-zone=us-east-1b
services:
  galera-cluster:
    charm: cs:~codership/trusty/galera-cluster-4
    exposed: false
    relations:
      cluster:
      - galera-cluster
    units:
      galera-cluster/0:
        agent-state: allocating
        machine: "1"

Juju has started a cloud server instance, i-f2db1d0d, and has begun deploying the first Galera node. In a bit, the Juju status changes to fully started and running:

$ juju status galera-cluster
environment: amazon
machines:
  "1":
    agent-state: started
    agent-version: 1.22.3
    dns-name: 54.162.183.83
    instance-id: i-f2db1d0d
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-zone=us-east-1b
services:
  galera-cluster:
    charm: cs:~codership/trusty/galera-cluster-4
    exposed: false
    relations:
      cluster:
      - galera-cluster
    units:
      galera-cluster/0:
        agent-state: started
        agent-version: 1.22.3
        machine: "1"
        public-address: 54.162.183.83

Now that our first Galera Cluster node has started, we can add another one with a single one-line command:


$ juju add-unit galera-cluster

and in a short while Juju will report that this node is also running:


$ juju status --format short galera-cluster

- galera-cluster/0: 54.162.183.83 (started)
- galera-cluster/1: 54.147.2.106 (started)

The cluster now has two nodes:

$ juju ssh galera-cluster/0 mysql -uroot -pmy-root-password

mysql> show status like '%wsrep_cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)

Accessing and Exposing the Juju-started Galera Cluster

Juju charms are meant to be chained together to other applications that would require their services. For example, a Mediawiki charm that requires a MySQL database can be hooked to the Galera Cluster charm and Juju will internally connect them so that Mediawiki is automatically given the MySQL connection parameters it needs to start using the database. The Juju Documentation has more information about service relationships.
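
As a hedged sketch, hooking a Mediawiki charm to the cluster could look like the commands below; the exact relation endpoint names depend on the charms involved, so check them with juju status or the charm documentation first:

$ juju deploy mediawiki
$ juju add-relation mediawiki:db galera-cluster:db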

It is also possible to give external applications access to the Galera Cluster. First, we instruct Juju to open the MySQL connection port to external users:


juju expose galera-cluster

and then we can create MySQL users:

$ juju ssh galera-cluster/0 mysql -uroot -pmy-root-password

mysql> CREATE USER 'my-user'@'my-host' IDENTIFIED BY 'my-password';

Further Reading

The Juju Documentation has everything you need to start using Juju. The Galera Cluster Charm Documentation describes the Galera Cluster configuration options.

Summary

As we can see, we only had to provide a few configuration variables and run a few commands to have a fully functional Galera Cluster that can now be scaled with additional nodes or connected to other applications that require a fault-tolerant MySQL database. Juju and the Galera Cluster Juju Charm encapsulate all the logic that is required to start Galera Clusters on machine instances in a cloud provider.



The InnoDB Change Buffer


One of the challenges in storage engine design is random I/O during a write operation. In InnoDB, a table will have one clustered index and zero or more secondary indexes.  Each of these indexes is a B-tree.  When a record is inserted into a table, the record is first inserted into the clustered index and then into each of the secondary indexes.  So, the resulting I/O operation will be randomly distributed across the disk.  The I/O pattern is similarly random for update and delete operations. To mitigate this problem, the InnoDB storage engine uses a special data structure called the change buffer (previously known as the insert buffer, which is why you will see ibuf and IBUF used for various internal names).

The change buffer is another B-tree, with the ability to hold the record of any secondary index.  It is also referred to as a universal tree in the source code.  There is only one change buffer within InnoDB and it is persisted in the system tablespace.  The root page of this change buffer tree is fixed at FSP_IBUF_TREE_ROOT_PAGE_NO (which is equal to 4) in the system tablespace (which has space id of 0). When the server is started, the change buffer tree is loaded by making use of this fixed page number.   You can refer to the ibuf_init_at_db_start() function for further details.

The total size of the change buffer is configurable and is designed to ensure that the complete change buffer tree can reside in main memory.   The size of the change buffer is configured using the innodb_change_buffer_max_size system variable.

Overview of Change Buffering

Change buffering is applicable only to non-unique secondary indexes (NUSI).  InnoDB buffers 3 types of operations on NUSI: insert, delete marking, and delete.  These operations are enumerated by ibuf_op_t within InnoDB:

/* Possible operations buffered in the insert/whatever buffer. See
ibuf_insert(). DO NOT CHANGE THE VALUES OF THESE, THEY ARE STORED ON DISK. */
typedef enum {
        IBUF_OP_INSERT = 0,
        IBUF_OP_DELETE_MARK = 1,
        IBUF_OP_DELETE = 2,

        /* Number of different operation types. */
        IBUF_OP_COUNT = 3
} ibuf_op_t;

One important point to remember is that the change buffering is leaf page oriented. A particular operation to NUSI is buffered in the change buffer only if the relevant non-root leaf page of the NUSI is not already available in the buffer pool.  This means that the buffered change is predefined to happen in a particular leaf page of a NUSI within the InnoDB system.  This makes it necessary to track the free space available in the NUSI leaf pages.  This tracking is necessary because merging these buffered operations to the NUSI leaf page must not result in a B-tree page split or B-tree page merge.

Special Change Buffer Fields

When a NUSI record is buffered in the change buffer, 4 special change buffer fields are added to the beginning of the NUSI record.  Each of these 4 fields and their contents are explained below.  The primary key of the change buffer tree is then {space_id, page_no, count}, where the count helps to maintain the order in which the change is buffered for that particular page.  The change buffer row format has evolved over a period of time and the following table provides information for MySQL 5.5+:

  • Field 1, IBUF_REC_FIELD_SPACE (4 bytes): the identifier of the space in which the NUSI exists.
  • Field 2, IBUF_REC_FIELD_MARKER (1 byte): indicates whether the change buffer row format is old or new. If present (and zero), the row format is the newer one (MySQL 4.1+).
  • Field 3, IBUF_REC_FIELD_PAGE (4 bytes): the leaf page number of the NUSI to which the buffered row belongs.
  • Field 4, IBUF_REC_FIELD_METADATA, made up of:
      - 2 bytes: counter field, used to sort records within a (space_id, page_no) in the order they were added.
      - 1 byte: operation type (ibuf_op_t).
      - 1 byte: row format flag. If 0, the user index record is in REDUNDANT row format; if IBUF_REC_COMPACT, the user index record is in COMPACT row format.
      - Variable length, DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE (which is 6) bytes per field: type information affecting the alphabetical ordering of the fields and the storage size of an SQL NULL value.
  • Field 5, IBUF_REC_FIELD_USER: the first user field.

The row format of the change buffer records themselves is always REDUNDANT.

Change Buffer Bitmap Page

The free space information for each page is tracked in predefined pages called the change buffer bitmap page, which is also known as the ibuf bitmap page.  These pages always follow the extent descriptor pages.  The following table gives the predefined page numbers of the change buffer bitmap pages:

  • 4 KB page size (256 pages per extent): extent descriptor pages 0, 4096, 8192, 12288, ...; ibuf bitmap pages 1, 4097, 8193, 12289, ...; one bitmap page describes 4096 pages.
  • 8 KB page size (128 pages per extent): extent descriptor pages 0, 8192, 16384, 24576, ...; ibuf bitmap pages 1, 8193, 16385, 24577, ...; one bitmap page describes 8192 pages.
  • 16 KB page size (64 pages per extent): extent descriptor pages 0, 16384, 32768, 49152, ...; ibuf bitmap pages 1, 16385, 32769, 49153, ...; one bitmap page describes 16384 pages.
  • 32 KB page size (64 pages per extent): extent descriptor pages 0, 32768, 65536, ...; ibuf bitmap pages 1, 32769, 65537, ...; one bitmap page describes 32768 pages.
  • 64 KB page size (64 pages per extent): extent descriptor pages 0, 65536, 131072, ...; ibuf bitmap pages 1, 65537, 131073, ...; one bitmap page describes 65536 pages.

The page number 1 is also referred to as the FSP_IBUF_BITMAP_OFFSET. These change buffer bitmap pages help to answer the following questions quickly:

  • Does the given page have any buffered changes in the ibuf (insert/change buffer) tree?  This question will be asked when a page is read into the buffer pool.  The buffered changes will be merged to the actual page before putting it into the buffer pool.
  • Does the given page have enough free space so that a change can be buffered? This question will be asked when we want to modify a leaf page of NUSI and it is not already available in the buffer pool.

In the next section we will look at the information stored in the ibuf bitmap page that helps answer the above questions.

Information Stored in Change Buffer Bitmap Page

The change buffer bitmap page uses 4 bits (IBUF_BITS_PER_PAGE) to describe each page. It contains an array of such 4 bits describing each of the pages.  This whole array is called the “ibuf bitmap” (insert/change buffer bitmap).  This array begins after the page header at an offset equal to IBUF_BITMAP (which is equal to 94). Given a page number, the ibuf bitmap page that contains the 4 bit information on the given page can be calculated as:

ulint bitmap_page_no = FSP_IBUF_BITMAP_OFFSET + ((page_no / page_size) * page_size);

You can refer to the ibuf_bitmap_page_no_calc() function for more details on the complete calculation.  Likewise, given a page number, the offset within the change buffer bitmap page that holds its 4 bits can be calculated easily. I leave this as an exercise to the reader (refer to the ibuf_bitmap_page_get_bits_low() function for further info). The following table provides details about these 4 bits:

  • IBUF_BITMAP_FREE (bit position 0): the first two bits are used to represent the free space available in the leaf page of the NUSI.
  • IBUF_BITMAP_BUFFERED (bit position 2): the third bit, if set, means that the leaf page has buffered entries in the change buffer.
  • IBUF_BITMAP_IBUF (bit position 3): the fourth bit, if set, means that this page is part of the change buffer.

This means that only 2 bits are available to store the free space information of the page.  There are only 4 possible values: 0, 1, 2, 3.  Using these 2 bits, we try to encode the free space information for a page.  The rule is as follows — there must be at least UNIV_PAGE_SIZE / IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for the change buffer to be used:

/** An index page must contain at least UNIV_PAGE_SIZE /
IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for ibuf to try to
buffer inserts to this page.  If there is this much of free space, the
corresponding bits are set in the ibuf bitmap. */
#define IBUF_PAGE_SIZE_PER_FREE_SPACE   32

Tracking Free Space of Pages

Before an insert operation (IBUF_OP_INSERT) is buffered, the free space available in the target NUSI leaf page is approximately calculated using the information available in the change buffer bitmap page. This conversion is done in the ibuf_index_page_calc_free_from_bits() function and the formula used is:

if (ibuf_code == 3) {
    ibuf_code = 4;
}
free_space = ibuf_code * (page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE);

The following table provides the conversions done from the encoded value found within the change buffer bitmap page to a meaningful value in bytes:

For a 4 KB page size:

  • ibuf_code 0: 0 bytes of free space assumed in the NUSI leaf page
  • ibuf_code 1: 128 bytes
  • ibuf_code 2: 256 bytes
  • ibuf_code 3: 512 bytes

For a 16 KB page size:

  • ibuf_code 0: 0 bytes of free space assumed in the NUSI leaf page
  • ibuf_code 1: 512 bytes
  • ibuf_code 2: 1024 bytes
  • ibuf_code 3: 2048 bytes

Using this information, we can determine if the record to be buffered will fit into the page or not.  If there is enough space then the insert will be buffered.  Using this approach, we ensure that merging these records to the target NUSI will not result in a page split.

Updating Free Space Information

After buffering an insert or delete operation, the free space information in the change buffer bitmap page must be updated accordingly (a delete mark operation will not change the free space information).  To update the free space information we need to convert the free space in bytes back to the IBUF encoded value.  This is done in the ibuf_index_page_calc_free_bits() function using the following formula:

ibuf_code = max_ins_size / (page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE);
if (ibuf_code == 3) {
     ibuf_code = 2;
}
if (ibuf_code > 3) {
     ibuf_code = 3;
}

In the above formula, max_ins_size is the maximum insert size (maximum free space) available in the page after page re-organization.

Record Count in NUSI Leaf Pages

In the case of a purge operation (IBUF_OP_DELETE), we need to ensure that the number of records in the target NUSI leaf page doesn’t go to zero, because in InnoDB the leaf pages of a B-tree are not allowed to become empty - they must have at least 1 record. Since the number of records in the target NUSI page is unknown (because it is not loaded into the buffer pool yet), the buffered insert operations are taken into account.  If 1 insert operation is buffered then it is assumed that the target NUSI page has 1 record, if 2 insert operations are buffered then it is assumed that the target NUSI page has 2 records, and so on.  In this way the number of records in the target NUSI page is calculated.  Based on this calculated record count, a purge operation is either buffered or not.  This means that if there are no buffered insert or delete-mark operations, then no purge operations can be buffered.

Merging the Buffered Changes to Target NUSI Page

The changes to NUSI leaf pages are buffered in the change buffer if the NUSI leaf page is not already in the buffer pool. These buffered operations are merged back into the actual NUSI leaf page under various circumstances:

  1. When these NUSI leaf pages are subsequently read into the buffer pool, the buffered operations will be merged.  A NUSI leaf page could be read into the buffer pool during an index lookup, an index scan or because of read ahead.
  2. When the master thread of InnoDB periodically does change buffer merges by calling ibuf_merge_in_background().
  3. When there are too many operations buffered for a particular NUSI leaf page.
  4. When the change buffer tree reaches its maximum allowed size.

The change buffer merge operation is initiated by calling the ibuf_merge_in_background() or ibuf_contract() functions.   The change buffer merges are done either in the foreground or in the background. The foreground change buffer merges are done as part of a DML operation and hence will affect the performance experienced by the end user.  The background change buffer merges are instead done periodically when there is less activity in the server.

We ensure that the change buffer merge does not result in a B-tree page split or page merge operation. It also shouldn’t result in an empty leaf page.  Before the target NUSI leaf pages are placed into the buffer pool, the buffered changes are applied to them. Once the buffered changes are merged for a page, its associated 4 bits of information in the change buffer bitmap page are also updated.

Conclusion

This article provided an overview of the change buffer subsystem of InnoDB.  It explained the additional fields that are added to a secondary index record before storing it within the change buffer.  It provided information about how the change buffer keeps track of free space information of the NUSI leaf pages by making use of the change buffer bitmap page.  It also explained the need to calculate the number of records in the NUSI leaf pages so that it doesn’t become empty.

Thanks to Marko Makela for reviewing this article and helping to make it more accurate. If you have any questions, please feel free to post a comment below.

That’s all for now. THANK YOU for using MySQL!

 



MariaDB 10.1.5 now available


Download MariaDB 10.1.5

Release Notes | Changelog | What is MariaDB 10.1?

MariaDB APT and YUM Repository Configuration Generator

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.5. This is a Beta release.

See the Release Notes and Changelog for detailed information on this release and the What is MariaDB 10.1? page in the MariaDB Knowledge Base for general information about the MariaDB 10.1 series.

Thanks, and enjoy MariaDB!



Network Analyzer for Redis


VividCortex’s network traffic analyzer tool for Redis is an easy-to-use, non-intrusive way to gain insight into your server’s activity. Built for Redis servers running on Linux operating systems, it will help you understand your query workload.

This commandline tool captures TCP traffic on your server and decodes the protocol. It decodes queries and outputs them in a standard log format. You can use standard log analysis tools such as Percona Toolkit’s pt-query-digest to analyze the output and build insight into queries and server performance.
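
As a hedged usage sketch (the binary name below is a placeholder for whatever the download provides), you capture traffic for a representative period and then summarize the log with pt-query-digest:

sudo ./redis-traffic-analyzer > redis-queries.log   # capture during a busy period
pt-query-digest redis-queries.log > redis-report.txt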

The tool is built on VividCortex’s advanced network traffic capture technology and is also a safe way to assess how VividCortex’s agents will behave on your systems. It is a thin wrapper around our TCP and Redis decoding libraries. It does nothing but decode and print, and makes no attempt to communicate with the Internet or anything else. Simple, secure, and smart.

System requirements:

  • 64-bit Linux operating system
  • Commandline access to the server with root privileges

We offer free tools for analyzing the following network protocols: MySQL, MongoDB, PostgreSQL, and Redis.




Network Analyzer for MongoDB


VividCortex’s network traffic analyzer tool for MongoDB is an easy-to-use, non-intrusive way to gain insight into your server’s activity. Built for MongoDB servers running on Linux operating systems, it will help you understand your query workload.

This commandline tool captures TCP traffic on your server and decodes the protocol. It decodes and times queries, and outputs them in a standard log format. You can use standard log analysis tools such as Percona Toolkit’s pt-query-digest to analyze the output and build insight into queries and server performance.

The tool is built on VividCortex’s advanced network traffic capture technology and is also a safe way to assess how VividCortex’s agents will behave on your systems. It is a thin wrapper around our TCP and MongoDB decoding libraries. It does nothing but decode and print, and makes no attempt to communicate with the Internet or anything else. Simple, secure, and smart.

System requirements:

  • 64-bit Linux operating system
  • Commandline access to the server with root privileges

This tool supports decoding the MongoDB protocol.

We offer free tools for analyzing the following network protocols: MySQL, MongoDB, PostgreSQL, and Redis.




Orchestrator visual cheatsheet


Orchestrator is growing. Supporting automatic detection of topologies, simple refactoring of topology trees, complex refactoring via Pseudo-GTID, failure detection and automated discovery, it is becoming larger and larger by the day.

One of the problems with growing projects is how to properly document them. Orchestrator enjoys a comprehensive manual, but as it gets more and more detailed, it becomes difficult to get oriented and pointed in the right direction. I've done my best to point out the simple use cases throughout the manual.

One thing that is difficult to put into words is topologies. Explaining "failover of an intermediate master S1 that has S2,...,Sn slaves onto a sibling of S1 provided that..." is too verbose. So here's a quick visual cheatsheet for (current) topology refactoring commands. Refactoring commands are a mere subset of overall orchestrator commands, but they're great to play with and perfect for visualization.

The "move" and related commands use normal replication commands (STOP SLAVE; CHANGE MASTER TO; START SLAVE UNTIL;"...).

The "match" and related commands utilize Pseudo-GTID and use more elaborate MySQL commands (SHOW BINLOG EVENTS, SHOW RELAYLOG EVENTS).

So without further ado, here's what each command does (and do run "orchestrator" from the command line to get a man-like explanation of everything, or just go to the manual).

[Cheatsheet diagrams: orchestrator-cheatsheet-visualized 1 through 18, one per refactoring command.]



Blog Roundup: Oracle, SQL Server, MySQL


This Log Buffer edition goes beyond the ordinary and loops in a few very good blog posts from Oracle, SQL Server and MySQL.


Oracle:

  • Variable selection also known as feature or attribute selection is an important technique for data mining and predictive analytics.
  • The Oracle Utilities SDK V4.3.0.0.2 has been released and is available from My Oracle Support for download.
  • This article provides a high level list of the new features that exist in HFM 11.1.2.4 and details the changes/differences between HFM 11.1.2.4 and previous releases.
  • In recent years, we’ve seen increasing interest from small-to-mid-sized carriers in transforming their policy administration systems (PAS).
  • Got a question on how easy it is to use ORDS to perform insert | update | delete on a table?

SQL Server:

  • The Importance of Database Indexing
  • Stairway to SQL Server Security Level 9: Transparent Data Encryption
  • Query Folding in Power Query to Improve Performance
  • Writing Better T-SQL: Top-Down Design May Not be the Best Choice
  • Cybercrime – the Dark Edges of the Internet

MySQL:

  • One of the challenges in storage engine design is random I/O during a write operation.
  • Fast Galera Cluster Deployments in the Cloud Using Juju
  • High availability using MySQL in the cloud
  • Become a DBA blog series – Monitoring and Trending
  • MySQL as an Oracle DBA

Learn more about Pythian’s expertise in Oracle, SQL Server and MySQL, as well as the author Fahd Mirza.



Parsing the Redis TCP Protocol


When we decided to go beyond just MySQL monitoring, we had a couple of natural next choices. The decision involved engineering effort, the likelihood we’d find MySQL-specific things in our system that would slip our schedule, alignment with existing customers, and the commercial opportunity.

We thought that Redis monitoring would be a relatively small sales opportunity at present (although the community is very large and active), but would be simple to support because of its simple wire protocol and Redis’s straightforward nature: single threaded, no query execution plans, etc. It turns out we were wrong about the ease of implementation. Redis’s protocol is hard to capture on the wire and inspect precisely because of its simplicity. There are some interesting lessons learned.


To set the context, if you’re not familiar with VividCortex, we do database performance monitoring. Most of our customers buy our product because we go beyond status counters and actually measure queries. We sniff the queries off the network with libpcap (the same library tcpdump uses) and reassemble the TCP stream. Then we decode the network protocol and extract the queries and their responses, correlate them together, and categorize them by the abstract of the query text. We can generate lots of interesting observations from this: query timing, frequency, errors, protocol flags, and so on.

Why Redis’s TCP Traffic Is Hard To Sniff

The hard part with Redis is correlating the queries (commands) with the responses from the server.

This is hard because Redis’s protocol allows pipelining. A client can send many commands without needing to wait for the server to reply to previous commands. When the server does reply, the responses come in the same order as the original commands were sent, but they are not otherwise labeled as belonging to any specific command.

The client needs to figure out which response belongs to which command by keeping a FIFO queue of commands it sent. When a response comes back, it belongs to the oldest pending command.
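
A quick, hedged way to see this for yourself is to pipeline a few commands over a raw connection; the replies come back strictly in order, with nothing tying them to individual commands (netcat options vary between distributions):

printf 'PING\r\nSET foo bar\r\nGET foo\r\n' | nc -q 1 localhost 6379
# expected replies, in order: +PONG, +OK, then $3 / bar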

This is kind of a nightmare for TCP sniffing. There are at least two obvious cases where we will be unable to figure out the correlation between commands and responses:

  1. We start observing in the middle of a conversation. Commands have been sent but we didn’t see them.
  2. We don’t see some packets in the conversation. This happens when the packet rate is high and packets are dropped from buffers before libpcap can observe them.

This is made even more challenging by the fact that Redis commands are typically very fast and the packet and command rate are usually very high as a result. Whatever we do has to be extremely efficient.

How We Do It

In this environment, how do we observe the Redis protocol and measure command latencies?

Best effort and approximation.

In more detail: there are some cases that can be handled and others can’t, and some are a gray area:

  1. No pipelining in use. In this case, if we see a command and a response, we subtract the timings and we’ll be correct, more or less.
  2. We don’t see the pipelining. We might not be able to measure timings for some commands as a result.
  3. Pipelining in use. We cannot be sure we’ve seen all the requests and responses, so we take a middle road and apply a heuristic that represents our best effort as to which responses go with which commands, and their timings.

There’s another factor at play, too. Timings seen from network packet capture are approximate anyway, because there are layers of buffers and delays in between where we’re observing and what a client or a server daemon sees. There are delays between when packets arrive on the interface and when the OS delivers them to Redis, for example. Likewise, when Redis writes out a response, the OS might delay sending it over the network.

At very high packet rates and very low request latencies, these uncertainties add up to a greater portion of the real server response times, so the relative noisiness is greater.

Results

As a result, the timings of our Redis queries have some wiggle room. The error is usually skewed towards the short end – if there’s a timing error, we measure queries as being faster than they might have been.

However, this imprecision is worlds better than not having any visibility at all into Redis queries and their latencies. And for longer commands (perhaps a big operation over a very large list/set/hash) the latencies will have a smaller fractional error.

And that, in the end, is what we really want to find out. There’s an assumption that “Redis is fast, for every operation, all the time.” But what if it isn’t? My entire career has followed this recipe:

  1. Notice an assumption about something unmeasured, that is never noticed or questioned.
  2. Find a way to see if it’s true or false. Apply that method.
  3. Unsurprisingly, the assumption will turn out to be false. Every time.
  4. Rinse and repeat.

And that’s what we’ve done with Redis. It turns out that, sometimes, Redis commands can be slow. Who would have thought? Witness the usual behavior of Redis on our own systems:

[Graph: typical Redis command latency]

And the occasional outliers:

[Graph: occasional Redis command latency outliers]

If you’d like to see how your own systems stack up, VividCortex integrates Redis queries right into the overall Top Queries reports we generate, so you can see your entire application’s workload across all of your servers of all types, and then drill into them quickly. For example, here’s Top Queries by count on our production systems.

[Screenshot: Top Queries by count across production systems]

Notice how heavily we depend on Redis relative to MySQL. I wrote about this in a recent High Scalability blog post discussing how we make our backend metrics storage and processing scale.

That’s the other reason we added support for Redis monitoring, by the way: it’s really important to us. Before we built this functionality, we were flying just as blind as everyone else, without visibility into our Redis servers’ query traffic and workload.

You Can Do This Too

If you’d like to take a look at your own Redis server performance and workload, there are two great options.

  1. Our free Redis network traffic analyzer is a thin wrapper around our advanced Redis protocol parsing libraries. Analyze the results with pt-query-digest.
  2. Use VividCortex (a superior approach in many ways). Sign up for a free trial.

Either way, let us know what you think of this functionality and we look forward to hearing you say, as many of our customers tell us:

One summary of VividCortex is "it will find things about your systems that you didn't know, and aren't sure you'd find out any other way."

— Baron Schwartz (@xaprb) June 4, 2015

Shell by Bill Gracey



How to backup MySQL to your own storage servers


When a user signs up for our online backup service for MySQL, we allocate two gigabytes of free disk space in our storage. It’s convenient for small databases. We manage the storage, so no additional hardware is needed from the user. But what if the database is large enough that 2 gigabytes is not enough?

TwinDB allows you to back up MySQL to your own storage servers. The storage will be visible in the TwinDB console, and the dispatcher will set it as a destination server when scheduling backup jobs and will enforce the retention policy.

But the biggest benefit of having your own storage server in TwinDB is that backup copies never leave your data center. The agents are still managed by TwinDB dispatcher, but when XtraBackup streams a backup copy it’s encrypted and piped over ssh to your server. That increases security of the solution – your data flow is under your control. Besides, no large amounts of data are transferred to the cloud.

The storage server can be a Linux machine, we support and distribute packages for RedHat and Debian based systems.

Backup copies will be saved in /var/twindb-sftp. I recommend creating a separate partition for backups:

# df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/xvda1                7.8G  1.9G  5.8G  25% /
devtmpfs                  490M   64K  490M   1% /dev
tmpfs                     499M     0  499M   0% /dev/shm
/dev/mapper/data-storage   99G  7.6G   86G   9% /var/twindb-sftp

The storage server must be accessible via TCP port 4194 for TwinDB agents and TwinDB dispatcher.
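
A quick, hedged way to confirm that from an agent host is a simple port probe; the hostname below is a placeholder and netcat options vary between implementations:

# nc -zv storage.example.com 4194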

Now, to add your own storage server, log in to https://console.twindb.com, open the “Storage” item in the left menu and press the “New” button.

[Screenshot: the Storage page in the TwinDB console]

It will open a window with further instructions:

[Screenshot: the storage server registration instructions]

There are two steps actually:

  1. Install twindb-server-storage package from TwinDB repository. We support RedHat/CentOS and Debian/Ubuntu. See TwinDB repository page for instructions.
  2. Register the storage server. The exact command with your registration code will be shown in the window.

# twindb-register-storage 3690150723f6d732bc9c710ca68a8ec3
2015-06-06 03:20:58,503: twindb:    INFO: register(): line 129: Registering TwinDB storage server with code 3690150723f6d732bc9c710ca68a8ec3
2015-06-06 03:20:58,568: twindb:    INFO: register(): line 179: Received successful response to register the storage server.
2015-06-06 03:20:58,569: twindb:    INFO: register(): line 180: The storage server successfully registered in TwinDB
2015-06-06 03:20:58,569: twindb:    INFO: register(): line 182: Creating local user user_id_1
2015-06-06 03:20:59,305: twindb:    INFO: confirm_storage_registration(): line 245: Received successful response from dispatcher
2015-06-06 03:20:59,305: twindb:    INFO: register(): line 192: Success

That’s pretty much it. TwinDB will see the new storage server and will use it as a destination for further backup jobs. You can add as many storage servers as you need.

The post How to backup MySQL to your own storage servers appeared first on Backup and Data Recovery for MySQL.



9 easy performance tips for your Linux environment


For the majority of us who have grown accustomed to a Windows environment over the years, Linux can seem like another world. In essence, Linux is a free open-source operating system that has gained increasing popularity since its release in 1991. Linux is based on the whole Unix ecosystem of operating systems that grew out of Bell Laboratories in the early 1970s. Linux has been around for almost 25 years and grew immensely in the late 1990s and early 2000s when it became associated with the LAMP web development stack; Linux stands for the ‘L’ in the acronym of popular tools, along with Apache, MySQL, and PHP/Perl/Python.

 

 


 

 

The main difference that any user will readily notice between Linux and Windows is that a Linux server tends not to install a graphical user interface by default, but instead leaves you with a command line interface. The command line environment is very different from just clicking on icons in Windows. Consider something like this:

 

#!/bin/bash
# My first script
echo "Hello World!"

 

Or consider commands like “mkdir” for creating a directory, or “ls” for listing out the contents of one. Again, for traditional Windows-users, Linux will seem like a foreign language at first.

 

Despite an initial learning curve, the payoff is worthwhile, as Linux is an extremely versatile and powerful platform. In fact, one of the reasons for its global popularity is that it can be used for many purposes beyond merely an OS. Linux’s range of uses encompasses a web server or office intranet server, a CMS or CRS server, a file server serving files to Windows and/or Linux users, a voice-over-IP telephony server, a mail or domain name server, a database server, an infrastructure node in a cloud computing configuration, and much more.

 


 

As with any technology infrastructure, your Linux installation will require close attention to ensure you’re getting the best performance. You’ll want to keep your environment running as smoothly and efficiently as possible, and avoid any problems for your business-critical applications. Below we’ve collected 9 easy performance tips that will help keep your Linux environment in tip-top shape. Read on!

 

1. Disable unnecessary components

 

Linux comes bundled with a number of components and background services which run on every server but that are not required. The problem is that these “extras” take away valuable RAM and CPU. The best place to disable them is the startup scripts that start these services at boot time. Disabling these services will free up memory and decrease startup time. Examples of features to review are some of the popular control panels, such as Cpanel, Plesk, Webmin, and phpMyAdmin. Disabling these software packages can free up as much as 120 MB of RAM on your system.
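
As a hedged illustration on a SysV-style system (the service name is only an example; review what each service does before disabling it):

chkconfig --list | grep ':on'   # what is configured to start at boot?
chkconfig cups off              # stop an unneeded service from starting at boot
service cups stop               # and stop it now
# on systemd-based distributions the rough equivalents are:
#   systemctl list-unit-files --state=enabled
#   systemctl disable cups && systemctl stop cups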

 

2. Keep up with system updates

 

Linux is an open source platform that offers a large number of distributions, such as Ubuntu, Fedora, CentOS, Mint and more; the most popular is Ubuntu. Whatever version of Linux you happen to be using, it’s important that you commit to keeping the software current and robust. New fixes and security patches are added in every release, so one best practice is to always upgrade to the latest stable version of whatever Linux platform you prefer. This will ensure that you keep all your clients, services, and applications running as securely as possible. It’s also recommended that you always have a sandbox in place, where you can test updates and detect any potential issues prior to running in production mode.
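
Keeping packages current is a one-liner on either of the main package families, for example:

sudo apt-get update && sudo apt-get upgrade   # Debian/Ubuntu
sudo yum update                               # RedHat/CentOS/Fedora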

 

3. Get rid of the GUI

 

A distinctive feature of Linux is that it doesn’t require a GUI; rather, everything can be run from the command line. For some folks this is intuitive, whereas for others (such as Windows users) it’s a foreign concept. In any case, when you’re in Linux territory, having no GUI can save CPU cycles and memory, not to mention circumvent possible security issues. In order to disable the GUI, the init level should be set to 3 (command line login), rather than 5 (graphical login). If a GUI is needed, it can always be started manually with startx.
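
As a sketch, the way to do this depends on the init system in use:

# SysV-style: boot to runlevel 3 by editing /etc/inittab
#   id:3:initdefault:
# systemd-style: boot to the non-graphical target
sudo systemctl set-default multi-user.target
# start a graphical session manually only when you need it
startx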

 

4. Tune up your TCP

 

Keeping your TCP settings optimized helps improve network throughput for applications that require frequent connectivity. It’s especially recommended that you use larger TCP buffer sizes for communications across wide-area networks with large bandwidth and long delay characteristics; this tweak helps to improve data transfer rates.
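
A hedged example of raising the TCP buffer limits for such high bandwidth-delay paths; the values below are illustrative only and should be tested before being applied to production:

# as root:
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
sysctl -p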

 


 

5. Optimize your virtualbox

 

A popular way of running Linux is within a virtual environment, using a VM like VirtualBox or VMware Player so that Linux/Ubuntu can run in a window on your existing Windows or Mac desktop. The advantages here are that VMs can be used to test guest OSs in a sandbox-like environment, but which don’t have to be compliant or come in contact with the host machine or hardware. Since using a “virtualbox” is just like running another host, there are a number of optimizations you’ll want to make to your preferred VM tool such as disabling unnecessary services, optimizing performance, lightening the load, and blocking advertising. For a more detailed review of VM optimizations and fixes for Linux, see this helpful article.

 

6. Check for proper configurations of MySQL and Apache

 

Your Linux environment does not run in isolation; other important integrated services such as MySQL and Apache should also be optimized in order to get more out of your Linux stack. For example, to increase accessible RAM (or allot more RAM to MySQL), it’s a good idea to adjust the MySQL cache sizes (depending on your needs and the size of the MySQL requests). The same holds for Apache. Checking the ‘StartServers’ and ‘MinSpareServers’ directives will tell you how much memory Apache utilizes. Adjusting these settings will help you save RAM by as much as 30-40%.
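
A hedged first step is simply to inspect the current values before adjusting them; the configuration file paths differ between distributions:

mysql -e "SHOW VARIABLES WHERE Variable_name IN
  ('key_buffer_size', 'query_cache_size', 'innodb_buffer_pool_size')"
grep -Ei 'StartServers|MinSpareServers' /etc/apache2/apache2.conf /etc/httpd/conf/httpd.conf 2>/dev/null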

 

7. Learn the 5 basic Linux performance commands

 

There are 5 basic Linux commands that every user should know; they are: top, vmstat, iostat, free & sar. These offer various optics on everything from current uptime and system load to CPU usage to main memory stats. For a comprehensive overview of these commands, see this article here.
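
In their most common forms (the intervals are just examples):

top              # interactive overview of load, CPU and per-process resource usage
vmstat 5 3       # memory, swap, run queue and CPU, sampled every 5 seconds, 3 times
iostat -x 5 3    # extended per-device I/O statistics
free -m          # memory and swap usage in megabytes
sar -u 5 3       # CPU utilization over time (requires the sysstat package)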

 

8. Review and clean up additional modules or features

 

Similar to the disabling of extra services mentioned above, you’ll want to review any other modules or features that may be taking resources away from your system memory. For example, review the configuration files for Apache and decide if FrontPage support or some of the other extra modules are required. Adjusting or even turning these modules off will help to save memory and improve overall speed within your Linux environment.
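For example, on an Apache host you might first check what is actually loaded and then comment out what you don't use; the module shown below is only an example.

# list the modules Apache has loaded
apachectl -M

# then comment out unneeded LoadModule lines in httpd.conf, e.g.
# LoadModule status_module modules/mod_status.so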

 


 

9. Leverage Monitis Linux Monitoring

 

If you’re looking for best-in-class web monitoring and performance tracking, then you need to head over to Monitis. With its industry-leading global service, Monitis lets businesses monitor their network anytime and from anywhere, including website uptime monitoring, full page load and transaction monitoring, and web load testing. The benefits and takeaways here are peace of mind and less stress. When it comes to monitoring your Linux environment, Monitis has you covered. The Monitis Smart Agent for Linux allows monitoring of processes, CPU, memory, and hard drive utilization on a Linux server. And while you’re at it, why not add Monitis Load Monitoring to this service? This feature allows you to set load average thresholds so you get alerted if your Linux machine reaches critical load levels that you preset.

 

When it comes to monitoring your business-critical applications, especially those reliant on Linux, you don’t want to shortchange yourself. Get the peace of mind you deserve by entrusting your business to a proven industry leader. Go to Monitis and sign up for a free trial today and let them help boost your bottom-line. You’ll be glad you did!



MaxScale: A new tool to solve your MySQL scalability problems


Ever since MySQL replication has existed, people have dreamed of a good solution to automatically split read from write operations, sending the writes to the MySQL master and load balancing the reads over a set of MySQL slaves. While if at first it seems easy to solve, the reality is far more complex.

First, the tool needs to make sure it parses and analyses correctly all the forms of SQL MySQL supports in order to sort writes from reads, something that is not as easy as it seems. Second, it needs to take into account if a session is in a transaction or not.

While in a transaction, the default transaction isolation level in InnoDB, REPEATABLE READ, and the MVCC framework ensure that you’ll get a consistent view for the duration of the transaction. That means all statements executed inside a transaction must run on the master, but when the transaction commits or rolls back, the following select statements on the session can again be load balanced to the slaves - if the session is in autocommit mode, of course.

Then, what do you do with sessions that set variables? Do you restrict those sessions to the master, or do you replay them on the slaves? If you replay the set variable commands, you need to associate the client connection with a set of MySQL backend connections, made of at least a master and a slave. What about temporary objects like those created with “create temporary table…”? How do you deal with a slave that lags behind, or worse, with broken replication? Those are just a few of the challenges you face when you want to build a tool to perform read/write splitting.

Over the last few years, a few products have tried to tackle the read/write split challenge. The MySQL_proxy was the first attempt I am aware of at solving this problem, but it ended up with many limitations. ScaleARC does a much better job and is very usable, but it still has some limitations. The latest contender is MaxScale from MariaDB, and this post is a road story of my first implementation of MaxScale for a customer.

Let me first introduce what MaxScale is exactly. MaxScale is an open source project, developed by MariaDB, that aims to be a modular proxy for MySQL. Most of the functionality in MaxScale is implemented as modules, which include, for example, modules for the MySQL protocol on both the client side and the server side.

Other families of available modules are routers, monitors and filters. Routers are used to determine where to send a query; Read/Write splitting is accomplished by the readwritesplit router. The readwritesplit router uses an embedded MySQL server to parse the queries… quite clever and hard to beat in terms of query parsing.

There are other routers available, the readconnrouter is basically a round-robin load balancer with optional weights, the schemarouter is a way to shard your data by schema and the binlog router is useful to manage a large number of slaves (have a look at Booking.com’s Jean-François Gagné’s talk at PLMCE15 to see how it can be used).

Monitors are modules that maintain information about the backend MySQL servers. There are monitors for a replicating setup, for Galera and for NDB cluster. Finally, the filters are modules that can be inserted in the software stack to manipulate the queries and the resultsets. All those modules have well-defined APIs and thus writing a custom module is rather easy, even for a non-developer like me; basic C skills are needed, though. All event handling in MaxScale uses epoll, and it supports multiple threads.

Over the last few months I worked with a customer having a challenging problem. On a PXC cluster, they have more than 30k queries/s and, because of their write pattern and to avoid certification issues, they want the possibility to write to a single node and to load balance the reads. The application is not able to do the Read/Write splitting so, without a tool to do the splitting, only one node can be used for all the traffic. Of course, to make things easy, they use a lot of Java code that sets tons of session variables. Furthermore, for ISO 27001 compliance, they want to be able to log all the queries for security analysis (and also for performance analysis, why not?). So, high query rate, Read/Write splitting and full query logging - like I said, a challenging problem.

We experimented with a few solutions. One was a hardware load balancer that failed miserably - the implementation was just too simple, using only regular expressions. Another solution we tried was ScaleArc, but it needed many rules to whitelist the set session variables and to repeat them to multiple servers. ScaleArc could have done the job, but all the rules increase the CPU load and the cost is per CPU. The queries could have been sent to rsyslog and aggregated for analysis.

Finally, ScaleArc’s HA implementation is rather minimalist and we had some issues with it. Then, we tried MaxScale. At the time, it was not GA and was (is still) young. Nevertheless, I wrote a query logging filter module to send all the queries to a Kafka cluster and we gave it a try. Kafka is extremely well suited to recording a large flow of queries like that. In fact, at 30k qps, the 3 Kafka nodes are barely moving, with CPU under 5% of one core. Although we encountered some issues (remember, MaxScale is very young), it appeared to be the solution with the best potential, so we moved forward.

The folks at MariaDB behind MaxScale have been very responsive to the problems we encountered, and we finally got to a very usable point; the test in the pilot environment was successful. The solution has now been deployed in the staging environment and, if all goes well, it will be in production soon. The following figure is a simplified view of the internals of MaxScale as configured for the customer:

MaxScale internals

The blocks in the figure are nearly all defined in the configuration file. We define a TCP listener using the MySQL protocol (client side) which is linked with a router, either the readwritesplit router or the readconn router.

The first step when routing a query is to assign the backends. This is where the read/write splitting decision is made. Also, as part of the steps required to route a query, 2 filters are called: regexp (optional) and Genlog. The regexp filter may be used to hot patch a query, and the Genlog filter is the logging filter I wrote for them. The Genlog filter sends a JSON string containing roughly what can be found in the MySQL general query log, plus the execution time.

Authentication attempts are also logged, but the process is not illustrated in the figure. A key point to note: the authentication information is cached by MaxScale and is refreshed upon authentication failure; the refresh process is throttled to avoid overloading the backend servers. The servers are continuously monitored (the interval is adjustable), and the server statuses are used when deciding which backend to assign to a query.
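To make this more concrete, here is a rough sketch of what such a configuration could look like in the MaxScale 1.x INI format. The server addresses, credentials and the genlog filter module name are placeholders (the Genlog filter is the author’s custom module), so treat this as an illustration rather than the customer’s actual configuration.

[maxscale]
threads=1

[node1]
type=server
address=10.0.0.1
port=3306
protocol=MySQLBackend
# node2 and node3 would be defined the same way

[Galera Monitor]
type=monitor
module=galeramon
servers=node1,node2,node3
user=maxscale
passwd=maxscale_pwd
monitor_interval=1000

[RW Split Service]
type=service
router=readwritesplit
servers=node1,node2,node3
user=maxscale
passwd=maxscale_pwd
filters=genlog

[genlog]
type=filter
module=genlogfilter

[RW Split Listener]
type=listener
service=RW Split Service
protocol=MySQLClient
port=4006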

In terms of HA, I wrote a simple Pacemaker resource agent for MaxScale that does a few fancy things like load balancing with IPTables (I’ll talk about that in a future post). With Pacemaker, we have a full-fledged HA solution with quorum and fencing on which we can rely.

Performance-wise, it is very good - a single core in a virtual environment was able to read/write split and log to Kafka about 10k queries per second. Although MaxScale supports multiple threads, we are still using a single thread per process, simply because it yields slightly higher throughput and the custom Pacemaker agent deals with a clone set of MaxScale instances. Remember, we started using MaxScale early, and the beta versions were not dealing gracefully with threads, so we built around multiple single-threaded instances.

So, since a conclusion is needed: MaxScale has proven to be a very useful and flexible tool that allows you to build solutions to problems that were very hard to tackle before. In particular, if you need to perform read/write splitting, try MaxScale - it is the best solution for that purpose I have found so far. Keep in touch, I’ll surely write other posts about MaxScale in the near future.

The post MaxScale: A new tool to solve your MySQL scalability problems appeared first on MySQL Performance Blog.



Become a MySQL DBA blog series - Backup and Restore


It is not uncommon that developers, network/system administrators, or DevOps folks with general backgrounds, find themselves in a DBA role at some point in their career. So, what does a DBA do? In the previous post, we covered monitoring and trending practices, as well as some popular tools that you might find handy in your day to day work. 

We’ll continue this blog series with another basic but crucial DBA responsibility - taking backups of your data. Backup and restore is one of the most important aspects of database administration. If a database crashes and there is no way to recover it, any resulting data loss might be devastating to a business. One could argue that you can protect against crashes by replicating to multiple servers or data centers. But if an application error propagates to all instances, or a human drops a part of a database by mistake, you will probably need to restore from backup.

Different backup methodologies

There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups - logical and physical.

Logical backups contain data that is exported using SQL commands and stored in a file. This can be, for example, a set of SQL commands (INSERTs) that, when executed, will restore the content of the database. It does not have to be SQL code; it can be anything that is restorable - you could as well use SELECT … INTO OUTFILE to generate a file with your database contents. With some modifications to the output file’s syntax, you can store your backup in CSV files.

Physical backups are copies of the physical database files. Here, we would make a binary copy of a whole database by, for example, copying all of the files or by making a snapshot of the volume where the data directory is located.

A logical backup is usually slower than a physical one, because of the overhead of executing SQL commands to get the data out and then another set of SQL commands to get the data back into the database. This is a severe limitation that tends to prevent the logical backup from being the sole backup method on large (high tens or hundreds of gigabytes) databases. On the other hand, a major advantage of the logical backup is the fact that, having all data in SQL format, you can restore single rows.

Physical backups are not that flexible - while some of the methods make it possible to restore separate tables, you cannot go down to row level. On the other hand, this is the fastest way to back up and restore your database - you are limited only by the performance of your hardware; disk speed and network throughput will be the main limiting factors.

One more important concept when it comes to MySQL backups is point-in-time recovery. A backup, whether logical or physical, takes place at a given time. This is not enough - you have to be able to restore your database to any point in time, including a point between backups. In MySQL, the main way to handle point-in-time recovery is to use binary logs to replay the workload. With that in mind, a backup is not complete unless you make a copy of the binlogs along with it.

Logical backup methods

mysqldump

The best-known method is definitely mysqldump, a CLI tool that enables the DBA to create an SQL dump of the database. Mysqldump is a single-threaded tool and this is its most significant drawback - performance is OK for small databases but quickly becomes unacceptable if the data set grows to tens of gigabytes. If you plan to use mysqldump as a means of taking backups, you need to keep a few things in mind. First, by default mysqldump doesn’t include routines and events in its output - you have to explicitly set the --routines (-R) and --events (-E) flags. Second, if you want to take a consistent backup, things become tricky. As long as you use InnoDB only, you can use the --single-transaction flag and you should be all set. You can also use --apply-slave-statements to get the CHANGE MASTER statements at the beginning of the dump if you plan to create a slave using the backup. If you have other, non-transactional tables (MyISAM for example), then mysqldump will have to lock the whole database to ensure consistency. This is a serious drawback and may be one of the reasons why mysqldump won’t work for you.
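Putting those flags together, a typical InnoDB-only invocation might look like the following (the output path is a placeholder):

mysqldump --single-transaction --routines --events --triggers \
          --all-databases > /backups/full_dump.sql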

By default, mysqldump creates a file where you’ll first find SQL to create the schema and then SQL to restore data. To have more flexibility, you may change this behavior and script the backup in such a way that it creates a schema dump first and then the rest of the data. Additionally, you may also want to script the backup process so that it stores separate tables in separate SQL files, as in the sketch below. This will come in handy when you need to restore several rows or to compare current data with the previous day’s data. It’s all about the file size: separate dumps, created per table, will likely be smaller and more manageable, e.g., in case you want to use a CLI tool to find a given row in the SQL file.
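A minimal sketch of such a script, assuming a hypothetical schema called mydb and a /backups directory: the schema goes first, then one file per table.

mysqldump --no-data mydb > /backups/mydb.schema.sql
for t in $(mysql -N -B -e "SHOW TABLES FROM mydb"); do
    mysqldump --single-transaction mydb "$t" > "/backups/mydb.$t.sql"
done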

SELECT … INTO OUTFILE

This is more a mode in which mysqldump works than a separate backup method, but it’s distinct enough to be included here. Mysqldump can be executed in a mode (the --tab option) where, instead of SQL syntax, it generates the backup in another format. In general, the format is similar to CSV, with the difference that the actual format can be defined by the user. By default, it is tab-separated instead of comma-separated.
This format is faster to load than an SQL dump (you can use LOAD DATA INFILE to make it happen) but it is also harder to use to restore a single row. Most people probably don’t remember the LOAD DATA INFILE syntax, while almost everybody can run SQL.
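For illustration, with a hypothetical table mydb.t1 (the output file is written on the server host, so the MySQL user needs the FILE privilege and write access to the path):

SELECT * FROM mydb.t1 INTO OUTFILE '/tmp/t1.txt';

LOAD DATA INFILE '/tmp/t1.txt' INTO TABLE mydb.t1;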

Mydumper/myloader

These tools work as a pair to overcome the main pain point of mysqldump - its single thread. Mydumper can be used to generate a backup of the data (and data only; you also need to use mysqldump --no-data to get a dump of the schema) and myloader can then load it. Both processes can use multiple threads. You can either split the workload per table, or you can define a chunk size so that large tables are also worked on by numerous threads. It’s still a logical backup, so the process may still take a while. Based on numbers reported by different users, mydumper/myloader can load data up to 2-3 times faster than mysqldump. The process may still take days, though - depending on the database size, row size etc.
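As a sketch only - host names, credentials and paths below are placeholders, and you should check your mydumper version for the exact option names:

# dump with 4 threads, chunking large tables into 500k-row pieces
mydumper --host=db1 --user=backup --password=secret --threads=4 \
         --rows=500000 --outputdir=/backups/dump

# load it back with 4 threads
myloader --host=db2 --user=backup --password=secret --threads=4 \
         --directory=/backups/dump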

Even if the restore time is not acceptable for your data set, you may still be interested in mydumper because of periodic MySQL upgrades. For any major version upgrade (like 5.5 -> 5.6 or the upcoming 5.6 -> 5.7), the recommended way to upgrade is to perform a logical dump of the data and then load it back up. In such a case, time is not that crucial, but it is still much better to finish the restore in 2-3 days using mydumper/myloader rather than 6-9 days using mysqldump.

Physical backup methods

xtrabackup

Percona’s xtrabackup is the go-to physical backup method for MySQL. It is a tool that allows the DBA to take a (virtually) non-blocking snapshot of an InnoDB database. It works by copying the data files physically from one volume to another location. You can also stream the backup over the network, to a separate backup host where the backup will be stored. While copying the data, it keeps an eye on the InnoDB redo log and writes down any change that happened in the meantime. At the end, it executes FLUSH TABLES WITH READ LOCK (that’s why we used the word ‘virtually’) and finalizes the backup. Thanks to the last lock, the backup is consistent. If you use MyISAM tables, xtrabackup has a bigger impact, as the non-transactional tables have to be copied over the network while FTWRL is in place - this, depending on the size of those tables, may take a while. During that time, no query will be executed on the host.

Restore is pretty simple - especially if you have already applied the redo logs to the backup taken. Theoretically speaking, you could as well start MySQL without any further action, but then InnoDB recovery will have to be performed at startup, and this process takes time. Preparing the backup first (by applying redo logs) can be done in its own time. When the backup needs to be (quickly) restored, you won’t have to go through this process. To speed up the backup preparation phase (using --apply-log), you may increase the memory available for xtrabackup using the --use-memory flag. As long as you have several gigabytes of free memory, you can use them here to speed up the process significantly.
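A minimal example using the innobackupex wrapper that shipped with xtrabackup at the time of writing (credentials, paths and the timestamped directory name are placeholders):

# take the backup
innobackupex --user=backup --password=secret /backups/

# prepare it, giving the apply-log phase extra memory
innobackupex --apply-log --use-memory=4G /backups/2015-06-21_02-00-00/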

Xtrabackup is probably the most popular tool out there, and it’s not without reason. It is very flexible: you can use multiple threads to copy the files quicker (as long as your hardware permits it), and you can use compression to minimize the size of the backup. As we mentioned, it is possible to create a backup locally or stream it over the network using (for example) an SSH tunnel or netcat. Xtrabackup allows you to create incremental backups which take significantly less disk space than a full one and won’t take as much time. When restoring, though, it is a slower process, as deltas have to be applied one after another and this may take a significant amount of time.

Another feature of xtrabackup is its ability to back up single schemas or even tables. It has its uses but also limitations. First of all, it can be used to restore several rows that got dropped accidentally. It is still a less efficient way of doing this than restoring the data from an SQL dump, as you’d have to create a separate host, restore the given table, dump the missing rows and load them onto the production server - you cannot simply restore the whole table because you’d lose the data that changed after the backup was taken. It is possible to work it out with binary logs, but it would take too much time to be feasible. On the other hand, if a whole table or schema is missing, you should be able to restore it pretty easily.

The main advantage of xtrabackup over logical backups is its speed - performance is limited only by your disk or network throughput. On the other hand, it’s much harder to recover single rows from the database. The ideal use case for xtrabackup is to recover a whole host from scratch or to provision a new server. It comes with options to store information about MySQL replication or Galera writeset replication along with the backup. This is very useful if you need to provision a new replication slave or a new node in a cluster.

Snapshots

We’ll be talking here about backing up MySQL using snapshots - it does not matter much how you take those snapshots. It can be LVM installed on a host (using LVM is not an uncommon way of setting up MySQL servers) or it could be a “cloudish” snapshot - an EBS snapshot or its equivalent in your environment. If you use a SAN as storage for your MySQL server and you can generate a snapshot of a volume, it also belongs here. We will focus mostly on AWS, though - it’s the most popular cloud environment.

In general, snapshots are a great way of backing up any data - they are quick and, while they add some overhead, there are definitely more pros to this method than cons. The main problem with backing up MySQL using snapshots is consistency - taking a snapshot on the server is comparable to a forced power off. If you run your MySQL server in full durability mode, you should be just fine. If not, it is possible that some of the transactions won’t make it to disk and, as a result, you will lose data. Of course, there are ways of dealing with this issue. First of all, you can change the durability settings to be more durable (SET GLOBAL innodb_flush_log_at_trx_commit=1, SET GLOBAL sync_binlog=1) prior to the snapshot and then revert to the original settings after the snapshot has been started. This is the least impacting way of making sure your snapshot is consistent. Another method involves stopping a slave (if replication is the only means of modifying data on a given host) and then running FLUSH TABLES. You can also stop the activity by using FLUSH TABLES WITH READ LOCK to get a consistent state of the database. What is important to keep in mind, though, is that no matter which approach you take, you will end up with data in a “crashed” state - if you’d like to use this data to create a new MySQL server, at the first start MySQL will have to perform recovery procedures on the InnoDB tables. InnoDB recovery, on the other hand, may take a while, hours even - depending on the amount of modifications.

One way to get around this problem is to take cold backups. As they involve stopping MySQL before taking a snapshot, you can be sure the data is consistent and it’s all just a matter of starting MySQL to get a new server up. No recovery is needed because the data came from a server which did a clean shutdown. Of course, stopping MySQL servers is not an ideal way to handle backups, but sometimes it is feasible. For example, maybe you have a slave dedicated to ad-hoc queries, executed manually, which does not have to be up all the time? You could use such a server as a backup host too, shutting down MySQL from time to time in order to take a clean snapshot of its data.

As we discussed above, getting a consistent snapshot may be tricky at times. On the pro side, snapshots are a great way of provisioning new instances. This is true especially in the cloud, where you can easily create a new node using a few clicks or API calls. That is all true as long as you use a single volume for your data directory. Until recently, to get decent I/O performance in EC2, the only option was to use multiple EBS volumes and set up a RAID0 over them. This was caused by a limit on how many pIOPS a single EBS volume may have. This limit has increased significantly (to 20k pIOPS), but even now there are still reasons to use a RAIDed approach. In such a setup, you can’t just take snapshots and hope for the best - such snapshots will be inconsistent at the RAID level, not to mention the MySQL level. A cold backup will still work, as MySQL is down and no disk activity should happen (as long as the MySQL data directory is located on a separate device). For more “hot” approaches, you may want to look at ec2-consistent-snapshot - a tool that gives you some options for performing a consistent snapshot of a RAIDed volume with several EBSes under the hood. It can help you automate some MySQL tasks like stopping a slave and running FLUSH TABLES WITH READ LOCK. It can also freeze the filesystem at the operating system level. ec2-consistent-snapshot is tricky to set up and needs detailed tests, but it is one of the options to pick from.

Good practices and guidelines

We covered some of the ways in which you can take a backup of a MySQL database. It is time to put it all together and discuss how you could set up an efficient backup process.

The main problem is that all of the backup methods have their pros and cons. They also have their requirements when it comes to how they affect regular workloads. As usual, how you’d like to make backups depends on the business requirements, environment and resources. We’d still like to share some guidelines with you.

First of all, you want the ability to perform point-in-time recovery. This means that you have to copy the binary logs along with your backup. It can be either a disk-to-disk copy or an EBS snapshot of the volume where the binlogs are located - the point is that you have to have them available.
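One simple way to do the disk-to-disk variant, sketched with hypothetical paths and host names: rotate the binlog first so the current file is closed, then copy the closed files off the host.

mysql -e "FLUSH BINARY LOGS"
rsync -av /var/lib/mysql/mysql-bin.* backup-host:/backups/binlogs/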

Second - you probably want the ability to restore single rows. Now, everything depends on your environment. One way would be to take a logical backup of your system, but it may be hard to execute on a large data set. On the other hand, if you can restore a database from a physical backup (for example, click to create a new EBS volume out of the snapshot, click to create a new EC2 instance, click to attach the EBS to it), you could be just fine with this process and you won’t have to worry about the logical backup at all.

For larger databases you will be forced to use one of the physical backup methods because of the time needed to perform a logical one. The next question - how often do you want to perform a backup? You have binary logs so, theoretically speaking, it should be just fine to take a backup once per day and restore the rest of the data from the binlogs. In the real world, though, replaying binlogs is a slow and painful process. Of course, your mileage may vary - it all depends on the amount of modifications to the database. So, you need to test it - how quickly can you process and replay binary logs in your environment? How does that compare to your business requirements, which determine the maximum allowed downtime? If you use snapshots - how long does the recovery process take? Or, if you use a cold backup approach, how often can you stop your MySQL and take a snapshot? Even on a dedicated instance, you can’t really do it more often than once per 15-30 minutes, workload and traffic permitting. Remember, a cold backup means replication lag, no matter if you use regular replication or Galera Cluster (in Galera it’s just called differently - the node is in Desync state and applies missing writesets after IST). The backup node has to be able to catch up between backups.

Xtrabackup is a great tool for taking backups - using its incremental backup feature, you can easily take deltas every five minutes or so. On the other hand, restoring those increments may take a long time and is error-prone - there is a bunch of not-yet-discovered bugs in both xtrabackup and InnoDB which sometimes corrupt backups and render them useless. If one of the incremental backups is corrupted, the rest will not be usable. This leads us to another important point - how good is the backup data?

You have to test your backups. We mentioned it in a previous post - as a part of the healthcheck you should be checking that the backup, whichever method you choose to use, looks sane. Looking at file sizes is not enough, though. From time to time, for example on a monthly basis (but again, it depends on your business requirements), you should perform a full restore test - get a test server, install MySQL, restore the data from the backup, and test that you can join the Galera cluster or slave it off the master. Having backups is not enough - you need to ensure you have working backups.

We hope this introduction to MySQL backup methods helps you find your own solution for safeguarding your data. The main thing to keep in mind is that you should not be afraid of testing - if you don’t know whether your backup process design makes sense, do test it. As long as you have a working backup process that fulfills your organization’s requirements, there is no bad way of designing it. Just remember to test the restore from time to time and ensure you can still restore the database in a timely manner - databases change, and so does their content. Usually it grows. What was acceptable one year ago may not be acceptable today - you also need to take that into consideration.




JSON and the MySQL Argonauts


The MySQL 5.7.7 JSON lab release has been getting a lot of attention. At a recent conference, I was cornered by a developer who wanted to jump in with both feet by running this release on his laptop on the flight home. However, the developer was not sure how to begin.

1. Download the MySQL JSON release from http://labs.mysql.com/. You will get the choice of a Linux binary or source code. Grab the binary if you are using Linux and un-gzip/tar the download.

2. Shut down the current running version of MySQL. I was lucky in this case that the developer was using a recent copy of Ubuntu.

3. Change directory to the ~/Downloads/mysql-5.7.7-labs-json-linux-el6-x86_64 directory.

4. sudo ./bin/mysqld_safe --user=mysql &

5. ./bin/mysql -u root -p, then provide the password.

6. Enter a \s to get the status. This will confirm that you are using the JSON labs release.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.7-labs-json MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

If you are doing more than simple tests, run mysql_upgrade to update the system tables. You can skip this step for a quick and dirty exploration, but do not expect your JSON data to still be around when you go back to the previous version of MySQL.

7. Now you can start testing the JSON data type. I recommend starting by reading JSON Labs Release: JSON Functions, Part 1 — Manipulation JSON Data, JSON Labs Release: JSON Functions, Part 2 — Querying JSON Data, and JSON Labs Release: Native JSON Data Type and Binary Format. Then follow up with JSON Labs Release: Effective Functional Indexes in InnoDB to understand how to create indexes.
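Once the labs server is up, a tiny smoke test could look like this (the table and column names are made up for illustration; note that the functions in this labs release carry a jsn_ prefix):

CREATE TABLE argo (doc JSON);
INSERT INTO argo VALUES ('{"ship": "Argo", "crew": 50}');
SELECT jsn_extract(doc, '$.ship') FROM argo;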




Improved User Parsing From The MySQL Protocol


This isn’t really a feature we should brag about, because it’s a bug that took us a while to figure out, but we believe in sharing the bad as well as the good. There’s a lot to learn from TCP reassembly and protocol reverse engineering!

We received a request from a customer to help track down the user that was sending some queries to their database. Normally we can find this information easily: the user is one of the properties of query samples, and we can just click on a sample and see it. But for this particular customer, the user was always unknown_user.

This means we weren’t able to figure out what database user was issuing the query. Normally there are two ways we can figure out what the user is:

  1. We see the connection handshake and capture the username from there.
  2. We see a COM_CHANGE_USER packet and capture it from that (rare).

Our theory was that this customer’s database connections were all very long-lived, and we never got to see the connection setup sequence. But this didn’t hold up under deeper investigation. We never captured the username for this customer. Argh!

Maybe this customer was using an authentication method we didn’t support? Possible. Some of the newer auth methods in the latest version of MySQL hadn’t been implemented in our sniffer yet. We implemented them. Still nothing!

Much debugging and tcpdumping later, we found out the problem was an undocumented protocol feature, combined with odd client behavior, that caused us to ignore the username during the connection handshake.

The good news is, for this customer and some others, we immediately saw a difference. In the screenshot below (lightly redacted for privacy) you can see how the number of unknown_user queries goes way down. This happens after an agent upgrade. Meanwhile, the number of queries attributed to known users rises in a nice wedge shape as new connections are established and the sniffer keeps track of their queries.

Unknown User

This is not the first undocumented protocol feature we’ve found. (If you’re in the business of reverse engineering wire protocols, you’d better accept incomplete documentation as a given). We assume it won’t be the last.

If you’re interested in reverse engineering the MySQL protocol for fun and profit, you should check out our free tool for MySQL network sniffing. It’s a wrapper around our protocol decoding libraries, and is a great way to capture query traffic off the wire and feed it into analysis tools.



Auditing MySQL with McAfee and MongoDB


Greetings everyone! Let’s discuss a 3rd Party auditing solution to MySQL and how we can leverage MongoDB to make sense out of all of that data.

The McAfee MySQL Audit plugin does a great job of capturing, at low level, activities within a MySQL server. It does this through some non-standard APIs which is why installing and configuring the plugin can be a bit difficult. The audit information is stored in JSON format, in a text file, by default.

There is 1 JSON object for each action that takes place within MySQL. If a user logs in, there’s an object. If that user queries a table, there’s an object. Imagine 1000 active connections from an application, each doing 2 queries per second. That’s 2000 JSON objects per second being written to the audit log. After 24 hours, that would be almost 173,000,000 audit entries!

How does one make sense of that many JSON objects? One option would be to write your own parser in $YOUR_FAVORITE_LANGUAGE and convert the JSON to INSERT statements and write the data back to MySQL (Note: If you do this, you can whitelist this table within the plugin so that these INSERTs are not re-audit logged). Or, we can use a system designed to import, store and query JSON objects, such as MongoDB.

Install McAfee Audit Plugin

First we need to download the source code for the plugin and download the source code for the specific MySQL version you are running. This is not a complete step-by-step HOWTO on installing this plugin; just some high-level points.

My client for this exercise is still on Percona Server 5.1.73, so we need the source for that EXACT version from percona.com.

We can clone the mcafee/mysql-audit using git.

Unzip the MySQL source and compile it; just don’t do “make install”, only “./configure” and “make” are necessary.

Now compile the plugin. You may want to read the detailed instructions.

This next step is tricky and really only necessary if you are not using vanilla MySQL. It is a required step to allow the plugin to use those non-standard APIs I mentioned earlier. You need to extract the offsets for the plugin to work. Follow the instructions given.

Once that is all done, you can:

INSTALL PLUGIN AUDIT SONAME 'libaudit_plugin.so';

If the plugin fails to load, check MySQL’s error log for the reason why and confer with the plugin documentation on how to resolve.

We now need to enable audit logging because nothing is enabled by default.

SET GLOBAL audit_record_cmds = "select,insert,update,delete";
SET GLOBAL audit_json_file = ON;
SET GLOBAL audit_record_objs = "*.*,{}";
SET GLOBAL audit_force_record_logins = ON;

Look inside @@datadir and you should see a file called mysql-audit.json. You can tail -f this file if you’d like to watch it to make sure data is being written.

If you’d like some more background reading on the audit plugin, check out Fernando’s post on Experiences with McAfee Audit Plugin.

Setting Up MongoDB

Let me begin by stating this is my first time really dealing with MongoDB in any real sense. I spun up an EC2 instance in AWS (m3.large, CentOS 6) and installed MongoDB using yum and the Mongo repositories.

As the ephemeral storage for my instance had been mounted at /opt, I changed just this one option in the supplied /etc/mongod.conf and restarted mongo (service mongod restart).

dbpath=/opt/mongo

I then copied the mysql-audit.json from the MySQL host using SSH:

[percona@mysql-host ~]$ scp -i .ssh/amazon.pem /data/mysql/mysql-audit.json root@54.177.22.22:/tmp/

Then I imported this JSON file directly into MongoDB:

[root@ip-10-255-8-15 ~]# mongoimport --db test --collection audit --drop --file /tmp/mysql-audit.json

The above mongoimport command specifies the database in which to import (test) and in which collection (audit). I also specify --drop to drop the database before importing. This drop is necessary because the Audit Plugin appends to the JSON file, and if we repeated these import steps without --drop, we would be duplicating data.

If there is enough interest, via the comments below, I will investigate the potential of using the socket functionality of the Audit Plugin to have the events stream directly into mongo.

For now though, it’s a wash-rinse-repeat cycle; though there is the ability to rotate the JSON audit log after a certain amount of time and import each file on a daily basis.

Making Data Make Sense

Here is a sample “document” (ie: audit event) that is created by the Audit Plugin.

{
	"_id" : ObjectId("5571ea51b1e714b8d6d804c8"),
	"msg-type" : "activity",
	"date" : "1433438419388",
	"thread-id" : "10214180",
	"query-id" : "295711011",
	"user" : "activebatchSVC",
	"priv_user" : "activebatchSVC",
	"host" : "ecn.corp",
	"ip" : "10.2.8.9",
	"cmd" : "select",
	"objects" : [
		{
			"db" : "",
			"name" : "*",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "markets_source_tfutvol_eab",
			"obj_type" : "VIEW"
		},
		{
			"db" : "historical",
			"name" : "futureopt",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "securities_futures_optdef",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "markets_source_tfutvol_eab",
			"obj_type" : "VIEW"
		},
		{
			"db" : "historical",
			"name" : "futureopt",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "securities_futures_optdef",
			"obj_type" : "TABLE"
		}
	],
	"query" : "SELECT far, bar, baz FROM mytable"
}

!! MongoDB BUG !!

Notice that the last field in the document is named “query.” When I attempted some basic aggregate() functions on this field, I received errors about bad syntax. After much frustration, lots of Googling and repeated testing, I came to the conclusion that “query” is a reserved word in MongoDB. There is little to no documentation on this, aside from an almost 3-year-old bug report that simply helped confirm my suspicion.

To work around the above bug issue, let’s rename all of the “query” fields to “qry”:

db.audit.update({}, { $rename: { "query": "qry"} }, false, true);

Now we can begin.

Basic Command Counters

Using any of the “top level” fields in each document, we can run reports (called aggregates in Mongo). So an easy one is to get a list of all unique “commands” and how many times they occurred.

> db.audit.aggregate([ { $group: { "_id": "$cmd", "count": { $sum: 1 } } } ]);
{ "_id" : "Failed Login", "count" : 2 }
{ "_id" : "select", "count" : 458366 }
{ "_id" : "Connect", "count" : 455090 }
{ "_id" : "insert", "count" : 2 }
{ "_id" : "Quit", "count" : 445025 }
{ "_id" : null, "count" : 1 }

Breaking down the command above, we are grouping all values in the “cmd” field and counting them up. The SQL equivalent would be:

SELECT cmd, count(cmd) FROM audit GROUP BY cmd;

User Counts

Let’s get a list and count of all user activities. This will include any of the commands listed in the previous aggregate.

> db.audit.aggregate([ { $group: { "_id": "$user", "count": { $sum: 1 } } } ]);
{ "_id" : "baw", "count" : 1883 }
{ "_id" : "eq_shrd", "count" : 1 }
{ "_id" : "reski", "count" : 3452 }
{ "_id" : "alin", "count" : 1 }
{ "_id" : "oey", "count" : 62 }
{ "_id" : "dule", "count" : 380062 }
{ "_id" : "ashi", "count" : 802 }
{ "_id" : "tech_shrd", "count" : 392464 }

A couple of interesting things come out here. First, the tech_shrd user performs the most ‘activities’ of all users. Is this expected? Is this normal? Your environment will determine that.

Specific User Activities

Let’s pick a specific user and get their activity counts to make sure they aren’t doing anything weird.

> db.audit.aggregate([
... { $match: { "user": "tech_shrd" } },
... { $group: { "_id": "$cmd", "count": { $sum: 1 } } }
... ]);
{ "_id" : "select", "count" : 132970 }
{ "_id" : "Connect", "count" : 133919 }
{ "_id" : "Quit", "count" : 125575 }

The SQL equivalent:

SELECT cmd, count(cmd) FROM audit WHERE user = 'tech_shrd';

Activities By User

We saw above that there were 2 insert commands. Who ran those?

> db.audit.aggregate([
... { $match: { "cmd": "insert" } },
... { $group: { "_id": "$user", "count": { $sum: 1 } } }
... ]);
{ "_id" : "graz", "count" : 2 }

More simply, we could have just done this to see the entire document/record which would include the SQL that the user executed, timestamp, hostname, etc.

> db.audit.find({ "cmd": "insert" });

The SQL equivalents:

SELECT user, count(user) FROM audit WHERE cmd = 'insert';
SELECT * FROM audit WHERE cmd = 'insert';

Table Activity

The most complex example I could come up with was trying to find out how many times each table was referenced. In theory, with weeks or even months of audit data, we could decide which tables aren’t needed any longer by the application.

> db.audit.aggregate(
... { $unwind: "$objects" },
... { $group: { _id : "$objects.name", count: { $sum: 1 } } },
... { $sort: { "count": -1 } }
... );
{ "_id" : "*", "count" : 17359 }
{ "_id" : "swaps", "count" : 4392 }
{ "_id" : "futureopt", "count" : 3666 }
...(more)

You’ll notice in the sample document above that “objects” is an array of objects with 1 element for each table/view referenced in the ‘qry’ field. We need to “unwind” this array into single elements before we can count them. If someone knows a better way, please let me know. The Audit Plugin uses “*” to represent a derived table from a sub-SELECT, which has no proper name. We can remove all of these using:

> db.audit.update({ }, { $pull: { "objects": { "name": "*" } } }, false, true);

Audit Plugin Caveat: The ‘objects’ array is not a distinct list of the tables involved. For example, a SELECT statement that self-joins twice would produce 3 identical elements in the ‘objects’ array for that audit record. This may skew results. If anyone knows a cool Mongo trick to remove duplicates, please share in the comments.
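One possible approach, sketched below and untested against this exact data set: collapse each audit record’s object names into a set with $addToSet before counting, so a self-join only counts once per statement.

> db.audit.aggregate([
...   { $unwind: "$objects" },
...   { $group: { _id: "$_id", names: { $addToSet: "$objects.name" } } },
...   { $unwind: "$names" },
...   { $group: { _id: "$names", count: { $sum: 1 } } },
...   { $sort: { count: -1 } }
... ]);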

Conclusion

For a quick wrap-up, we installed the McAfee Audit Plugin, exported some audit data, set up a MongoDB instance in AWS and imported the audit data. As you can see, the possibilities are plentiful on what kind of information you can gather. Feel free to comment on an aggregation you’d like to see if we were running this type of audit on your system.

Cheers,
Matthew

The post Auditing MySQL with McAfee and MongoDB appeared first on MySQL Performance Blog.



Improving the Performance of MySQL on Windows


In this blog entry I’d like to describe how you might be able to improve how MySQL performs on Windows by ensuring that you take advantage of a Windows specific configuration setting.

On Unix systems, MySQL programs treat the localhost host name specially. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file, which has some performance advantages over a TCP/IP connection. Windows does not support Unix sockets, however, and hence does not benefit from this optimisation.

However, the use of shared memory connections on Windows can offer significant performance improvements over the use of TCP/IP connections. Shared memory connections are obviously only useful when both the MySQL client and server processes are executing on the same machine, but when they are, the performance benefits of using shared memory connections can be helpful. To enable shared memory connections, you would use the shared_memory system variable.
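For reference, a minimal way to try this out is sketched below; the base name shown is the default and can be omitted, and the file paths are up to your installation.

# my.ini on the Windows server
[mysqld]
shared_memory=ON
shared_memory_base_name=MYSQL

# client side: ask for the shared-memory protocol explicitly
mysql --protocol=MEMORY --shared-memory-base-name=MYSQL -u root -p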

The following screenshot compares the sysbench OLTP read write test performance when using shared memory connections versus when using TCP connections with MySQL 5.6.24 on Windows Server 2012 R2. The graph shows the average number of transactions per second (TPS) measured by sysbench at a variety of thread counts. The TPS value averaged over five runs is plotted at each thread count, with each run taking 300 seconds.

Graph comparing Shared Memory and TCP connection performance using MySQL 5.6.24
Comparing shared memory and TCP connection performance using MySQL 5.6.24

The uppermost red plot in the graph above shows the results obtained when using shared memory connections between sysbench and MySQL. The lower black plot shows the results obtained when using TCP connections.

Note that as well as giving improved performance, the shared memory connection results show reduced variability (apart from the results at 512 threads).

Obviously, changing to a more efficient communication path between the MySQL server and client will only make a dramatic difference like that shown above when the MySQL server isn’t spending a lot of time processing queries and doing file I/O. So if you use MySQL on Windows in a standalone configuration (with the application clients and the MySQL server on the same machine) and switch from using TCP connections to shared memory connections, the performance improvement that you experience may not be as great as shown above.

The test machine used to produce the results shown above is a Sun Fire X4170 M2 Server with 24 logical CPUs running at 2930MHz.

The following graph shows the results of running the same TCP vs shared memory connection tests using the MySQL 5.7.7 release candidate:

Graph showing shared Memory vs TCP connection performance using MySQL 5.7.7rc
Comparing shared memory and TCP connection performance using MySQL 5.7.7rc

Note that the MySQL 5.7.7 release candidate shows better performance at high thread counts than MySQL 5.6.24 in the graphs above. Profiling the shared memory test runs at 1024 threads for the different versions of MySQL reveals that the second “hottest” function in MySQL 5.6.24 at this thread count is RtlpEnterCriticalSectionContended, whereas this function has dropped to fourth place in the MySQL 5.7.7 profile. A call to RtlpEnterCriticalSectionContended on Windows indicates contention for a critical section (a multithreading lock).

The reduction in lock contention with MySQL 5.7.7 fits nicely with the performance and scalability improvements in MySQL 5.7 mentioned in Geir’s blog entry: What’s new in MySQL 5.7 (First Release Candidate).

The tests used to produce the results shown above are part of the regular performance tests that run daily in the Oracle test labs against the latest builds of MySQL on various platforms. The results of these daily performance tests are automatically checked for performance regressions and failures and engineers are alerted if any are found.

One of the characteristics of a good performance test is repeatability. In order to reduce the variability of the TPS results reported by sysbench on the machine used for the test results, the sysbench process was affinitized to use CPUs 0,1,2,3,12,13,14,15 and the MySQLD process was affinitized to use CPUs 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23. These CPU affinities were arrived at by experimenting to find the affinity settings that provided the highest and most stable TPS results.
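As an aside, one way to pin processes like this on Windows is the start command’s /affinity switch, which takes a hexadecimal CPU mask; the masks below encode the CPU sets described above, and the command lines are only illustrative placeholders.

rem sysbench on CPUs 0-3 and 12-15 (mask 0xF00F)
start /affinity F00F sysbench ...

rem mysqld on CPUs 1-23 (mask 0xFFFFFE)
start /affinity FFFFFE mysqld --defaults-file=C:\mysql\my.ini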

Microsoft has also introduced an enhancement to the “loopback” TCP/IP performance on Windows 8 and Windows 2012 via the “TCP Loopback Fast Path” (see http://blogs.technet.com/b/wincat/archive/2012/12/05/fast-tcp-loopback-performance-and-low-latency-with-windows-server-2012-tcp-loopback-fast-path.aspx ). The graph below shows the results I obtained when I experimented with a patch on the communications library used by MySQL 5.7.6 m16 and sysbench that set the SIO_LOOPBACK_FAST_PATH option to 1.

Graph showing Fast Loopback TCP vs standard TCP connection performance using MySQL 5.7.6-m16
Comparing Fast Loopback TCP and standard TCP connection performance using MySQL 5.7.6-m16

The lower black plot shows the results from the “normal” unpatched TCP connections. The upper red plot shows the results when using the communications library patched with the “Loopback Fast Path” setting.

Note that the performance gain from using the “TCP Loopback Fast Path” patch over normal TCP connections is not as significant as that available from switching to shared memory connections. The data in the graph above is also not directly comparable with the previous graphs showing the shared memory connection performance as the CPU affinity settings differ.

In my next post, I hope to show how some previously undocumented settings can further improve MySQL’s performance on Windows. So please stay tuned!

As always, thank you for using MySQL!



Indexing MySQL JSON Data


“MySQL’s JSON data type is great! But how do you index the JSON data?” I was recently presenting at the CakePHP Cakefest Conference and was asked that very question. And I had to admit I had not been able to play, er, experiment with the JSON datatype to that level. Now I have and it is fairly easy.

1. Create a simple table
mysql> desc colors;
+--------------+----------+------+-----+---------+-------+
| Field        | Type     | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+-------+
| popular_name | char(10) | YES  |     | NULL    |       |
| hue          | json     | YES  |     | NULL    |       |
+--------------+----------+------+-----+---------+-------+
2 rows in set (0.00 sec)

2. Add in some data
INSERT INTO `colors` VALUES ('red','{\"value\": \"f00\"}'),('green','{\"value\": \"0f0\"}'),('blue','{\"value\": \"00f\"}'),('cyan','{\"value\": \"0ff\"}'),('magenta','{\"value\": \"f0f\"}'),('yellow','{\"value\": \"ff0\"}'),('black','{\"value\": \"000\"}');

3. SELECT some data
Use the jsn_extract function to efficiently search for the desired row.
mysql> select jsn_extract(hue, '$.value') from colors where jsn_extract(hue, '$.value')="f0f";
+-----------------------------+
| jsn_extract(hue, '$.value') |
+-----------------------------+
| "f0f"                       |
+-----------------------------+
1 row in set (0.00 sec)

But how efficient is that? Turns out we end up doing a full table scan.

mysql> explain select jsn_extract(hue, '$.value') from colors where jsn_extract(hue, '$.value')="f0f";
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | colors | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |   100.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+

4. Add a VIRTUAL column to index quickly
mysql> ALTER TABLE colors ADD value_ext char(10) GENERATED ALWAYS AS (jsn_extract(hue, '$.value')) VIRTUAL;
This will add a virtual column from the value data in the hue column.

5. Index the new column
mysql> CREATE INDEX value_ext_index ON colors(value_ext);

Now the EXPLAIN shows us that we are more efficient.
mysql> explain select jsn_extract(hue, '$.value') from colors where value_ext="f0f";
+----+-------------+--------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys   | key             | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | colors | NULL       | ref  | value_ext_index | value_ext_index | 11      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql>




Replicate MySQL to Amazon Redshift with Tungsten: The good, the bad & the ugly

Heterogeneous replication involves moving data from one database platform to another. This is a complicated endeavour because datatypes, date & time formats, and a whole lot more tend to differ across platforms. In fact, it’s so complex that many enterprises simply employ a commercial solution to take away the drudgery. Join 31,000 others and follow Sean […]