
AWS Aurora Benchmarking - Blast or Splash?


Summary

In this investigation, three solutions were analyzed: MySQL with MHA (Master High Availability Manager for MySQL), MySQL with Galera Replication (synchronous data replication cross-node), and AWS RDS-Aurora (data solution from Amazon promising HA and read scalability).

These three platforms were evaluated for high availability (HA; how fast the service would recover in case of a crash) and performance in managing both incoming writes and concurrent read and write traffic.

These were the primary items evaluated, followed by an investigation into how to implement a multi-region HA solution, as requested by the customer. This evaluation will be used to assist the customer in selecting the most suitable HA solution for their applications.

 

HA tests

[Figure: failover time comparison]

MySQL + Galera was proven to be the most efficient solution; in the presence of read and write load, it performed a full failover in 15 seconds compared to 50 seconds for AWS-Aurora, and to more than two minutes for MySQL with MHA.

Performance

Tests indicated that MySQL with MHA is the most efficient platform. With this solution, it is possible to manage read/write operations almost twice as fast as, and more efficiently than, MySQL/Galera, which places second. Aurora consistently places last.

Recommendations

In light of the above tests, the recommendations consider different factors to answer the question, "Which is the best tool for the job?" If HA and very low failover time are the major factors, MySQL with Galera is the right choice. If the focus is on performance and the business can afford several minutes of down time, then the choice is MySQL with MHA.

 

Finally, Aurora is recommended when there is a need for an environment with limited concurrent writes, the need to have significant scale in reads, and the need to scale (in/out) to cover bursts of read requests such as in a high-selling season.

 


 

 

Why This Investigation?

The outcomes presented in this document are the result of an investigation undertaken in September 2015. The research and tests were conducted in response to a growing number of requests for clarification and recommendations around Aurora and EC2. The main objective was to focus on real numbers and scenarios seen in production environments.

Everything possible was done to execute the tests in a consistent way across the different platforms.

Errors were pruned by executing each test on each platform before collecting the data; this preliminary run also served to identify the saturation limit, which was different for each tested architecture. During the real test execution, I repeated each test several times to identify and reduce any possible deviation.

 

Things to Answer:

In the investigation, I was looking to answer the following questions for each platform:

  • HA
    • Time to failover
    • Service interruption time
    • Lag in execution at saturation level
  • Ingest & Compliance test
    • Execution time
    • Inserts/sec
    • Selects/sec
    • Deletes/sec
    • Handlers/sec
    • Rows inserted/sec
    • Rows selected/sec (compliance only)
    • Rows deleted/sec (compliance only)

 

Brief description about MHA, Galera, Aurora

MHA

MHA is a solution that sits on top of the MySQL nodes, checking the status of each of the nodes and using custom scripts to manage the failover. An important thing to keep in mind is that MHA does not act as a “man in the middle”, and as such, no connection is sent to the MHA controller. The MHA controller instead manages the entry point, whether via a VIP (Virtual IP), HAProxy settings, or whatever makes sense in the design architecture.

At the MySQL level, the MHA controller will recognize the failing master and will elect the most up-to-date one as the new master. It will also try to re-align the gap between the failed master and the new one using original binary logs if available and accessible.

Scalability is provided via standard MySQL design, having one master in read/write mode and several servers in read mode. Replication is asynchronous, and therefore it can easily lag, leaving the read nodes quite far behind the master.
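As a side note (not part of the original test setup), the lag of a classic asynchronous replica can be observed directly on the replica itself:

-- On a standard MySQL 5.6/5.7 asynchronous replica; the
-- Seconds_Behind_Master field gives a rough measure of replication lag.
SHOW SLAVE STATUS\G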

 

[Figure: MHA architecture]

MySQL with Galera Replication

MySQL + Galera works on the concept of a cluster of nodes sharing the same dataset.

What this means is that MySQL+Galera is a cluster of MySQL instances, normally three or five, that share the same dataset, and where data is synchronously distributed between nodes.

The focus is on the data, not on the nodes, given that each node shares the same data and status. Transactions are distributed across all active nodes that form the primary component. Nodes can leave and re-join the cluster, modifying the conceptual view that expresses the primary component (for more details on Galera, review my presentation).

What is of note here is that each node has the same data at the end of a transaction; given that, the application can connect to any node and read/write data in a consistent way.

As previously mentioned, replication is (virtually) synchronous, data is locally validated and certified, and conflicts are managed to keep data internally consistent. Failover is more of an external need than a Galera one, meaning that the application can be set to connect to one node only or to all the available nodes, and if one of the nodes is not available, the application should be able to utilize the others.

Given that not all applications have this function out of the box, it is common practice to add another layer with HAProxy to manage the application connections, and have HAProxy either distribute the connections across nodes or use a single node as the main point of reference and shift to the others if needed. The shift in this case is limited to moving the connection point from Node A to Node B.

MySQL/Galera write scalability is limited by the capacity of a single node to manage the total amount of incoming writes. There is no write scaling from adding nodes. MySQL/Galera is simply a write distribution platform, and reads can be performed consistently on each node.

[Figure: Galera architecture]

Aurora

Aurora is based on the RDS approach, with the ease of management, built-in backup, data-on-disk auto-recovery, etc. Amazon also states that Aurora offers a better replication mechanism (~100 ms lag).

This architecture is based on a main node acting as a read/write node and a variable number of nodes that can work as read-only nodes. Given that the replication is claimed to be within ~100ms, reading from the replica nodes should be safe and effective. A read replica can be distributed by AZ (availability zone), but must reside in the same region.

Applications connect to an entry point that will not change. In case of failover, the internal mechanism will move the entry point from the failed node to the new master, and also all the read replicas will be aligned to the new master.

Data is replicated at a low level and pushed directly from the read/write data store to a read data store, and here there should be a limited delay and very limited lag (~100 ms). Aurora replication is not synchronous, and given that only one node is active at a time, there is no data consistency validation or check.

In terms of scalability, Aurora does not scale writes by adding replica nodes. The only way to scale writes is to scale up, meaning upgrading the master instance to a more powerful one. Given the nature of the cluster, it would be unwise to upgrade only the master.

Reads are scaled by adding new read replicas.

[Figure: Aurora architecture]

 

What about multi-region?

One of the increasing requests I receive is to design architectures that could manage failover across regions.

There are several issues in replicating data across regions, starting from data security down to packet size and frame dimension modification, given we must use existing networks for that.

For this investigation, it is important to note one thing: the only solution that could offer internal cross-region replication is Galera, with the use of Segments. But given the possible issues, I normally do not recommend this solution when we talk about regions across continents. Galera is also the only one that would eventually help optimize asynchronous replication, using multiple nodes to replicate to another cluster.

Aurora and standard MySQL must rely on basic asynchronous replication, with all the related limitations.

Architecture for the investigation

The investigation I conducted used several components:

  • EIP = 1
  • VPC = 1
  • ELB=1
  • Subnets = 4 (1 public, 3 private)
  • HAProxy = 6
  • MHA Monitor (micro ec2) = 1
  • NAT Instance (EC2) =1 (hosting EIP)
  • DB Instances (EC2) = 6 + (2 stand by) (m4.xlarge)
  • Application Instances (EC2) = 6
  • EBS SSD 3000 PIOS
  • Aurora RDS node = 3 (db.r3.xlarge)
  • Snapshots = (at regular intervals)

[Figure: MySQL HA failover - POC architecture]

Application nodes connect to the databases either using an Aurora entry point or HAProxy. For MHA, the controller action was modified to act on an HAProxy configuration instead of on a VIP (Virtual IP), modifying its active node and reloading the configuration. For Galera, the shift was driven by recognition of the node status using the HAProxy check functionality.
As indicated in the above schema, replication was distributed across several AZs. No traffic was allowed to reach the databases from outside the VPC. ELB was connected to the HAProxy instances, which proved to be a good way to rotate across several HAProxy instances in case an additional HA layer is needed.
For the scope of the investigation, given that each application node was hosting a local HAProxy (the "by tile" approach), ELB was tested in the phase dedicated to identifying errors and saturation, but was not used in the following performance and HA tests. The NAT instance was configured to allow access only to each HAProxy web interface, to review statistics and node status.

 

Tests

Description:

I performed 3 different types of tests:

  • High availability
  • Data ingest
  • Data compliance

High Availability

Test description:

The tests were quite simple; I was running a script that, while inserting, was collecting the time of the command execution, storing the SQL execution time (with NOW()), returning the value to be printed, and finally collecting the error code from the MySQL command.

The result was:

Output

2015-09-30 21:12:49  2015-09-30 21:12:49 0
    /\                         /\       /\
    |                          |        |
 Date time sys           now() MySQL     exit error code (bash)

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Inside the table:

CREATE TABLE `ha` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `c` datetime NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=178 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

select a,d from ha;
+-----+---------------------+
| a   | d                   |
+-----+---------------------+
|   1 | 2015-09-30 21:12:31 |
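For reference, here is a minimal sketch of the kind of statement the test script was issuing against this table (the actual script also captured the shell timestamp and the mysql client exit code, as shown in the log above):

-- One probe iteration: insert a row and read back the server-side time;
-- the gap between the client timestamp, NOW(), and the exit code shows
-- when and for how long the service was unavailable.
INSERT INTO ha (c) VALUES (NOW());
SELECT NOW();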

For Galera, the HAProxy settings were:

server node1 10.0.2.40:3306 check port 3311 inter 3000 rise 1 fall 2  weight 50
server node2Bk 10.0.1.41:3306 check port 3311 inter 3000 rise 1 fall  2   weight 50 backup
server node3Bk 10.0.1.42:3306 check port 3311 inter 3000 rise 1 fall  2   weight 10 backup
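Note that port 3311 is not MySQL itself; it presumably points to a clustercheck-style HTTP service on each node (the check script is not shown here), which typically derives the node health from the wsrep status variables, for example:

-- A Galera node is safe to receive traffic when it is Synced and ready:
-- wsrep_local_state = 4 and wsrep_ready = ON.
SHOW STATUS LIKE 'wsrep_local_state';
SHOW STATUS LIKE 'wsrep_ready';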

I ran the test script on the different platforms, without heavy load, and then close to the MySQL/Aurora saturation point.

 

High Availability Results

[Figure: failover time comparison]

I think an image is worth millions of words.

MySQL with MHA was taking around 2 minutes to perform the full failover, meaning from interruption to when the new node was able to receive data again.

Under stress, the master was so far ahead of the slaves, and replication lag was so significant, that a failover with binlog application was simply taking too long to be considered a valid option in comparison to the other two. This result was not a surprise, but it pushed me to analyze the MHA solution separately, given that its behaviour diverged from the other two so much that it was not comparable.

More interesting was the behavior between MySQL/Galera and Aurora. In this case, MySQL/Galera was consistently more efficient than Aurora, with or without load. It is worth mentioning that of the 8 seconds taken by MySQL/Galera to perform the failover, 6 were due to the HAProxy settings, which had a 3000 ms check interval and 2 loops before executing the failover. Aurora was able to perform a decent failover when the load was low, while under increasing load the read nodes became less aligned with the write node, and as such less ready for failover.

Note that I was performing the tests following the Amazon indication to use the following to simulate a real crash:

ALTER SYSTEM CRASH [INSTANCE|NODE]

As such, I was not doing anything strange or out of the ordinary.

Also, while doing the tests, I had the opportunity to observe the Aurora replica lag using CloudWatch, which reported a much higher lag value than the claimed ~100 ms.

As you can see below:

[Figure: Aurora replica lag during the compliance test]

In this case, I was getting almost 18 seconds of lag in the Aurora replication, much higher than ~100 ms!

Or an unclear value like this:

[Figure: decades of lag reported for the replica]

As you can calculate by yourself, 2E16 is several decades of latency.

Another interesting metric I was collecting was the latency between the application sending the request and the moment of execution.

[Figure: latency between request and execution]

Once more, MySQL/Galera is able to manage the requests more efficiently. With a high load and almost at saturation level, MySQL/Galera was taking 61 seconds to serve the request, while Aurora was taking 204 seconds for the same operation.

This indicates how high the impact of load can be on response time and execution when approaching saturation. This is a very important metric to keep under observation when deciding whether or when to scale up.

 

Exclude What is Not a Full Fit

As previously mentioned, this investigation was intended to answer several questions, first of all about HA and failover time. Given that, I had to exclude the MySQL/MHA solution from the remaining analysis, because it is so drastically divergent that it makes no sense to compare; analyzing the performance of the MySQL/MHA solution alongside the others would also have flattened the other two. Details about MySQL/MHA are presented in the Appendix.

Performance tests

Ingest tests

Description:

This set of tests was done to cover how the two platforms behaved with a significant amount of inserts.

I used IIbench with a single table, and my own StressTool, which instead uses several tables (configurable) plus other configurable options such as:

  • Configurable batch inserts
  • Configurable insert rate
  • Different access method to PK (simple PK or composite)
  • Multiple tables and configurable table structure.

The two benchmarking tools differ also in the table definition:

IIBench

CREATE TABLE `purchases_index` (
  `transactionid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  `data` varchar(4000) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

StressTool

CREATE TABLE `tbtest1` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `uuid` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `b` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `c` char(200) COLLATE utf8_unicode_ci NOT NULL,
  `counter` bigint(20) DEFAULT NULL,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `partitionid` int(11) NOT NULL DEFAULT '0',
  `date` date NOT NULL,
  `strrecordtype` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`autoInc`,`date`),
  KEY `IDX_a` (`a`),
  KEY `IDX_date` (`date`),
  KEY `IDX_uuid` (`uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 CHARSET=utf8 COLLATE=utf8_unicode_ci

IIbench was executed using 32 threads on each application server, while the stress tool was executed with 16/32/64 threads on each application node, resulting in 96, 192, and 384 total threads, each executing batch inserts of 50 inserts per batch (up to 384 × 50 = 19,200 rows per batch cycle).

Ingest Test Results

IIBench
Execution time:

[Figure: IIBench execution time]

Time to insert ~305 million rows into a single table using 192 threads.

Rows Inserted/Sec

[Figure: IIBench rows inserted per second]

Insert Time

[Figure: IIBench insert time]

The result of this test is once more quite clear, with MySQL/Galera able to manage the load in one third of the time Aurora takes. MySQL/Galera was also more consistent in the insert time taken as the number of rows inside the table grew.

It is also of interest to note how the number of rows inserted per second declined faster in MySQL/Galera compared to Aurora.

 

Java Stress Tool
Execution Time

[Figure: StressTool ingest execution time]

 

This test was focused on multiple inserts (as with IIbench) but using an increasing number of threads and multiple tables. This test is closer to what could happen in real life, given that parallelization and multiple entry points are definitely more common than single-table inserts in a relational database.

In this test, Aurora performs better, and the distance from MySQL/Galera is less evident than in the IIbench test.

In this test, we can see that Aurora is able to perform almost like a standard MySQL instance, but this performance does not persist as the number of concurrent threads increases. Actually, MySQL/Galera was able to manage the load of 384 threads in one third of the time taken by Aurora.

Analyzing more in depth, we can see that MySQL/Galera manages the commit phase more efficiently, which is surprising keeping in mind that MySQL/Galera uses (virtually) synchronous replication and has to manage data validation and replication.

Rows Inserted

[Figure: StressTool ingest rows inserted per second]

Com Commit

[Figure: StressTool ingest Com_commit counts]

Commit Handler Calls

[Figure: StressTool ingest commit handler calls]

In conclusion, I can say that in this case too, MySQL/Galera performed better than Aurora.

Compliance Tests

The compliance tests I ran used tpcc-mysql with 200 warehouses, and StressTool with 15 parent tables and 15 child tables, generating Select/Insert/Delete operations on a basic dataset of 100K entries.

All tests were done with the buffer pool saturated.

Tests for tpcc-mysql used 32, 64, and 128 threads, while for StressTool I used 16, 32, and 64 threads (multiplied by the 3 application machines).

Compliance Tests Results

Java Stress Tool
Execution Time

[Figure: StressTool compliance execution time]

In this test, we have the applications performing concurrent access and read/write actions on rows across several tables. It is quite clear from the picture above that MySQL/Galera was able to process the load more efficiently than Aurora. Both platforms had a significant increase in execution time as the number of concurrent threads increased.

Both platforms reached saturation level with this test, using a total of 192 concurrent threads. Saturation occurred at a different moment and hit a different resource for each platform during the 192-concurrent-thread test: Aurora was CPU-bound and showed replication lag, while for MySQL/Galera the bottlenecks were I/O and flow control.

Rows Inserted and Read

[Figure: StressTool compliance rows inserted and read]

In relation to the rows managed, MySQL/Galera performed better in terms of quantity, but this trend declined significantly as concurrency increased. Both read and write operations were affected, while Aurora managed less in terms of volume but became more consistent as concurrency increased.

Com Select Insert Delete

[Figure: StressTool compliance Com_select/insert/delete counts]

Analyzing the details by type of command, it is possible to identify that MySQL/Galera was more affected in the read operations, while writes showed less variation.

Handler Calls

[Figure: StressTool compliance handler calls]

In write operations, Aurora inserted a significantly lower volume of rows, but it was more consistent as concurrency increased. This is probably because the load exceeded the capacity of the platform, and Aurora was acting at its limit. MySQL/Galera instead managed the load with twice the performance of Aurora, even if the increasing concurrency negatively affected the trend.

TPCC-mysql

Transactions

[Figure: tpcc-mysql transactions]

Tpcc-mysql emulates the CRUD activities of transactional users against N warehouses. In this test, I used 200 warehouses, each with 10 terminals, running 32, 64, and 128 threads.

Once more, MySQL/Galera is consistently better than Aurora in terms of volume of transactions. The average per-second results were also consistently almost 2.6 times better than Aurora's. On the other hand, Aurora shows less fluctuation in serving the transactions, with a more consistent trend for each execution.

Average Transactions

[Figure: tpcc-mysql average transactions]

 

Conclusions

High Availability

MHA excluded, the other two platforms were shown to be able to manage the failover operation within a limited time frame (below 1 minute); nevertheless, MySQL/Galera was more efficient and consistent, especially in consideration of the unexpected and sometimes not fully justified episodes of Aurora replication lag. This result is a direct consequence of synchronous replication, which by design does not allow an active MySQL/Galera node to fall behind.

In my opinion, the replication method used in Aurora is efficient, but it still allows node misalignment, which is obviously not optimal when there is a need to promote a read-only node to become a read/write instance.

Performance

MySQL/Galera was able to outperform Aurora in all tests -- by execution time, number of transactions, and volume of rows managed. Also, scaling up the Aurora instance did not have the impact I was expecting; it was still not able to match the EC2 MySQL/Galera performance, despite the latter having less memory and fewer CPUs.

Note that while I had to exclude the MHA solution because of its failover time, the performance achieved using standard MySQL was by far better than MySQL/Galera or Aurora; please see the Appendix.

General Comment on Aurora

The Aurora failover mechanism is not so different from other solutions. In case of a crash of a node, another node will be elected as the new primary node, on the basis of the "most up-to-date" rule.

Replication is not illustrated clearly in the documentation, but it seems to be a combination of block device distribution and semi-sync replication, meaning that a primary is not really affected by the possible delay in the copy of a block once it is dispatched. What is also interesting is the way the data of a volume is repaired in case of issues: the data is copied over from another location or volume that hosts the correct data. This resembles the HDFS mechanism, and may well be exactly that; what is relevant is that, if it is HDFS, the latency of the operation may be significant. What is also relevant in this scenario is the fact that if the primary node crashes, there will be a service interruption during the election of a secondary to primary; this can be up to 2 minutes according to the documentation, and was verified in the tests.

Regarding replication, the documentation states that it can take ~100 ms, but that is not guaranteed and depends on the level of incoming write traffic to the primary server. I have already reported that this does not hold in practice, and replication can take significantly longer.

What happened during the investigation is that the more writes there were, the greater the possible distance between the data visible on the primary and the data visible on the replica (replication lag). No matter how efficient the replication mechanism is, this is not synchronous replication, and it does not guarantee consistent data reads per transaction.

Finally, replication across regions is not even part of the design of the Aurora solution, and it must rely on standard asynchronous MySQL replication between servers, with all the related limitations. Aurora is nothing more, nothing less than RDS on steroids, with smarter replication.

Aurora does not perform any kind of write scaling; scaling is performed on reads. The only way it scales writes is by scaling up the box, so more power and memory = more writes: nothing new, and obviously scaling up also costs more. That said, I think Aurora is a very valuable solution when there is the need for a platform that requires extensive read scaling (in/out), and for rolling out a product in phase 1.

Appendix MHA

MHA performance Graphs

As previously mentioned, MySQL/MHA performed significantly better than the other two solutions. The IIBench test completed in 281 seconds against the 4,039 seconds of Galera.

IIBench Execution Time
[Figure: MySQL/MHA IIBench execution time]

MySQL/MHA execution time (Ingest & Compliance)

[Figure: MySQL/MHA execution time (Ingest & Compliance)]

The execution of the Ingest and Compliance tests in MySQL/MHA was 3 times faster than in MySQL/Galera.

MySQL/MHA Rows Inserted (Ingest & Compliance)

[Figure: MySQL/MHA rows inserted (Ingest & Compliance)]

The number of inserted rows is consistent with the execution time, being 3 times that of the MySQL/Galera solution. MySQL/MHA was also able to better manage the increasing concurrency, both with simple inserts and with concurrent read/write operations.



MySQL: a few observations on the JSON type

MySQL 5.7 comes with built-in JSON support, comprising two major features: a native JSON data type and a set of built-in JSON functions. Despite being added rather recently (in MySQL 5.7.8 to be precise - one minor version before the 5.7.9 GA version), I feel the JSON support so far looks rather useful. Improvements are certainly possible, but compared to for example XML support (added in 5.1 and 5.5), the JSON feature set added to 5.7.8 is reasonably complete, coherent and standards-compliant.

(We can of course also phrase this more pessimistically and say that XML support falls short on these accounts, but that's not what this post is about :-)

There is potentially a lot to write and explain about the JSON support, and I can't hope to completely cover the subject in one blog post. Rather, I will highlight a few things I observed in the hopes that this will help others get started with JSON in MySQL 5.7.

Creating JSON values

There are a number of ways to create values of the JSON type:
  • CAST a value of any non-character string type AS JSON to obtain a JSON representation of that value. Example:

    mysql> SELECT CAST(1 AS JSON), CAST(1.1 AS JSON), CAST(NOW() AS JSON);
    +-----------------+-------------------+------------------------------+
    | CAST(1 AS JSON) | CAST(1.1 AS JSON) | CAST(NOW() AS JSON) |
    +-----------------+-------------------+------------------------------+
    | 1 | 1.1 | "2015-10-31 23:01:56.000000" |
    +-----------------+-------------------+------------------------------+
    1 row in set (0.00 sec)
    Even though it may not be immediately clear from the result, the CAST operation actually turned these values into JSON equivalents. More about this in the next section.

    If the value you're casting is of a character string type, then its value should be parseable as either a JSON object or a JSON array (i.e., JSON documents), as a JSON keyword indicating a built-in value, like null, true, false, or as a properly quoted JSON string value:

    mysql> SELECT CAST('{}' AS JSON) object, CAST('[]' AS JSON) array, CAST('null' AS JSON) "null", CAST('true' AS JSON) "true", CAST('false' AS JSON) "false", CAST('"string"' AS JSON) string;
    +--------+-------+------+------+-------+----------+
    | object | array | null | true | false | string |
    +--------+-------+------+------+-------+----------+
    | {} | [] | null | true | false | "string" |
    +--------+-------+------+------+-------+----------+
    1 row in set (0.00 sec)
    If the string is not parseable as JSON, you'll get a runtime error:

    mysql> SELECT CAST('' AS JSON);
    ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "The document is empty." at position 0 in ''.
    mysql> SELECT CAST('{]' AS JSON);
    ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1 in '{]'.
    Note that many keywords that might be valid in other environments, like NaN, Infinity, javascript built-in constructor fields like Number.EPSILON, and even undefined are *not* valid in this context. Remember - this is JSON, not javascript.

    To get the JSON presentation of a plain, unquoted string value, you can use the JSON_QUOTE() function:

    mysql> SELECT JSON_QUOTE(''), JSON_QUOTE('{]');
    +----------------+------------------+
    | JSON_QUOTE('') | JSON_QUOTE('{]') |
    +----------------+------------------+
    | "" | "{]" |
    +----------------+------------------+
    1 row in set (0.00 sec)
  • SELECT a column of the JSON data type. Of course, such a column would first need to be populated before it yields JSON values, and this can be done simply with an INSERT statement. When INSERT-ing non-JSON type values into a column of the JSON type, MySQL will behave as if it first converts these values to the JSON type, just as if it would apply CAST(value AS JSON) to those values (see the sketch right after this list).
  • Call a function that returns a value of the JSON-type, like JSON_QUOTE() which was mentioned above. To create new JSON documents from scratch, JSON_OBJECT() and JSON_ARRAY() are probably most useful:

    mysql> SELECT JSON_ARRAY(1, 2, 3) array, JSON_OBJECT('name1', 'value1', 'name2', 'value2') object;
    +-----------+----------------------------------------+
    | array | object |
    +-----------+----------------------------------------+
    | [1, 2, 3] | {"name1": "value1", "name2": "value2"} |
    +-----------+----------------------------------------+
    1 row in set (0.00 sec)
    Note that we could have achieved the previous result also by CASTing literal string representations of these JSON documents AS JSON:

    mysql> SELECT CAST('[1, 2, 3]' AS JSON) array, CAST('{"name1": "value1", "name2": "value2"}' AS JSON) object;
    +-----------+----------------------------------------+
    | array | object |
    +-----------+----------------------------------------+
    | [1, 2, 3] | {"name1": "value1", "name2": "value2"} |
    +-----------+----------------------------------------+
    1 row in set (0.00 sec)
    However, as we shall see later on, this approach is not entirely equivalent to constructing these documents through JSON_ARRAY and JSON_OBJECT.

    There are many more built-in JSON functions that return a value of the JSON data type. Unlike JSON_QUOTE(), JSON_ARRAY() and JSON_OBJECT(), most of these also require a JSON document as their first argument. In these cases, the return value represents a modified instance of the document passed as argument.
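To make the column case concrete, here is a minimal sketch of a JSON column being populated and read back (the table and column names are made up for illustration):

CREATE TABLE json_demo (
  id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  doc JSON
);

-- The string literal is converted to JSON on insert,
-- just as if CAST('{"name": "value1"}' AS JSON) had been applied.
INSERT INTO json_demo (doc) VALUES ('{"name": "value1"}');

SELECT doc, JSON_TYPE(doc) FROM json_demo;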

Operating on JSON documents: Extraction and Modification

While the JSON document may be a convenient unit for storing and transporting related items of data, any meaningful processing of such documents will always involve some operation to transform or modify such a document: for example, extracting some item stored inside the document, or adding or removing properties or array elements.

Manipulation of JSON documents always involves at least two distinct items:
  • The JSON document to operate on. This can be an explicit or implicitly obtained JSON document, constructed in any of the ways described earlier in this post. In general, functions that manipulate JSON documents accept the document that is being operated on as their first argument.
  • A path. The path is an expression that identifies which part of the document to operate on. In general, the second argument of functions that manipulate JSON documents is a path expression. Depending on which function exactly, other arguments may or may not accept path expressions as well.
It is important to point out that none of the functions that modify JSON documents actually change the argument document inline: JSON functions are pure functions that don't have side effects. The modified document is always returned from the function as a new document.

JSON path expressions in MySQL

While the path is passed as a string value, it's actually an expression consisting of alternating identifiers and access operators that as a whole identifies a particular piece within the JSON document:
Identifiers
There are 4 types of identifiers that can appear in a path:
  • $ (dollar sign) is a special identifier, which is essentially a placeholder for the current document being operated on. It can only appear at the start of the path expression
  • Property names are optionally double quoted names that identify properties ("fields") in a JSON object. Double quoted property names are required whenever the property name contains meta characters. For example, if the property name contains any interpunction or space characters, you need to double quote the name. A property name can appear immediately after a dot-access operator.
  • Array indices are integers that identify array elements in a JSON array. Array indices can appear only within an array-access operator (which is denoted by a pair of square braces)
  • * (asterisk) is also a special identifier. It indicates a wildcard that represents any property name or array index. So, the asterisk can appear after a dot-operator, in which case it denotes any property name, or it may appear between square braces, in which case it represents all existing indices of the array.

    The asterisk essentially "forks" the path and may thus match multiple values in a JSON document. The MySQL JSON functions that grab data or meta data usually have a way to handle multiple matched values, but JSON functions that modify the document usually do not support this.
Access operators
Paths can contain only 2 types of access operators:
  • dot-operator, denoted by a .-character. The dot-operator can appear in between any partial path expression and an identifier (including the special wildcard identifier *). It has the effect of extracting the value identified by the identifier from the value identified by the path expression that precedes the dot.

    This may sound more complicated than it really is: for example, the path $.myproperty has the effect of extracting whatever value is associated with the top-level property called myproperty; the path $.myobject.myproperty has the effect of extracting the value associated with the property called myproperty from the nested object stored in the myobject property of the top-level document.
  • array access-operator, denoted by a matching pair of square braces: [...]. The braces should contain either an integer, indicating the position of an array element, or the * (wildcard identifier) indicating all array element indices.

    The array-access operator can appear after any path expression, and can be followed by either a dot-operator (followed by its associated property identifier), or another array access operator (to access nested array elements).

    Currently, the braces can be used only to extract array elements. In javascript, braces can also contain a quoted property name to extract the value of the named property (equivalent to the dot-operator) but this is currently not supported in MySQL path expressions. (I believe this is a - minor - bug, but it's really no biggie since you can and probably should be using the dot-operator for properties anyway.)
Below is the syntax in a sort of EBNF notation in case you prefer that:

mysql-json-path ::= Document-placeholder path-expression?
Document-placeholder ::= '$'
path-expression ::= path-component path-expression*
path-component ::= property-accessor | array-accessor
property-accessor ::= '.' property-identifier
property-identifier ::= Simple-property-name | quoted-property-name | wildcard-identifier
Simple-property-name ::= <Please refer to JavaScript, The Definitive Guide, 2.7. Identifiers>
quoted-property-name ::= '"' string-content* '"'
string-content ::= Non-quote-character | Escaped-quote-character
Non-quote-character ::= <Any character except " (double quote)>
Escaped-quote-character ::= '\"'
wildcard-identifier ::= '*'
array-accessor ::= '[' element-identifier ']'
element-identifier ::= [0-9]+ | wildcard-identifier
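To make the path syntax concrete, here are a few examples applied to a small throwaway document (the document and values are made up for illustration):

SET @doc = '{"name": "math", "tags": ["set-theory", "intuition"], "owner": {"id": 10}}';

SELECT JSON_EXTRACT(@doc, '$')          AS whole_document  -- the $ placeholder on its own
,      JSON_EXTRACT(@doc, '$.name')     AS top_level_prop  -- dot-operator plus property name
,      JSON_EXTRACT(@doc, '$.owner.id') AS nested_prop     -- chained dot-operators
,      JSON_EXTRACT(@doc, '$.tags[0]')  AS first_tag       -- array access operator
,      JSON_EXTRACT(@doc, '$.tags[*]')  AS all_tags        -- wildcard between the braces
,      JSON_EXTRACT(@doc, '$.*')        AS all_values;     -- wildcard after the dot-operator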

Grabbing data from JSON documents

json JSON_EXTRACT(json, path+)
This function gets the value at the specified path. Multiple path arguments may be passed, in which case any values matching the paths are returned as a JSON array.
json json-column->path
If you have a table with a column of the JSON type, then you can use the -> operator inside SQL statements as a shorthand for JSON_EXTRACT(). Note that this operator only works inside SQL statements, and only if the left-hand operand is a column name; it does not work for arbitrary expressions of the JSON type. (Pity! I would love this to work for any expression of the JSON type, and in any context - not just SQL statements)
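A small sketch of the shorthand (the table and column names are hypothetical):

CREATE TABLE events (id INT PRIMARY KEY, doc JSON);
INSERT INTO events VALUES (1, '{"type": "click", "user": {"id": 42}}');

-- doc->'$.type' is shorthand for JSON_EXTRACT(doc, '$.type'); it works only
-- because the left-hand operand is a column reference.
SELECT doc->'$.type'    AS event_type
,      doc->'$.user.id' AS user_id
FROM events;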

Grabbing metadata from JSON documents

bool JSON_CONTAINS(json, value, path?)
Checks whether the specified value appears in the specified document. If the path is specified, the function returns TRUE only if the value appears at the specified path. If the path argument is omitted, the function looks *anywhere* in the document and returns TRUE if it finds the value (either as property value or as array element).
bool JSON_CONTAINS_PATH(json, 'one'|'all', path+)
Checks whether the specified JSON document contains one or all of the specified paths. Personally, I think there are some issues with this function.
int JSON_DEPTH(json)
Number of levels present in the document
json-array JSON_KEYS(json-object, path?)
Returns the property names of the specified object as a JSON-array. If path is specified, the properties of the object identified by the path are returned instead.
int JSON_LENGTH(json, path?)
Returns the number of keys (when the JSON document is an object) or the number of elements (when the JSON document is an array). If a path is specified, the function is applied to the value identified by the path rather than the document itself. Omitting the path is equivalent to passing $ as the path.
string JSON_SEARCH(json, 'one'|'all', pattern, escape?, path*)
Searches for string values that match the specified pattern, and returns the path or paths where the properties that match the pattern are located. The second argument indicates when the search should stop - in case it's 'one', search will stop as soon as a matching path was found, and the path is returned. In case of 'all', search will continue until all matching properties are found. If this results in multiple paths, then a JSON array of paths will be returned. The pattern can contain % and _ wildcard characters to match any number of characters or a single character (just as with the standard SQL LIKE-operator). The escape argument can optionally define which character should be used to escape literal % and _ characters. By default this is the backslash (\). Finally, you can optionally limit which parts of the document will be searched by passing one or more json paths. Technically it is possible to pass several paths that include the same locations, but only unique paths will be returned. That is, if multiple paths are found, the array of paths that is returned will never contain the same path more than once.

Unfortunately, MySQL currently does not provide any function that allows you to search for property names. I think it would be very useful so I made a feature request.
string JSON_TYPE(json)
Returns the name of the type of the argument value. It's interesting to note that the set of type values returned by this function is not equivalent to the types that are distinguished by the JSON specification. Values returned by this function are all uppercase string values. Some of these indicate items that belong to the JSON type system, like: "OBJECT", "ARRAY", "STRING", "BOOLEAN" and "NULL" (this is the uppercase string - not to be confused with the keyword for the SQL literal NULL-value). But some refer to native MySQL data types: "INTEGER", "DOUBLE", and "DECIMAL"; "DATE", "TIME", and "DATETIME", and "OPAQUE".
bool JSON_VALID(string)
Returns whether the passed value could be parsed as a JSON value. This is not limited to just JSON objects and arrays, but will also parse JSON built-in special value keywords, like null, true, false.
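A few of these functions in action on a throwaway document (the document is made up for illustration; expected results are in the comments):

SET @doc = '{"name": "math", "tags": ["set-theory", "intuition"]}';

SELECT JSON_CONTAINS(@doc, '"set-theory"', '$.tags')      AS has_tag         -- 1
,      JSON_CONTAINS_PATH(@doc, 'one', '$.name', '$.foo') AS has_any_path    -- 1
,      JSON_DEPTH(@doc)                                   AS depth           -- 3
,      JSON_KEYS(@doc)                                    AS top_level_keys  -- ["name", "tags"]
,      JSON_LENGTH(@doc, '$.tags')                        AS tag_count       -- 2
,      JSON_SEARCH(@doc, 'one', 'set%')                   AS first_match;    -- "$.tags[0]"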

Manipulating JSON documents

json JSON_INSERT(json, [path, value]+)
Takes the argument json document, and adds (but does not overwrite) properties or array elements. Returns the resulting document.
json JSON_MERGE(json, json+)
Merges multiple documents into one and returns the resulting document.
json JSON_REMOVE(json, path+)
Remove one or more items specified by the path arguments from the document specified by the JSON argument, and returns the document after removing the specified paths.
json JSON_REPLACE(json, [path, value]+)
Takes the argument document and overwrites (but does not add) items specified by path arguments, and returns the resulting document.
json JSON_SET(json, [path, value]+)
Takes the argument document and adds or overwrites items specified by the path arguments, then returns the resulting document.
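The difference between the three is easiest to see side by side on a small document:

SET @doc = '{"a": 1}';

SELECT JSON_INSERT(@doc, '$.a', 10, '$.b', 20)  AS ins -- {"a": 1, "b": 20}   (adds only)
,      JSON_REPLACE(@doc, '$.a', 10, '$.b', 20) AS rep -- {"a": 10}           (overwrites only)
,      JSON_SET(@doc, '$.a', 10, '$.b', 20)     AS st; -- {"a": 10, "b": 20}  (adds and overwrites)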

Functions to manipulate JSON arrays

json JSON_ARRAY_APPEND(json, [path, value]+)
If the path exists and identifies an array, it appends the value to the array. If the path exists but identifies a value that is not an array, it wraps the value into a new array, and appends the value. If the path does not identify a value at all, the document remains unchanged for that path.
json JSON_ARRAY_INSERT(json, [array-element-path, value]+)
This function inserts elements into existing arrays. The path must end with an array accessor - it must end with a pair of square braces containing an exact array index (not a wildcard). If the partial path up to the terminal array accessor identifies an existing array, and the specified index is less than the array length, the value is inserted at the specified position. Any array elements at and beyond the specified position are shifted down one position to make room for the new element. If the specified index is equal to or exceeds the array length, the new value is appended to the array.
int JSON_LENGTH(json, path?)
I already described this one as a function that grabs metadata, but I found this function to be particularly useful when applied to arrays.
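A short sketch of the two array functions above on a throwaway array:

SET @arr = '[1, 2, 3]';

SELECT JSON_ARRAY_APPEND(@arr, '$', 4)     AS appended -- [1, 2, 3, 4]
,      JSON_ARRAY_INSERT(@arr, '$[1]', 99) AS inserted -- [1, 99, 2, 3]
,      JSON_LENGTH(@arr)                   AS length;  -- 3
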
Removing array elements
Note that there is no dedicated function for removing elements from an array. It is simply done using JSON_REMOVE. Just make sure the path argument denotes an array accessor to identify the element to remove.

To remove multiple elements from an array, you can specify multiple path arguments. In this case, the removal operation is performed sequentially, evaluating all passed path arguments from left to right. So, you have to be very careful which path to pass, since a preceding path may have changed the array you're working on. For example, if you want to remove the first two elements of an array, you should pass a path like '$[0]' twice. Passing '$[0]' and '$[1]' will end up removing elements 0 and 2 of the original array, since after removing the initial element at '$[0]', the element that used to sit at position 1 has been shifted left to position 0. The element that then sits at position 1 is the element that used to sit at position 2:

mysql> select json_remove('[1,2,3,4,5]', '$[0]', '$[0]') "remove elements 0 and 1"
-> , json_remove('[1,2,3,4,5]', '$[0]', '$[1]') "remove elements 0 and 2"
-> ;
+-------------------------+-------------------------+
| remove elements 0 and 1 | remove elements 0 and 2 |
+-------------------------+-------------------------+
| [3, 4, 5] | [2, 4, 5] |
+-------------------------+-------------------------+
1 row in set (0.00 sec)
Concatenating arrays
There is no function dedicated to concatenating arrays. However, you can use JSON_MERGE to do so:

mysql> SELECT JSON_MERGE('[0,1]', '[2,3]');
+------------------------------+
| JSON_MERGE('[0,1]', '[2,3]') |
+------------------------------+
| [0, 1, 2, 3] |
+------------------------------+
1 row in set (0.00 sec)
Slicing arrays
There is no dedicated function or syntax to take a slice of an array. If you don't need to slice arrays, then good - you're lucky. If you do need it, I'm afraid you're up for a challenge: I don't think there is a convenient way to do it. I filed a feature request and I hope this will be followed up.

JSON Schema Validation

Currently, the JSON functions provide a JSON_VALID() function, but this can only check if a string conforms to the JSON syntax. It does not verify whether the document conforms to predefined structures (a schema).

I anticipate that it might be useful to be able to ascertain schema conformance of JSON documents within MySQL. The exact context is out of scope for this post, but I would already like to let you know that I am working on a JSON schema validator. It can be found on github here: mysql-json-schema-validator.
Stay tuned - I will do a writeup on that as soon as I complete a few more features that I believe are essential.

MySQL JSON is actually a bit like BSON

MySQL's JSON type is not just a blob with a fancy name, and it is not entirely the same as standard JSON. MySQL's JSON type is more like MongoDB's BSON: it preserves native type information. The most straightforward way to make this clear is by creating different sorts of JSON values using CAST( ... AS JSON) and then reporting the type of the result using JSON_TYPE:

mysql> SELECT JSON_TYPE(CAST('{}' AS JSON)) as "object"
-> , JSON_TYPE(CAST('[]' AS JSON)) as "array"
-> , JSON_TYPE(CAST('""' AS JSON)) as "string"
-> , JSON_TYPE(CAST('true' AS JSON)) as "boolean"
-> , JSON_TYPE(CAST('null' AS JSON)) as "null"
-> , JSON_TYPE(CAST(1 AS JSON)) as "integer"
-> , JSON_TYPE(CAST(1.1 AS JSON)) as "decimal"
-> , JSON_TYPE(CAST(PI() AS JSON)) as "double"
-> , JSON_TYPE(CAST(CURRENT_DATE AS JSON)) as "date"
-> , JSON_TYPE(CAST(CURRENT_TIME AS JSON)) as "time"
-> , JSON_TYPE(CAST(CURRENT_TIMESTAMP AS JSON)) as "datetime"
-> , JSON_TYPE(CAST(CAST('""' AS BINARY) AS JSON)) as "blob"
-> \G
*************************** 1. row ***************************
object: OBJECT
array: ARRAY
string: STRING
boolean: BOOLEAN
null: NULL
integer: INTEGER
decimal: DECIMAL
double: DOUBLE
date: DATE
time: TIME
datetime: DATETIME
blob: BLOB
1 row in set (0.00 sec)
What this query shows is that internally, values of the JSON type preserve native type information. Personally, I think that is a good thing. JSON's standard type system is rather limited. I would love to see standard JSON support for proper decimal and datetime types.

Comparing JSON objects to JSON objects

The MySQL JSON type system is not just cosmetic - the attached internal type information affects how the values work in calculations and comparisons. Consider this comparison of two JSON objects:

mysql> SELECT CAST('{"num": 1.1}' AS JSON) = CAST('{"num": 1.1}' AS JSON);
+-------------------------------------------------------------+
| CAST('{"num": 1.1}' AS JSON) = CAST('{"num": 1.1}' AS JSON) |
+-------------------------------------------------------------+
| 1 |
+-------------------------------------------------------------+
1 row in set (0.00 sec)
This is already quite nice - you can't compare two objects like that in javascript. Or actually, you can, but the result will be false since you'd be comparing two distinct objects that simply happen to have the same properties and property values. But usually, with JSON, we're just interested in the data. Since the objects that are compared here are totally equivalent with regard to composition and content, I consider the ability to directly compare objects as a bonus.

It gets even nicer:

mysql> SELECT CAST('{"num": 1.1, "date": "2015-11-01"}' AS JSON) = CAST('{"date": "2015-11-01", "num": 1.1}' AS JSON);
+---------------------------------------------------------------------------------------------------------+
| CAST('{"num": 1.1, "date": "2015-11-01"}' AS JSON) = CAST('{"date": "2015-11-01", "num": 1.1}' AS JSON) |
+---------------------------------------------------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Again, the result is true, indicating that these objects are equivalent. But you'll notice that the property names appear in different order between these two objects. But the direct comparison ignores the property order - it only takes into account whether a property exists at a particular path, and whether the property values are the same. One can argue about whether the property order should be deemed significant in a comparison. The JSON spec doesn't specify so. But I'm inclined to say that MySQL's behavior here is a nice feature.

Now let's try something a bit like that first comparison, but in a slightly different way:

mysql> SELECT JSON_OBJECT('bla', current_date)
-> , JSON_OBJECT('bla', current_date) = JSON_OBJECT('bla', current_date)
-> , JSON_OBJECT('bla', current_date) = CAST('{"bla": "2015-11-01"}' AS JSON)
-> \G
*************************** 1. row ***************************
JSON_OBJECT('bla', current_date): {"bla": "2015-11-01"}
JSON_OBJECT('bla', current_date) = JSON_OBJECT('bla', current_date): 1
JSON_OBJECT('bla', current_date) = CAST('{"bla": "2015-11-01"}' AS JSON): 0
1 row in set (0.00 sec)
The difference here is of course creating the object using JSON_OBJECT as opposed to using CAST(... AS JSON). While the string representation of the result of JSON_OBJECT('bla', current_date) looks exactly the same like that of CAST('{"bla": "2015-11-01"}' AS JSON), they are not equivalent: in the case of JSON_OBJECT, MySQL internally attached native type information to the property which is of the type DATE (a type that does not exist in standard JSON), whereas in the case of the CAST(... AS JSON) operation, MySQL did not have any additional type information for the value of the property, leaving it no other choice than to assume a STRING type. The following query proves the point:

mysql> SELECT JSON_TYPE(JSON_EXTRACT(JSON_OBJECT('bla', current_date), '$.bla'))
-> , JSON_TYPE(JSON_EXTRACT(CAST('{"bla": "2015-11-01"}' AS JSON), '$.bla'))
-> \G
*************************** 1. row ***************************
JSON_TYPE(JSON_EXTRACT(JSON_OBJECT('bla', current_date), '$.bla')): DATE
JSON_TYPE(JSON_EXTRACT(CAST('{"bla": "2015-11-01"}' AS JSON), '$.bla')): STRING
1 row in set (0.00 sec)

Comparing JSON values to non-JSON values

Fortunately, comparison of JSON values to MySQL non-JSON values is pretty consistent, without requiring explicit CAST operations. This may sound obvious, but it's really not. The following query might explain better what I mean. Consider a JSON object with a property called "myProp" that has a string value of "value1":

mysql> SELECT JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp');
+-----------------------------------------------------------+
| JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp') |
+-----------------------------------------------------------+
| "value1" |
+-----------------------------------------------------------+
1 row in set (0.00 sec)
Note the double quotes around the value - when we extract the value of the myProp property, the result is a JSON string - not a native MySQL character type. And when that result is rendered by the client, its MySQL string representation includes the double quotes. To get a proper MySQL string, we can apply JSON_UNQUOTE(), like this:

mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp'));
+-------------------------------------------------------------------------+
| JSON_UNQUOTE(JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp')) |
+-------------------------------------------------------------------------+
| value1 |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
But fortunately, we don't really need to apply JSON_UNQUOTE() for most operations. For example, to compare the extracted value with a regular MySQL string value, we can simply do the comparison without explicitly casting the MySQL string to a JSON type, or explicitly unquoting the JSON string value to a MySQL string value:

mysql> SELECT JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp') = 'value1';
+----------------------------------------------------------------------+
| JSON_EXTRACT(JSON_OBJECT('myProp', 'value1'), '$.myProp') = 'value1' |
+----------------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
Again, I think this is very good news!

Still, there definitely are some gotcha's. The following example might explain what I mean:

mysql> SELECT CURRENT_DATE
-> , CURRENT_DATE = '2015-11-01'
-> , JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp')
-> , JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp') = '2015-11-01'
-> , JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp') = CURRENT_DATE
-> , JSON_UNQUOTE(JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp')) = '2015-11-01'
-> \G
*************************** 1. row ***************************
CURRENT_DATE: 2015-11-01
CURRENT_DATE = '2015-11-01': 1
JSON_EXTRACT(JSON_OBJECT('myProp', current_date), '$.myProp'): "2015-11-01"
JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp') = '2015-11-01': 0
JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp') = CURRENT_DATE: 1
JSON_UNQUOTE(JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp')) = '2015-11-01': 1
1 row in set (0.00 sec)
Note that this is the type of thing that one might easily get wrong. The comparison CURRENT_DATE = '2015-11-01' suggests the MySQL date value is equal to its MySQL string representation, and the comparison JSON_EXTRACT(JSON_OBJECT('myProp', current_date), '$.myProp') = CURRENT_DATE suggests the value extracted from the JSON document is also equal to the date value.

From these two results one might expect that JSON_EXTRACT(JSON_OBJECT('myProp', CURRENT_DATE), '$.myProp') would be equal to '2015-11-01' as well, but the query clearly shows this is not the case. Only when we explicitly apply JSON_UNQUOTE does the date value extracted from the JSON document become a real MySQL string, which we then can compare with the string value '2015-11-01' successfully.

When you think about a minute what really happens, it does make sense (at least, I think it does):
  • A MySQL date is equivalent to the MySQL string representation of that date
  • A MySQL date is equivalent to its JSON date representation
  • A JSON date is not equal to the MySQL string representation of that date
  • A MySQL string representation of a JSON date is equal to the MySQL string representation of that date
That said, you might still find it can catch you off guard.

Table columns of the JSON type

The JSON type is not just a runtime type - it is also available as a storage data type for table columns. A problem though is that there is no direct support for indexing JSON columns, which is sure to become a problem in case you plan to query the table based on the contents of the JSON document. Any WHERE, JOIN...ON, GROUP BY or ORDER BY-clause that relies on extracting a value from the JSON column is sure to result in a full table scan.

There is a workaround though: Once you know the paths for those parts of the document that will be used to filter, order and aggregate the data, you can create generated columns to have these values extracted from the document, and then put an index on those generated columns. This practice is recommended for MySQL by the manual page for CREATE TABLE. A complete example is given in the section called Secondary Indexes and Virtual Generated Columns.
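As a minimal sketch of that workaround (the table, column, and path are hypothetical, but the pattern follows the manual's example): extract the value of interest into a virtual generated column and index that column.

CREATE TABLE posts_json (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  doc  JSON NOT NULL,
  -- virtual column extracted from the document; it takes no row storage,
  -- only the index on it does
  owner_user_id INT GENERATED ALWAYS AS (JSON_EXTRACT(doc, '$.OwnerUserId')) VIRTUAL,
  KEY idx_owner_user_id (owner_user_id)
);

-- Queries must then filter on the generated column to benefit from the index:
SELECT id FROM posts_json WHERE owner_user_id = 10;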

Obviously, this approach is not without issues:
  • You will need to rewrite your queries accordingly to use those generated columns rather than the raw extraction operations on the document. Or at least, you will have to if you want to benefit from your indexes.
  • Having to create separate columns in advance seems at odds with schema flexibility, which I assume is a highly-valued feature for those that find they need JSON columns.
  • The generated columns will require additional storage.
Of these concerns, I feel that the need to rewrite the queries is probably the biggest problem. The additional storage seems to be the smallest issue, assuming the number of items that you need to index is small compared to the entire document. (Although I can imagine the extra storage would start to count when you want to extract large text columns for full-text indexing.) That said, if I understand correctly, if you create the index on VIRTUAL generated columns, only the index will require extra storage - there won't also be storage required for the columns themselves. (Note that creating an index always requires extra storage - that's just how it works, both in MySQL and in specialized document databases like MongoDB.)
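If you want to put an actual number on that extra storage, the InnoDB persistent statistics tables can give a rough estimate. This is just a sketch I'm adding here; it assumes InnoDB with persistent statistics enabled (the default in 5.6 and later), that you are connected to the schema holding the table, and it refers to the posts table from the example below:

-- Approximate size of each index on the table, in MB:
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 1) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'posts'
  AND stat_name = 'size';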

As far as I can see, any indexing scheme that requires us to choose in advance which items within the documents to index suffers from the same drawback: if the schema evolves in such a way that fields once deemed important enough to index get moved or renamed often, those changes will affect all aspects of any application that works on the document store. My gut feeling is that, despite the theoretical possibility of schema flexibility, this creates enough inertia in the schema evolution (at least with respect to the items our indexes are based on) that there will be time to come up with other solutions. To be fair though, having to set up generated columns probably adds some extra inertia as compared to a pure document database (like MongoDB).

But my main point still stands: if you choose to keep changing the schema all the time, especially the items you filter, sort, or aggregate on, then the changes will affect almost every other layer of your application - not just your database. Apparently, that's what you bargained for, and in light of all the other changes needed to support this kind of dynamic schema evolution, setting up a few extra columns should not be that big a deal.

JSON Columns and Indexing Example

Just to illustrate how it would work out, let's try and set up a table to store JSON documents. For this example, I'm looking at the Stackexchange datasets. There are many such datasets for various topics, and I'm looking at the one for math.stackexchange.com because it has a decent size - 873MB. Each of these archives comprises 8 XML files, and I'm using the Posts.xml file. One post document might look like this:

<row
Id="1"
PostTypeId="1"
AcceptedAnswerId="9"
CreationDate="2010-07-20T19:09:27.200"
Score="85"
ViewCount="4121"
Body="&lt;p&gt;Can someone explain to me how there can be different kinds of infinities?&lt;/p&gt;"
OwnerUserId="10"
LastEditorUserId="206259"
LastEditorDisplayName="user126"
LastEditDate="2015-02-18T03:10:12.210"
LastActivityDate="2015-02-18T03:10:12.210"
Title="Different kinds of infinities?"
Tags="&lt;set-theory&gt;&lt;intuition&gt;&lt;faq&gt;"
AnswerCount="10"
CommentCount="1"
FavoriteCount="28"
/>
I'm using Pentaho Data Integration to read these files and to convert them into JSON documents. These JSON documents look like this:

{
"Id": 1,
"Body": "<p>Can someone explain to me how there can be different kinds of infinities?<\/p>",
"Tags": "<set-theory><intuition><faq>",
"Score": 85,
"Title": "Different kinds of infinities?",
"PostTypeId": 1,
"AnswerCount": 10,
"OwnerUserId": 10,
"CommentCount": 1,
"CreationDate": "2010-07-20 19:09:27",
"LastEditDate": "2015-02-18 03:10:12",
"AcceptedAnswerId": 9,
"LastActivityDate": "2015-02-18 03:10:12",
"LastEditorUserId": 206259
}
Initially, let's just start with a simple table called posts with a single JSON column called doc:

CREATE TABLE posts (
doc JSON
);
After loading, I got a little over a million post documents in my table:

mysql> select count(*) from posts;
+----------+
| count(*) |
+----------+
| 1082988 |
+----------+
1 row in set (0.66 sec)
(There are actually some 5% more posts in the stackexchange data dump, but my quick and dirty transformation to turn the XML into JSON led to a bunch of invalid JSON documents, and I didn't bother to perfect the transformation enough to get them all. A million is more than enough to illustrate the approach though.)

Now, let's find the post with Id equal to 1:

mysql> select doc from posts where json_extract(doc, '$.Id') = 1
-> \G
*************************** 1. row ***************************
doc: {"Id": 1, "Body": ">p<Can someone explain to me how there can be different kinds of infinities?</p>", "Tags": "<set-theory><intuition><faq>", "Score": 85, "Title": "Different kinds of infinities?", "PostTypeId": 1, "AnswerCount": 10, "OwnerUserId": 10, "CommentCount": 1, "CreationDate": "2010-07-20 19:09:27", "LastEditDate": "2015-02-18 03:10:12", "AcceptedAnswerId": 9, "LastActivityDate": "2015-02-18 03:10:12", "LastEditorUserId": 206259}
1 row in set (1.45 sec)
Obviously, the query plan requires a full table scan:

mysql> explain select doc from posts where json_extract(doc, '$.Id') = 1;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | posts | NULL | ALL | NULL | NULL | NULL | NULL | 1100132 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
First, let's try and add a generated column for the Id. The Id is, as its name implies, unique, and it seems sensible to create a PRIMARY KEY for that as well:

mysql> ALTER TABLE posts
-> ADD id INTEGER UNSIGNED
-> GENERATED ALWAYS AS (JSON_EXTRACT(doc, '$.Id'))
-> STORED
-> NOT NULL PRIMARY KEY;
Query OK, 1082988 rows affected (36.23 sec)
Records: 1082988 Duplicates: 0 Warnings: 0
You might notice that in this case, the generated column is STORED rather than VIRTUAL. This is the case because MySQL won't let you create a PRIMARY KEY on a VIRTUAL generated column. If you try it anyway, you'll get:

mysql> ALTER TABLE posts
-> ADD id INTEGER UNSIGNED
-> GENERATED ALWAYS AS (JSON_EXTRACT(doc, '$.Id')) NOT NULL
-> VIRTUAL
-> PRIMARY KEY;
ERROR 3106 (HY000): 'Defining a virtual generated column as primary key' is not supported for generated columns.
Now, let's try our modified query again:

mysql> explain select doc from posts where id = 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | posts | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
If you actually try to run the query you'll notice it returns instantly - as is to be expected, since we can now access the document directly via the PRIMARY KEY.

Now, let's try this again but using a VIRTUAL column and a UNIQUE index:

mysql> ALTER TABLE posts
-> DROP COLUMN id
-> ;
Query OK, 1082988 rows affected (35.44 sec)
Records: 1082988 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE posts
-> ADD id INTEGER UNSIGNED
-> GENERATED ALWAYS AS (JSON_EXTRACT(doc, '$.Id'))
-> VIRTUAL
-> NOT NULL UNIQUE;
Query OK, 1082988 rows affected (36.61 sec)
Records: 1082988 Duplicates: 0 Warnings: 0
Now the plan is:

mysql> explain select doc from posts where id = 1;
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | posts | NULL | const | id | id | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
The plan is almost the same, except of course that now access is via the UNIQUE key rather than the PRIMARY KEY. The query again returns almost instantly, although it will be slightly slower.

That said, this example is not so much about making a benchmark or measuring performance; it's more about showing how to achieve some form of indexing when storing JSON documents in a MySQL table. I truly hope someone else will conduct a serious benchmark so that we can get an idea of just how the performance of the MySQL JSON type compares to alternative solutions (like the PostgreSQL JSON type, and MongoDB). I feel I lack both the expertise and the tools to do so myself, so I'd rather leave that to the experts.

In Conclusion

  • MySQL JSON support looks pretty complete.
  • Integration of the JSON type system and the MySQL native type system is, in my opinion, pretty good, but there are definitely gotchas.
  • Achieving indexing for JSON columns relies on a few specific workarounds, which may or may not be compatible with your requirements.
I hope this post was useful to you. I sure learned a lot by investigating the feature, and it gave me a few ideas of how I could use the JSON features in the future.

Slow Query Log Rotation


Some time ago, Peter Boros at Percona wrote this post: Rotating MySQL slow logs safely. It contains good info, such as that one should use the rename method for rotation (rather than copytruncate), and then connect to mysqld and issue a FLUSH LOGS (rather than send a SIGHUP signal).

So far so good. What I do not agree with is the additional construct to prevent slow queries from being written during log rotation. The author's rationale is that if too many items get written while the rotation is in progress, this can block threads. I understand this, but let's review what actually happens.

Indeed, if one were to do lots of writes to the slow query log in a short space of time, a write could block while waiting.

Is the risk of this occurring greater during a logrotate operation? I doubt it. A FLUSH LOGS has to close and open the file. While no file is open, no writes can occur anyway; they may be held in the internal buffer of the low-level MySQL code for this.

In any case, if there is such a high write rate, that is an issue in itself: it is not useful for the slow query log to be written that fast. Instead, you'd raise the long_query_time and min_examined_row_limit variables to reduce the effective "flow rate". It's always best to resolve an underlying issue rather than its symptom(s).
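Both variables are dynamic, so the flow rate can be reduced on a running server; the values below are only examples, and the global long_query_time applies to connections opened after the change:

SET GLOBAL long_query_time = 5;            -- log only statements slower than 5 seconds
SET GLOBAL min_examined_row_limit = 1000;  -- and only if they examined at least 1000 rows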



MySQL-Docker operations. - Part 2: Customizing MySQL in Docker



After seeing the basics of deploying a MySQL server in Docker, in this article we will lay the foundations for customising a node and eventually using more than one server, so that we can cover replication in the next one.

Enabling GTID: the dangerous approach.

To enable GTID, you need to set five variables in the database server:
  • master-info-repository=table
  • relay-log-info-repository=table
  • enforce-gtid-consistency
  • gtid_mode=ON
  • log-bin=mysql-bin
For MySQL 5.6, you also need to set log-slave-updates, but we won't deal with such ancient versions here.
Using the method that we've seen in Part 1, we can use a volume to replace the default /etc/my.cnf with our own.
$ cat my-gtid.cnf
[mysqld]
user = mysql
port = 3306
log-bin = mysql-bin
relay-log = mysql-relay
server-id = 12345

master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency
However, this approach may fail. It will work with some MySQL images, but depending on how the image is built, the server may not install at all.
$ docker run --name boxedmysql \
-e MYSQL_ROOT_PASSWORD=secret \
-v $PWD/my-gtid.cnf:/etc/my.cnf \
-d mysql/mysql-server
b9c15ed3c40c078db5335dcb76c10da1788cee43b3e32e20c22b937af50248c5

$ docker exec -it boxedmysql bash
Error response from daemon: Container boxedmysql is not running
The reason for the failure is Bug#78957. When my.cnf contains log-bin and mysql is called prior to the installation to perform some detection tasks, the server creates the binary log index in the data directory. After that, the installation task will abort because the data directory is not empty. It sounds as if there is a set of unnecessary actions here (the server should not create the index without other components in place, and the installer should not complain about finding a harmless file in the data directory) but this is the way it is, and we should work around it. At the time of writing, the bug has received a temporary fix and the installation now works.
All things considered, it's not a bad thing that we are forced to run things this way, because enabling GTIDs at startup has side effects: there will be unwanted GTID sets in the server, and that could be annoying.
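A quick way to check whether the installation has already polluted the GTID history (my own check, not part of the original procedure) is to look at the executed GTID set:

-- If the bootstrap statements were written to the binary log with GTIDs enabled,
-- this returns a non-empty set even before any application traffic:
SELECT @@GLOBAL.gtid_executed;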


To avoid the possible side effects of a noisy installation polluting the GTID record, there are two methods:
  • Install as we did above (provided that the image does not break because of the additional options) and then run RESET MASTER.
  • Deploy the configuration file in a temporary location, move it to /etc/ after the server has been initialised, and then restart the container:
    1. We use a volume for the configuration file, but we don't replace the default one. This way, the installation will proceed with default values.
    2. After the installation is completed, we move the new configuration file in the default place.
    3. We restart the server, and it will come up with the new configuration.
Each method has its pros and cons. The single installation followed by reset master is probably the most reliable. Both methods are relatively simple to deploy.
You could also enable GTID online, but since you need GTID for replication, you also need to set log_bin and that's a non-dynamic variable that needs to be set in the configuration file. Therefore, the all-online method can be ruled out.
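If you go with the first method, the cleanup itself is a single statement. This is a minimal sketch of my own; it assumes the freshly installed server holds no data or binary logs you care about, because RESET MASTER discards all binary logs together with the recorded GTID history:

RESET MASTER;
-- the executed GTID set should now be empty again:
SELECT @@GLOBAL.gtid_executed;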

Checking the server health.

After we deploy the container, we don't know for sure if its installation was completed. We have two ways of checking the status. We can look at the container logs, or we can try connecting to the server and see if it answers.
Things are easy when installation succeeds, because both methods return an immediate result:
$ docker logs boxedmysql
Initializing database
Database initialized
MySQL init process in progress...
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.

/entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*


MySQL init process done. Ready for start up.

$ docker exec -it boxedmysql mysql -psecret -e 'select "ALIVE!" as status'
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------+
| status |
+--------+
| ALIVE! |
+--------+
It is more difficult to determine success or failure when the server is taking more time to be initialised. This is often the case when we deploy several images at once (for example, to set up replication.) In this case, you don't know if you are seeing a failure or a simple delay.
My preferred method of checking readiness is running a query repeatedly, until it returns an expected result. Something like this:
# Poll the server until the mysql client can connect, giving up after max_attempts tries.
exit_code=-1
max_attempts=10
attempts=0
while [ "$exit_code" != "0" ]
do
    docker exec -it boxedmysql mysql -psecret -e 'select "ALIVE!" as status'
    exit_code=$?
    attempts=$(($attempts+1))
    if [ $attempts -gt $max_attempts ]
    then
        echo "Max attempts reached. Aborting"
        exit 1
    fi
    # Wait a little before retrying (skipped once the query has succeeded).
    [ "$exit_code" != "0" ] && sleep 5
done
If the test fails, we should have a look at the logs.
$ docker logs boxedmysql
Initializing database
2015-10-28T11:53:32.676232Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28T11:53:32.677764Z 0 [ERROR] You have enabled the binary log, but you haven't provided the mandatory server-id. Please refer to the proper server start-up parameters documentation
2015-10-28T11:53:32.677830Z 0 [ERROR] Aborting
In this case, we were using a configuration file with log-bin but without server-id, which is easily corrected.
Sometimes the logs are empty because the container died before the Docker engine could record the outcome. This can happen with Docker running in a VM (for example on OS X). If you see this, just run the command again after attempting a restart:
$ docker logs boxedmysql
Initializing database

$ docker restart boxedmysql

$ docker logs boxedmysql
2015-10-28T11:53:32.676232Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28T11:53:32.677764Z 0 [ERROR] You have enabled the binary log, but you haven't provided the mandatory server-id. Please refer to the proper server start-up parameters documentation
2015-10-28T11:53:32.677830Z 0 [ERROR] Aborting

Invoking MySQL without an explicit password

You may have noticed that the output of docker exec ... mysql -psecret includes a warning about using a password on the command line. While the warning is legitimate, we have a bit of a problem if we want to compare the result of the operation with a given value. The way to suppress this warning is to invoke mysql with a configuration file containing the username and password. Notice that this file can't be on the host computer: it must be inside the container.
We can put the username and password in either /etc/my.cnf or $HOME/.my.cnf (which in the case of Docker is /root/.my.cnf). However, a brute-force attempt at adding the username and password to either of these files as a volume will fail. The reason is that the root password is set immediately after the initialisation, but the root user, at that moment, runs without a password. Setting the password in one of these files will cause the client to be invoked with a password that does not exist yet, and the installation will fail.
What we need to do is set the credentials file with a different name, and then use it:
$ cat home_my.cnf
[client]
user=root
password=secret
[mysql]
prompt='MyDocker [\h] {\u} (\d) > '

$ docker run --name boxedmysql \
-v $PWD/my-minimal.cnf:/etc/my.cnf \
-v $PWD/home_my.cnf:/root/home_my.cnf \
-e MYSQL_ROOT_PASSWORD=secret \
-d mysql/mysql-server
3cfec22a1c52bb4a784352bb7d03bc4bc9e5ed3bf4d3e7c1567a6d7e8a670db8
Here the installation succeeds:
$ docker exec -it boxedmysql mysql --defaults-file=/root/home_my.cnf -e 'select "ALIVE!" as status'
+--------+
| status |
+--------+
| ALIVE! |
+--------+
And now we can copy the file into its expected position. The next call to mysql, without a password, will succeed!
$ docker exec -it boxedmysql cp /root/home_my.cnf /root/.my.cnf

$ docker exec -it boxedmysql mysql -e 'select "LOOK MA, NO PASSWORD!" as status'
+-----------------------+
| status |
+-----------------------+
| LOOK MA, NO PASSWORD! |
+-----------------------+
If we invoke mysql without a command, we will see the customised prompt
$ docker exec -it boxedmysql mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MyDocker [localhost] {root} ((none)) >
This method will be handy when we set up replication, and the prompt will help us identify the various nodes.
Once we're able to run mysql without providing credentials on the command line, we can use this fact to get a result from the database and compare it to an expected value.
$ docker exec -it boxedmysql mysql -BN -e 'select @@version'
5.7.9
Looks simple. The devil is in the details, though. Look at the following:
$ VERSION=$(docker exec -it boxedmysql mysql -BN -e 'select @@version')
echo $VERSION
5.7.9

$ if [ "$VERSION" == "5.7.9" ] ; then echo correct ; else echo wrong; fi
wrong
What happened? The version must be correct. Why does the comparison fail?
There is a hidden character in the answer, and you must be aware of it when using Docker. Every response obtained through docker exec -it contains an extra carriage return, because the -t option allocates a pseudo-terminal. Let's try with a simple response.
$ NUM=$(docker  exec -it mysql-node1 echo 1)
$ echo "<$NUM>"
>1
You see that the last character of the string, the closing '>', appears at the start of the line, overwriting the initial '<'. That's because the carriage return character brings the cursor back to the start of the line. We can look at the output through hexdump, and see the extra byte in there.
$ echo -n $NUM |hexdump
0000000 0d31
0000002

$ echo -n 1 |hexdump
0000000 0031
0000001
If you want your comparison to work, you need to remove the extra carriage return.
$ VERSION=$(docker exec -it boxedmysql mysql -BN -e 'select @@version' | tr -d '\r')
echo $VERSION
5.7.9

$ if [ "$VERSION" == "5.7.9" ] ; then echo correct ; else echo wrong; fi
correct
Of course, if you query the database using a local application connected to port 3306 in the container, you won't have this problem. Even using a local mysql client connected to the container IP will work well. Just be careful when comparing anything that comes from a docker exec command.

Enabling GTID: a safer approach.

We have seen what can make the installation fail. Let's see now how we can enable GTID with a safer method.
$ docker run --name=boxedmysql \
-v $PWD/my-gtid.cnf:/etc/my_new.cnf \
-e MYSQL_ROOT_PASSWORD=secret \
-v $PWD/home_my.cnf:/root/home_my.cnf \
-d mysql/mysql-server
8887640b20056ac6912732eeeb54c1a6f1d0a17589a14a9c075de021f52a8c90
The customized configuration file was copied to a non-standard location, and therefore it won't be used by the server, which will initialize using its default values. We can prove it:

$ docker exec -ti boxedmysql mysql --defaults-file=/root/home_my.cnf -e 'select @@server_id, @@gtid_mode'
+-------------+-------------+
| @@server_id | @@gtid_mode |
+-------------+-------------+
| 0 | OFF |
+-------------+-------------+
The server is running with its default values. We can use docker exec to copy the candidate file into the default location. After we restart the server, the database will be running with GTID enabled.

$ docker exec -ti boxedmysql cp /etc/my_new.cnf /etc/my.cnf
$ docker restart boxedmysql
boxedmysql
$ docker exec -ti boxedmysql mysql --defaults-file=/root/home_my.cnf -e 'select @@server_id, @@gtid_mode'
+-------------+-------------+
| @@server_id | @@gtid_mode |
+-------------+-------------+
| 12345 | ON |
+-------------+-------------+

Sharing data between servers.

The last trick that we look at in this article is a method for sharing data between containers. This problem is felt especially when running replication systems. If you want to add a new node, having a shared storage for backup and restore operations between nodes will greatly simplify things. The recipe is simple:
  1. Create an empty "data container" with a directory that we want to share.
  2. Create a regular container that gets a volume from the data container.
  3. Repeat step #2 for all nodes.
$ docker create -v /dbdata --name dbdata mysql/mysql-server /bin/true
a89396abcb8bc19c58d7e5376e57a63ae69bdca2d20fd24d4037456a8180f11b

$ docker run --name mysql1 -e MYSQL_ROOT_PASSWORD=secret --volumes-from dbdata -d mysql/mysql-server
4773807c9aabb7eebba9f5396e52b1ee2e1aeea322dbc4e3d0f1d00f600d90cd

$ docker run --name mysql2 -e MYSQL_ROOT_PASSWORD=secret --volumes-from dbdata -d mysql/mysql-server
f3a86114d880a5b1d7c786a9f68334528eb56140e6e2d6ecbe3987bd8c794586
Now the two containers mysql1 and mysql2 can see and use /dbdata.
$ docker exec -it mysql1 bash
[root@4773807c9aab /]# ls -l /dbdata/
total 0
[root@4773807c9aab /]# mysqldump -p --all-databases > /dbdata/mysql1.sql
Enter password:
[root@4773807c9aab /]# ls -l /dbdata/
total 3140
-rw-r--r-- 1 root root 3214497 Oct 28 13:44 mysql1.sql
[root@4773807c9aab /]# ls -lh /dbdata/
total 3.1M
-rw-r--r-- 1 root root 3.1M Oct 28 13:44 mysql1.sql
The first container has created a backup in /dbdata.

$ docker exec -it mysql2 bash
[root@f3a86114d880 /]# ls -l /dbdata/
total 3140
-rw-r--r-- 1 root root 3214497 Oct 28 13:44 mysql1.sql
The second container can see and use it.
In this example, the shared directory is inside a Docker internal volume. If we want to use a directory in the host operating system, we change the creation slightly:
$ docker create -v /opt/docker/data:/dbdata --name dbdata mysql/mysql-server /bin/true
a89396abcb8bc19c58d7e5376e57a63ae69bdca2d20fd24d4037456a8180f11b
With this command, /dbdata now points to an external directory, which is also shared among containers.

What's next

This article has given us most of the elements necessary to run more complex operations.
In Part 3 we will finally run MySQL containers in replication, with a set of scripts that automate the procedure. You will see a master with 10 slaves being deployed and operational in less than one minute.


Forcing a Slave Server to Recreate Temporary Tables After an Unsafe Shutdown

Mon, 2015-11-02 09:30
geoff_montee_g

Losing temporary tables on a slave when binlog_format is not set to ROW is a well-known problem, and there is even a way to avoid it, as described by the safe slave shutdown procedure in the MySQL documentation. However, the documentation doesn't describe how to fix your slave if you accidentally shut it down while it has temporary tables open. In this blog post, I'll describe how to do that.

The Problem

Let's say that you run these statements on a master server:


-- statement 1
DROP TEMPORARY TABLE IF EXISTS tmp_table;
-- statement 2
CREATE TEMPORARY TABLE tmp_table (
id int primary key,
str varchar(10)
);
-- statement 3
INSERT INTO real_table VALUES (1, 'str1', false, NULL);
-- statement 4
INSERT INTO real_table VALUES (2, 'str2', false, NULL);
-- statement 5
INSERT INTO tmp_table SELECT id, str FROM real_table;
-- statement 6
UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);

These statements are written to the master's binary log, copied to the slave's relay log, and executed by the slave one at a time. If the slave is shut down after it executes statement 2, but before it executes statements 5 and 6, then tmp_table will not be available to the SQL thread when the server is brought back up. This will result in errors that look like this in SHOW SLAVE STATUS:


Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'db1.tmp_table' doesn't exist' on query. Default database: 'db1'. Query: 'INSERT INTO tmp_table SELECT id, str FROM real_table'

The Solution

Normally, to fix a missing table on a slave, you can just manually create the table on the slave. However, temporary tables are unique to a given session, so if you create the temporary table manually, the SQL thread still won't see it. To fix the problem, you have to actually force the SQL thread to re-execute the CREATE TEMPORARY TABLE statement for the thread that originally executed the statement (as given by the thread_id identifier in the binary log). At the same time, you don't want the slave to re-execute statements that will result in duplicate key errors or cause the slave to become inconsistent.

To figure out how to do this in the example above, let's look at the binary logs for these events:


# at 603
#151015 16:46:35 server id 1 end_log_pos 705 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444941995/*!*/;
DROP TEMPORARY TABLE IF EXISTS tmp_table
/*!*/;
# at 705
#151015 16:46:35 server id 1 end_log_pos 839 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444941995/*!*/;
CREATE TEMPORARY TABLE tmp_table (
id int primary key,
str varchar(10)
)

/*!*/;
# at 839
#151015 16:46:35 server id 1 end_log_pos 955 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO real_table VALUES (1, 'str1', false, NULL)
/*!*/;
# at 955
#151015 16:46:35 server id 1 end_log_pos 1071 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO real_table VALUES (2, 'str2', false, NULL)
/*!*/;
# at 1071
#151015 16:46:35 server id 1 end_log_pos 1185 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO tmp_table SELECT id, str FROM real_table
/*!*/;
# at 1185
#151015 16:46:43 server id 1 end_log_pos 1352 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444942003/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table)
/*!*/;
# at 1352
#151015 16:46:51 server id 1 end_log_pos 1476 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1444942011/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `tmp_table`
/*!*/;

Let's outline where our statements are located in the binary log:

  • Statement 1
    Statement: DROP TEMPORARY TABLE IF EXISTS tmp_table;
    Start pos: 603
    End pos: 705
  • statement 2
    statement:

    CREATE TEMPORARY TABLE tmp_table (
    id int primary key,
    str varchar(10)
    );

    Start pos: 705
    End pos: 839
  • Statement 3
    Statement: INSERT INTO real_table VALUES (1, 'str1', false, NULL);
    Start pos: 839
    End pos: 955
  • Statement 4
    Statement: INSERT INTO real_table VALUES (2, 'str2', false, NULL);
    Start pos: 955
    End pos: 1071
  • Statement 5
    Statement: INSERT INTO tmp_table SELECT id, str FROM real_table;
    Start pos: 1071
    End pos: 1185
  • Statement 6
    Statement: UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);
    Start pos: 1185
    End pos: 1352

If the slave is shut down unsafely, it will likely fail at statement 5, i.e. when it tries to insert into the temporary table. So what can we tell the slave to do in order to fix this? In English, the steps would be:

  • Go back and execute statement 2.
  • Ignore statements 3-4.
  • Start executing normally at statement 5.

We can convert these steps into a series of commands using the binary log positions above:


STOP SLAVE;
-- Statement 2 starts at 705
CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=705;
-- Statement 3 starts at position 839, so let's end the slave before then
START SLAVE UNTIL Master_log_file='mysqld-bin.000001', Master_log_pos=838;
-- Wait a small amount of time for our SQL thread to execute the statement
SELECT SLEEP(60);
STOP SLAVE;
-- Statement 5 starts at position 1071
CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=1071;
START SLAVE;

Conclusion

It's not that difficult to recreate temporary tables on a slave after an unsafe shutdown, as long as you still have the binary logs that contained the CREATE TEMPORARY TABLE statements. However, it can be quite tedious if there were a lot of temporary tables open at the time of the shutdown. If your application uses a lot of temporary tables, you may have a lot of errors to fix. In that case, you should consider setting binlog_format=ROW, which would avoid this problem entirely.
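Switching is straightforward; a minimal sketch (the global setting only affects connections opened after the statement, so also put it in my.cnf to survive a restart):

SET GLOBAL binlog_format = 'ROW';
-- and add binlog_format = ROW to the [mysqld] section of my.cnf to make it permanent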

Has anyone else figured out interesting ways around this problem?

About the Author


Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.



Become a MySQL DBA blog series - Understanding the MySQL Error Log


We have yet to see software that runs perfectly, without any issues. MySQL is no exception. It's not the software's fault - we need to be clear about that. We use MySQL in different places, on different hardware and within different environments. It's also highly configurable. All of this makes it a great product, but it comes with a price - sometimes some settings won't work correctly under certain conditions, and it is also pretty easy to make simple human mistakes like typos in the MySQL configuration. Luckily, MySQL provides us with the means to understand what is wrong through the error log. In this blog, we'll see how to read the information in the error log.

This is the fifteenth installment in the ‘Become a MySQL DBA’ blog series.
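Before reading the error log, it helps to know where it actually lives; the server can tell you (a quick check added here, not part of the original walkthrough):

SHOW GLOBAL VARIABLES LIKE 'log_error';
-- an empty value or 'stderr' means the messages go to the console, or to wherever
-- mysqld_safe or the service manager redirects them, rather than to a file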

Error log on MySQL - clean startup and shutdown

MySQL’s error log may contain a lot of information about different issues you may encounter. Let’s start with checking what a ‘clean’ start of MySQL looks like. It will make it easier to find any anomalies later.

First, all plugins (and storage engines, which work as plugins in MySQL 5.6) are initialized. If something is not right, you'll see errors at this stage.

2015-10-26 19:35:20 13762 [Note] Plugin 'FEDERATED' is disabled.

Next we can see a significant part related to InnoDB initialization.

2015-10-26 19:35:20 13762 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-26 19:35:20 13762 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-26 19:35:20 13762 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-10-26 19:35:20 13762 [Note] InnoDB: Memory barrier is not used
2015-10-26 19:35:20 13762 [Note] InnoDB: Compressed tables use zlib 1.2.8
2015-10-26 19:35:20 13762 [Note] InnoDB: Using Linux native AIO
2015-10-26 19:35:20 13762 [Note] InnoDB: Using CPU crc32 instructions
2015-10-26 19:35:20 13762 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2015-10-26 19:35:21 13762 [Note] InnoDB: Completed initialization of buffer pool
2015-10-26 19:35:21 13762 [Note] InnoDB: Highest supported file format is Barracuda.
2015-10-26 19:35:21 13762 [Note] InnoDB: 128 rollback segment(s) are active.
2015-10-26 19:35:21 13762 [Note] InnoDB: Waiting for purge to start
2015-10-26 19:35:21 13762 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 710963181

In the next phase, authentication plugins are initiated (if they are configured correctly).

2015-10-26 19:35:21 13762 [Note] RSA private key file not found: /var/lib/mysql//private_key.pem. Some authentication plugins will not work.
2015-10-26 19:35:21 13762 [Note] RSA public key file not found: /var/lib/mysql//public_key.pem. Some authentication plugins will not work.

At the end you'll see information about MySQL binding to the configured IP and port. The event scheduler is also initialized. Finally, you'll see the ‘ready for connections’ message indicating that MySQL started correctly.

2015-10-26 19:35:21 13762 [Note] Server hostname (bind-address): '*'; port: 33306
2015-10-26 19:35:21 13762 [Note] IPv6 is available.
2015-10-26 19:35:21 13762 [Note]   - '::' resolves to '::';
2015-10-26 19:35:21 13762 [Note] Server socket created on IP: '::'.
2015-10-26 19:35:21 13762 [Warning] 'proxies_priv' entry '@ root@ip-172-30-4-23' ignored in --skip-name-resolve mode.
2015-10-26 19:35:21 13762 [Note] Event Scheduler: Loaded 2 events
2015-10-26 19:35:21 13762 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.26-74.0'  socket: '/var/run/mysqld/mysqld.sock'  port: 33306  Percona Server (GPL), Release 74.0, Revision 32f8dfd

What does the shutdown process look like? Let's follow it step by step:

2015-10-26 19:35:13 12955 [Note] /usr/sbin/mysqld: Normal shutdown

This line is very important - MySQL can be stopped in many ways: a user can use init scripts, ‘mysqladmin’ can execute the SHUTDOWN command, or a SIGTERM can be sent to MySQL to initiate a shutdown of the database. At times you'll be investigating why a MySQL instance stopped - this line is always an indication that someone (or something) triggered a clean shutdown. No crash happened, and MySQL wasn't killed either.

2015-10-26 19:35:13 12955 [Note] Giving 12 client threads a chance to die gracefully
2015-10-26 19:35:13 12955 [Note] Event Scheduler: Purging the queue. 2 events
2015-10-26 19:35:13 12955 [Note] Shutting down slave threads
2015-10-26 19:35:15 12955 [Note] Forcefully disconnecting 6 remaining clients
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 37  user: 'cmon'
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 53  user: 'cmon'
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 38  user: 'cmon'
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 39  user: 'cmon'
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 44  user: 'cmon'
2015-10-26 19:35:15 12955 [Warning] /usr/sbin/mysqld: Forcing close of thread 47  user: 'cmon'

In the above, MySQL is closing remaining connections - it allows them to finish on their own but if they do not close, they will be forcefully terminated.

2015-10-26 19:35:15 12955 [Note] Binlog end
2015-10-26 19:35:15 12955 [Note] Shutting down plugin 'partition'
2015-10-26 19:35:15 12955 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2015-10-26 19:35:15 12955 [Note] Shutting down plugin 'ARCHIVE'
...

Here MySQL is shutting down all plugins that were enabled in its configuration. There are many of them so we removed some output for the sake of better visibility.

2015-10-26 19:35:15 12955 [Note] Shutting down plugin 'InnoDB'
2015-10-26 19:35:15 12955 [Note] InnoDB: FTS optimize thread exiting.
2015-10-26 19:35:15 12955 [Note] InnoDB: Starting shutdown...
2015-10-26 19:35:17 12955 [Note] InnoDB: Shutdown completed; log sequence number 710963181

One of them is InnoDB - this step may take a while as a clean InnoDB shutdown can take some time on a busy server, even with InnoDB fast shutdown enabled (and this is the default setting).

2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'MyISAM'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'MRG_MYISAM'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'CSV'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'MEMORY'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'sha256_password'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'mysql_old_password'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'mysql_native_password'
2015-10-26 19:35:17 12955 [Note] Shutting down plugin 'binlog'
2015-10-26 19:35:17 12955 [Note] /usr/sbin/mysqld: Shutdown complete

The shutdown process concludes with ‘Shutdown complete’ message.

Configuration errors

Let's start with a basic issue - a configuration error. It's a really common problem. Sometimes we make a typo in a variable name while editing my.cnf. MySQL parses its configuration files when it starts, and if something is wrong, it will refuse to run. Let's take a look at some examples:

2015-10-27 11:20:05 18858 [Note] Plugin 'FEDERATED' is disabled.
2015-10-27 11:20:05 18858 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-27 11:20:05 18858 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-27 11:20:05 18858 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-10-27 11:20:05 18858 [Note] InnoDB: Memory barrier is not used
2015-10-27 11:20:05 18858 [Note] InnoDB: Compressed tables use zlib 1.2.8
2015-10-27 11:20:05 18858 [Note] InnoDB: Using Linux native AIO
2015-10-27 11:20:05 18858 [Note] InnoDB: Using CPU crc32 instructions
2015-10-27 11:20:05 18858 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2015-10-27 11:20:05 18858 [Note] InnoDB: Completed initialization of buffer pool
2015-10-27 11:20:05 18858 [Note] InnoDB: Highest supported file format is Barracuda.
2015-10-27 11:20:05 18858 [Note] InnoDB: 128 rollback segment(s) are active.
2015-10-27 11:20:05 18858 [Note] InnoDB: Waiting for purge to start
2015-10-27 11:20:06 18858 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 773268083
2015-10-27 11:20:06 18858 [ERROR] /usr/sbin/mysqld: unknown variable '--max-connections=512'
2015-10-27 11:20:06 18858 [ERROR] Aborting

As you can see from the above, we have a clear copy-paste error with regard to the max_connections variable. MySQL doesn't accept the ‘--’ prefix in my.cnf, so if you are copying settings from ‘ps’ output or from the MySQL documentation, please keep this in mind. Many things can go wrong, but the important point is that an ‘unknown variable’ error always points you to my.cnf and to some kind of entry which MySQL can't understand.
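A simple habit that catches most of these typos (my suggestion, not from the original post) is to check the exact variable name on a running server before putting it into my.cnf:

SHOW GLOBAL VARIABLES LIKE 'max_connection%';
-- an empty result means the server does not know a variable by that name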

Another problem can be related to misconfigurations. Let’s say we over-allocated memory on our server. Here’s what you might see:

2015-10-27 11:24:41 31325 [Note] Plugin 'FEDERATED' is disabled.
2015-10-27 11:24:41 31325 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-27 11:24:41 31325 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-27 11:24:41 31325 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-10-27 11:24:41 31325 [Note] InnoDB: Memory barrier is not used
2015-10-27 11:24:41 31325 [Note] InnoDB: Compressed tables use zlib 1.2.8
2015-10-27 11:24:41 31325 [Note] InnoDB: Using Linux native AIO
2015-10-27 11:24:41 31325 [Note] InnoDB: Using CPU crc32 instructions
2015-10-27 11:24:41 31325 [Note] InnoDB: Initializing buffer pool, size = 512.0G
InnoDB: mmap(70330089472 bytes) failed; errno 12
2015-10-27 11:24:41 31325 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2015-10-27 11:24:41 31325 [ERROR] Plugin 'InnoDB' init function returned error.
2015-10-27 11:24:41 31325 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2015-10-27 11:24:41 31325 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-10-27 11:24:41 31325 [ERROR] Aborting

As you can clearly see, we've tried to allocate 512G of memory, which was not possible on this host. InnoDB is required for MySQL to start, so the error in initializing InnoDB prevented MySQL from starting.

Permission errors

Below are three examples of three crashes related to permission errors.

/usr/sbin/mysqld: File './binlog.index' not found (Errcode: 13 - Permission denied)
2015-10-27 11:31:40 11469 [ERROR] Aborting

This one is pretty self-explanatory. As stated, MySQL couldn't find the index file for the binary logs due to a ‘permission denied’ error. This is a hard stop for MySQL; it has to be able to read and write its binary logs.

2015-10-27 11:32:46 13601 [Note] Plugin 'FEDERATED' is disabled.
2015-10-27 11:32:46 13601 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-27 11:32:46 13601 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-27 11:32:46 13601 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-10-27 11:32:46 13601 [Note] InnoDB: Memory barrier is not used
2015-10-27 11:32:46 13601 [Note] InnoDB: Compressed tables use zlib 1.2.8
2015-10-27 11:32:46 13601 [Note] InnoDB: Using Linux native AIO
2015-10-27 11:32:46 13601 [Note] InnoDB: Using CPU crc32 instructions
2015-10-27 11:32:46 13601 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2015-10-27 11:32:46 13601 [Note] InnoDB: Completed initialization of buffer pool
2015-10-27 11:32:46 13601 [ERROR] InnoDB: ./ibdata1 can't be opened in read-write mode
2015-10-27 11:32:46 13601 [ERROR] InnoDB: The system tablespace must be writable!
2015-10-27 11:32:46 13601 [ERROR] Plugin 'InnoDB' init function returned error.
2015-10-27 11:32:46 13601 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2015-10-27 11:32:46 13601 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-10-27 11:32:46 13601 [ERROR] Aborting

The above is another permission issue, this time related to the ibdata1 file - the shared system tablespace which InnoDB uses to store its internal data (the InnoDB dictionary and, by default, undo logs). Even if you use innodb_file_per_table, which is the default setting in MySQL 5.6, InnoDB still requires this file so it can initialize.

2015-10-27 11:35:21 17826 [Note] Plugin 'FEDERATED' is disabled.
/usr/sbin/mysqld: Can't find file: './mysql/plugin.frm' (errno: 13 - Permission denied)
2015-10-27 11:35:21 17826 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.
2015-10-27 11:35:21 17826 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-27 11:35:21 17826 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-27 11:35:21 17826 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-10-27 11:35:21 17826 [Note] InnoDB: Memory barrier is not used
2015-10-27 11:35:21 17826 [Note] InnoDB: Compressed tables use zlib 1.2.8
2015-10-27 11:35:21 17826 [Note] InnoDB: Using Linux native AIO
2015-10-27 11:35:21 17826 [Note] InnoDB: Using CPU crc32 instructions
2015-10-27 11:35:21 17826 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2015-10-27 11:35:21 17826 [Note] InnoDB: Completed initialization of buffer pool
2015-10-27 11:35:21 17826 [Note] InnoDB: Highest supported file format is Barracuda.
2015-10-27 11:35:21 7fa02e97d780  InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.
2015-10-27 11:35:21 17826 [ERROR] InnoDB: Could not find a valid tablespace file for 'mysql/innodb_index_stats'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2015-10-27 11:35:21 17826 [ERROR] InnoDB: Tablespace open failed for '"mysql"."innodb_index_stats"', ignored.
2015-10-27 11:35:21 7fa02e97d780  InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.
2015-10-27 11:35:21 17826 [ERROR] InnoDB: Could not find a valid tablespace file for 'mysql/innodb_table_stats'. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
2015-10-27 11:35:21 17826 [ERROR] InnoDB: Tablespace open failed for '"mysql"."innodb_table_stats"', ignored.
2015-10-27 11:35:21 7fa02e97d780  InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.

Here we have an example of permission issues related to the system schema (the ‘mysql’ schema). As you can see, InnoDB reported that it does not have the access rights to open tables in this schema. Again, this is a serious problem and a hard stop for MySQL.

Out of memory errors

Another set of frequent issues is related to MySQL servers running out of memory. This can manifest itself in many different ways, but the most common is this one:

Killed
151027 12:18:58 mysqld_safe Number of processes running now: 0

mysqld_safe is an ‘angel’ process which monitors mysqld and restarts it when it dies. This is true for MySQL but not for recent Galera nodes - in that case it won't restart MySQL, and you'll see the following sequence:

Killed
151027 12:18:58 mysqld_safe Number of processes running now: 0
151027 12:18:58 mysqld_safe WSREP: not restarting wsrep node automatically
151027 12:18:58 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended

Another way a memory issue may show up is through the following messages in the error log:

InnoDB: mmap(3145728 bytes) failed; errno 12
InnoDB: mmap(3145728 bytes) failed; errno 12

What's important to keep in mind is that error codes can easily be checked using the ‘perror’ utility. In this case, it makes it perfectly clear what exactly happened:

$ perror 12
OS error code  12:  Cannot allocate memory

If you suspect a memory problem - or, really, whenever you encounter unexpected MySQL crashes - you can also run dmesg to get some more data. If you see output like the one below, you can be certain that the problem was caused by the OOM killer terminating MySQL:

[ 4165.802544] Out of memory: Kill process 8143 (mysqld) score 938 or sacrifice child
[ 4165.808329] Killed process 8143 (mysqld) total-vm:5101492kB, anon-rss:3789388kB, file-rss:0kB
[ 4166.226410] init: mysql main process (8143) killed by KILL signal
[ 4166.226437] init: mysql main process ended, respawning
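Once you know the OOM killer was involved, the next step is to check how much memory MySQL is configured to use. The query below is only a rough starting point of my own - per-connection buffers and other caches also count towards the total:

SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS innodb_buffer_pool_mb,
       @@key_buffer_size / 1024 / 1024         AS key_buffer_mb,
       @@max_connections                       AS max_connections;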

 

InnoDB crashes

InnoDB, most of the time, is a very solid and durable storage engine. Unfortunately, under some circumstances, data may become corrupted. Usually it's related to either misconfiguration (for example, disabling the doublewrite buffer) or some kind of faulty hardware (bad memory modules, an unstable I/O layer). In those cases it can happen that InnoDB will crash. Crashes can also be triggered by InnoDB bugs. It's not common but it happens from time to time. No matter the reason for a crash, the error log usually looks similar. Let's go over a live example.

2015-10-13 15:08:06 7f53f0658700  InnoDB: Assertion failure in thread 139998492198656 in file btr0pcur.cc line 447
InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == buf_block_get_page_no(btr_pcur_get_block(cursor))

InnoDB code is full of sanity checks - assertions are very frequent and, if one of them fails, InnoDB intentionally crashes MySQL. At the beginning you'll see information about the exact location of the assertion which failed. This gives you information about what exactly may have happened - the MySQL source code is available on the internet and it's pretty easy to download the code related to your particular version. It may sound scary but it's not - MySQL code is properly documented and it's not a problem to figure out at least what kind of activity triggers your issue.

InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

InnoDB logging tries to be as helpful as possible - as you can see, there's information about the InnoDB recovery process. Under some circumstances it may allow you to start MySQL and dump the InnoDB data. It's used as "almost a last resort" - one step further and you'll end up digging in the InnoDB files with a hex editor.

13:08:06 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=36
max_threads=514
thread_count=27
connection_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 269997 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

In the next step you'll see some of the status counters printed. This is supposed to give you some idea about the configuration - maybe you configured MySQL to use too many resources? Once you go over this data, we reach the backtrace. This piece of information guides you through the stack and shows you what kind of calls were being executed when the crash happened. Let's check what exactly happened in this particular case.

Thread pointer: 0xa985960
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f53f0657d40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8cd37c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x6555a1]
/lib64/libpthread.so.0(+0xf710)[0x7f540f978710]
/lib64/libc.so.6(gsignal+0x35)[0x7f540dfd4625]
/lib64/libc.so.6(abort+0x175)[0x7f540dfd5e05]
/usr/sbin/mysqld[0xa0277b]
/usr/sbin/mysqld[0x567f88]
/usr/sbin/mysqld[0x99afde]
/usr/sbin/mysqld[0x8eb926]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x5d)[0x59a6ed]
/usr/sbin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x20)[0x800cd0]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xc88)[0x8128c8]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1bc5)[0x6d6d15]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5a8)[0x6dc228]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x108c)[0x6dda4c]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x162)[0x6ab002]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6ab0f0]
/usr/sbin/mysqld(pfs_spawn_thread+0x143)[0xaddd73]
/lib64/libpthread.so.0(+0x79d1)[0x7f540f9709d1]
/lib64/libc.so.6(clone+0x6d)[0x7f540e08a8fd]

We can see that the problem was triggered when MySQL was executing a DELETE command. Records were read sequentially and a full table scan was performed (handler read_rnd_next). Combined with the information from the beginning:

InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == buf_block_get_page_no(btr_pcur_get_block(cursor))

we can tell the crash happened when InnoDB was performing a scan of the buffer pool (the assertion was triggered in the btr_pcur_move_to_next_page method).

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (a991d40): DELETE FROM cmon_stats WHERE cid = 2 AND ts < 1444482484
Connection ID (thread ID): 5
Status: NOT_KILLED

Finally, we can see which thread triggered the assertion (this is not always printed). In our case we can confirm that it was indeed a DELETE query which caused the problem.

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
151013 15:08:06 mysqld_safe Number of processes running now: 0

As we said, MySQL is usually pretty verbose when it comes to the error log. Sometimes, though, things go so wrong that MySQL can't really collect much information. It still gives a pretty clear indication that something went very wrong (signal 11). Take a look at the following real-life example:

22:31:40 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=10
max_threads=202
thread_count=9
connection_count=7
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 113388 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x1fb7050
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Segmentation fault (core dumped)
150119 23:48:08 mysqld_safe Number of processes running now: 0

As you can see, we have clear information that MySQL crashed, but there's no additional detail. Segmentation faults are also logged in the kernel's diagnostic messages. In this particular case dmesg returned:

[370591.200799] mysqld[5462]: segfault at 51 ip 00007fe624e3e688 sp 00007fe4f4099c30 error 4 in libgcc_s.so.1[7fe624e2f000+16000]

It’s not much, as you can see, but such information can prove to be very useful if you decide to file a bug against the software provider, be it Oracle, MariaDB, Percona or Codership. It can also be helpful in finding relevant bugs already reported.

It's not always possible to fix the problem based only on the information written in the error log. It's definitely a good place to start debugging problems in your database, though. If you are hitting some kind of very common, simple misconfiguration, the information you find in the error log should be enough to pinpoint a cause and find a solution. If we are talking about serious MySQL crashes, things are definitely different - most likely you won't be able to fix the issue on your own (unless you are a developer familiar with MySQL code, that is). On the other hand, this data may be more than enough to identify the culprit and locate existing bug reports which cover it. Such bug reports contain lots of discussion and tips or workarounds - it's likely that you'll be able to implement one of them. Even information that the given bug was fixed in version x.y.z is useful - you'll be able to verify it and, if it does solve your problem, plan for an upgrade. In the worst case scenario, you should have enough data to create a bug report on your own.

In the next post in the series, we are going to take a guided tour through the error log (and other logs) you may find in your Galera cluster. We’ll cover what kind of issues you may encounter and how to solve them.


Log Buffer #447: A Carnival of the Vanities for DBAs


 

This Log Buffer Edition covers the weekly blog posts of Oracle, SQL Server and MySQL.

Oracle:

  • An Index or Disaster, You Choose (It’s The End Of The World As We Know It).
  • SQL Monitoring in Oracle Database 12c.
  • RMAN Full Backup vs. Level 0 Incremental.
  • Auto optimizer stats after CTAS or direct loads in #Oracle 12c.
  • How to move OEM12c management agent to new location.

SQL Server:

  • Automate SQL Server Log Monitoring.
  • 10 New Features Worth Exploring in SQL Server 2016.
  • The ABCs of Database Creation.
  • Top 10 Most Common Database Scripts.
  • In-Memory OLTP Table Checkpoint Processes Performance Comparison.

MySQL:

  • The Client Library, Part 1: The API, the Whole API and Nothing but the API.
  • Performance of Connection Routing plugin in MySQL Router 2.0.
  • MariaDB 10.0.22 now available.
  • Semi-synchronous Replication Performance in MySQL 5.7.
  • MySQL and Trojan.Chikdos.A.

 

Learn more about Pythian’s expertise in Oracle, SQL Server & MySQL.



Should we be muddying the relational waters? Use cases for MySQL & Mongodb


A first look at RDS Aurora


Recently, I happened to have an onsite engagement and the goal of the engagement was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but I couldn’t say much about it, so, I Googled, I listened to talks and I read about it. Now that my onsite engagement is over, here’s my first impression of Aurora.

First, let’s describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to set up an Aurora instance, you go to the RDS console and either launch a new instance choosing Aurora as the type, or create a snapshot of an RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you add reader nodes to an existing cluster. An Aurora cluster minimally consists of a writer node, but you can add up to 15 reader nodes (only one writer, though). It is at the storage level that things become interesting. Aurora doesn’t rely on filesystem-type storage, at least not from a database standpoint; it has its own special storage service that is automatically replicated locally and to two other AZs, for a total of six copies. Furthermore, you pay only for what you use, and the storage grows/shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.

Now, all that is fine, but what are the benefits of using Aurora? I must say I barely used Aurora; one week is not field-proven experience. These are Amazon’s claims, but, as we will discuss, there are some good arguments in their favor.

The first claim is that the write capacity is increased by up to 4x. So, even though only a single instance is used as the writer in Aurora, you get up to 400% of the write capacity of a normal MySQL instance. That’s quite huge and amazing, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part, since the latency would otherwise be a performance killer. Locally, Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high-availability configuration, that is a reasonable trade-off. By comparison, Galera-based clustering solutions like Percona XtraDB Cluster typically lower the write capacity, since all nodes must synchronize on commit. Other claims are that the readers’ performance is unaffected by the clustering and that the readers have almost no lag with the writer. Furthermore, as if that were not enough, readers can’t diverge from the master. Finally, since there’s no lag, any reader can replace the writer very quickly, so in terms of failover, all is right.

That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears that the Amazon team did a great job at putting an object store under InnoDB, and they went way further than what I was thinking. Here I may be speculating a bit, and I would be happy to be found wrong. The writer never writes dirty pages back to the store; it only writes fragments of InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages that have been updated by this log fragment object. Just have a look at the show global status of an Aurora instance and you’ll see what I mean. Said otherwise, it is like having an infinitely large set of InnoDB log files; you can’t reach the max checkpoint age. Also, if the object store supports atomic operations, there’s no need for the doublewrite buffer, a high source of contention in MySQL. Just those two aspects are enough, in my opinion, to explain the up-to-4x claim for write capacity; and since the log fragments are a kind of binary diff, there is usually much less data to write than whole pages.

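One minimal way to look at this yourself is a sketch like the one below, run on both a stock MySQL server and an Aurora writer; the counters in the first statement are standard MySQL status variables, while the second statement simply lists whatever engine-specific counters Aurora may expose (no specific Aurora counter names are assumed here):

-- Standard InnoDB write-path counters; the doublewrite and page-flushing
-- activity should look very different on Aurora than on stock MySQL.
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Innodb_data_writes', 'Innodb_dblwr_writes', 'Innodb_os_log_written');

-- Any engine-specific counters the Aurora instance exposes:
SHOW GLOBAL STATUS LIKE '%aurora%';
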
Something is needed to remove the log fragment objects since, over time, their accumulation and the need to apply them would impact performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level: the storage system is wise enough to know that a requested page is dirty and to apply the log fragments before sending it back to the reader. The shared object store can also explain why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time, which should be short within the same AZ.

So, how does Aurora compare to a technology like Galera?

Pros:

  • Higher write capacity; the writer is unaffected by the other nodes
  • Simpler logic, no need for certification
  • No need for an SST to provision a new node
  • Can’t diverge
  • Scales IOPS tremendously
  • Fast failover
  • No need for quorum (handled by the object store)
  • Simple to deploy

Cons:

  • Likely asynchronous at the storage level
  • Only one node is writable
  • Not open source

Aurora is a mind shift in terms of databases and a jewel in the hands of Amazon. OpenStack currently has no database service that can offer similar features. I wonder how hard it would be to produce an equivalent solution using well-known open-source components like Ceph for the object store and Corosync, ZooKeeper, ZeroMQ or something else for the communication layer. Also, would there be a use case?

The post A first look at RDS Aurora appeared first on MySQL Performance Blog.



Upgrading Directly From MySQL 5.0 to 5.7 With mysqldump

The world is not in your books and maps.


MySQL 5.7 came out with support for JSON, improved geometry, and virtual columns. Here's an example showing them all playing together.

Download citylots.json.

It comes as one big object, so we'll break it up into separate lines:
grep "^{ \"type" citylots.json > properties.json

Connect to a 5.7 instance of MySQL.

CREATE TABLE citylots (id serial, j json, p geometry as (ST_GeomFromGeoJSON(j, 2)));
LOAD DATA LOCAL INFILE 'properties.json' INTO TABLE citylots (j);

A few of the rows don't contain useful data:
DELETE FROM citylots WHERE j->'$.geometry.type' IS NULL;

In MySQL Workbench, do:
SELECT id, p FROM citylots;

Then click on Spatial View. It takes a couple of minutes for 200k rows, but there's a map of San Francisco.

The default projection, 'Robinson', is designed for showing the whole world at once and so is pretty distorted for this particular data set. Mercator or Equirectangular are better choices. Fortunately, Workbench repaints the data in just a few seconds.

If you selected some other fields, you can click on the map and see the relevant data for that particular geometry.
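
Virtual columns from the example above can also be combined with the JSON data for indexing. Here is a minimal sketch that exposes one of the JSON properties as an indexed virtual column; the property name MAPBLKLOT and the lookup value are assumptions about the citylots dataset:

ALTER TABLE citylots
  ADD COLUMN blklot VARCHAR(20)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(j, '$.properties.MAPBLKLOT'))) VIRTUAL,
  ADD INDEX idx_blklot (blklot);

-- Lookups on that JSON property can now use the secondary index:
SELECT id, blklot FROM citylots WHERE blklot = '0001001';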



MMUG14: MySQL Automation at Facebook


English: Madrid MySQL Users Group will be holding their next meeting on Tuesday, 10th November at 19:30h at the offices of Tuenti in Madrid. David Fernández will be offering a presentation “MySQL Automation @ FB”.  If you’re in Madrid and are interested please come along. We have not been able to give much advance notice so if you know of others who may be interested please forward on this information.  Full details of the MeetUp can be found here at the Madrid MySQL Users Group page.

Español: On 10 November at 19:30, the Madrid MySQL Users Group will hold its next meeting at the Tuenti offices in Madrid. David Fernández will give a presentation (in English), “MySQL Automation @ FB”. If you are in Madrid and interested, we would love to see you. We have not been able to give much advance notice, so if you know others who might be interested we would appreciate you passing this information along. Full details of the meeting can be found here on the Madrid MySQL Users Group page.



LIKE injection


Looking through our exception tracker the other day, I ran across a notice from our slow-query logger that caught my eye. I saw a SELECT … WHERE … LIKE query with lots of percent signs in the LIKE clause. It was pretty obvious that this term was user-provided and my first thought was SQL injection.

[3.92 sec] SELECT ... WHERE (profiles.email LIKE '%64%68%6f%6d%65%73@%67%6d%61%69%6c.%63%6f%6d%') LIMIT 10

Looking at the code, it turned out that we were using a user-provided term directly in the LIKE clause without any checks for metacharacters that are interpreted in this context (%, _, \).

def self.search(term, options = {})
  limit = (options[:limit] || 30).to_i
  friends = options[:friends] || []
  with_orgs = options[:with_orgs].nil? ? false : options[:with_orgs]

  if term.to_s.index("@")
    users = User.includes(:profile)
                .where("profiles.email LIKE ?", "#{term}%")
                .limit(limit).to_a
  else
    users = user_query(term, friends: friends, limit: limit)
  end

  ...
end

While this isn't full-blown SQL injection, it got me thinking about the impact of this kind of injection. This kind of pathological query clearly has some performance impact because we logged a slow query. The question is how much?

I asked our database experts and was told that it depends on where the wildcard is in the query. With a % in the middle of a query, the database can still check the index for the beginning characters of the term. With a % at the start of the query, indices may not get used at all. This bit of insight led me to run several queries with varied % placement against a test database.

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github.com";
1 row in set (0.00 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%ris@github.com";
1 row in set (0.91 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github%";
1 row in set (0.00 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%c%h%r%i%s%@%g%i%t%h%u%b%.%c%o%m%";
21 rows in set (0.93 sec)
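
One way to confirm what the optimizer does with each placement is EXPLAIN; this is a sketch assuming an index exists on profiles.email:

-- Prefix pattern: typically a range scan on the email index.
EXPLAIN SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github%";

-- Leading wildcard: the index can no longer be used as a range, so the
-- server typically falls back to scanning the whole index or table.
EXPLAIN SELECT 1 FROM `profiles` WHERE `email` LIKE "%ris@github.com";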

It seems that unsanitized user-provided LIKE clauses do have a potential performance impact, but how do we address this in a Ruby on Rails application? Searching the web, I couldn't find any great suggestions. There are no Rails helpers for escaping LIKE metacharacters, so we wrote some.

module LikeQuery
  # Characters that have special meaning inside the `LIKE` clause of a query.
  #
  # `%` is a wildcard representing multiple characters.
  # `_` is a wildcard representing one character.
  # `\` is used to escape other metacharacters.
  LIKE_METACHARACTER_REGEX = /([\\%_])/

  # What to replace `LIKE` metacharacters with. We want to prepend a literal
  # backslash to each metacharacter. Because String#gsub does its own round of
  # interpolation on its second argument, we have to double escape backslashes
  # in this String.
  LIKE_METACHARACTER_ESCAPE = '\\\\\1'

  # Public: Escape characters that have special meaning within the `LIKE` clause
  # of a SQL query.
  #
  # value - The String value to be escaped.
  #
  # Returns a String.
  def like_sanitize(value)
    raise ArgumentError unless value.respond_to?(:gsub)
    value.gsub(LIKE_METACHARACTER_REGEX, LIKE_METACHARACTER_ESCAPE)
  end

  extend self

  module ActiveRecordHelper
    # Query for values with the specified prefix.
    #
    # column - The column to query.
    # prefix - The value prefix to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_prefix(column, prefix)
      where("#{column} LIKE ?", "#{LikeQuery.like_sanitize(prefix)}%")
    end

    # Query for values with the specified suffix.
    #
    # column - The column to query.
    # suffix - The value suffix to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_suffix(column, suffix)
      where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(suffix)}")
    end

    # Query for values with the specified substring.
    #
    # column    - The column to query.
    # substring - The value substring to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_substring(column, substring)
      where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(substring)}%")
    end
  end
end
ActiveRecord::Base.extend LikeQuery
ActiveRecord::Base.extend LikeQuery::ActiveRecordHelper

We then went through and audited all of our LIKE queries, fixing eleven such cases. The risk of these queries turned out to be relatively low: a user could subvert the intention of the query, though not in any meaningful way. For us, this was simply a denial-of-service (DoS) vector. It's nothing revolutionary and it is not a new vulnerability class, but it's something to keep an eye out for. Three-second queries can be a significant performance hit, and application-level DoS vulnerabilities need to be mitigated.
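
To illustrate at the SQL level what the sanitization buys you, compare the two queries below (the email values are hypothetical). In the first, the underscore and percent sign in the user-supplied term act as wildcards; in the second, they have been escaped by like_sanitize and match only literally:

SELECT 1 FROM `profiles` WHERE `email` LIKE "a_b%@github.com%";
SELECT 1 FROM `profiles` WHERE `email` LIKE "a\_b\%@github.com%";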



The Perfect Server - Ubuntu 15.10 (Wily Werewolf) with Apache, PHP, MySQL, PureFTPD, BIND, Postfix, Dovecot and ISPConfig 3

This tutorial shows how to install an Ubuntu 15.10 (Wily Werewolf) server (with Apache2, BIND, Dovecot) for the installation of ISPConfig 3, and how to install ISPConfig 3. ISPConfig 3 is a webhosting control panel that allows you to configure the following services through a web browser: Apache or nginx web server, Postfix mail server, Courier or Dovecot IMAP/POP3 server, MySQL, BIND or MyDNS nameserver, PureFTPd, SpamAssassin, ClamAV, and many more. This setup covers the installation of Apache (instead of nginx), BIND (instead of MyDNS), and Dovecot (instead of Courier).

Universal Code Completion using ANTLR


While reworking our initial code completion implementation in MySQL Workbench, I developed an approach that can potentially be applied to many different situations/languages where you need code completion. The current implementation is made for the needs of MySQL Workbench, but with some small refactorings you can move out the MySQL-specific parts and have a clean core implementation that you can easily customize to your needs.

Since this implementation is not only bound to MySQL Workbench I posted the full description on my private blog.



I’m really quite good with maps


Workbench announced support for a spatial view in 6.2, but examples are somewhat lacking. Just how do you get a SHP into MySQL?

[Screenshot: world map rendered in the Workbench Spatial View]

Download and unpack a SHP file such as these country boundaries.

In the Workbench installation directory, you'll find a program "ogr2ogr" that can convert .shp to .csv. Run it like this:

"C:\Program Files\MySQL\MySQL Workbench 6.3\ogr2ogr.exe" -f CSV countries.csv countries.shp -lco GEOMETRY=AS_WKT

Now create a table and load the CSV.

CREATE TABLE worldmap (
	OBJECTID smallint unsigned,
	NAME varchar(50),
	ISO3 char(3),
	ISO2 char(2),
	FIPS varchar(5),
	COUNTRY varchar(50),
	ENGLISH varchar(50),
	FRENCH varchar(50),
	SPANISH varchar(50),
	LOCAL varchar(50),
	FAO varchar(50),
	WAS_ISO varchar(3),
	SOVEREIGN varchar(50),
	CONTINENT varchar(15),
	UNREG1 varchar(30),
	UNREG2 varchar(15),
	EU boolean,
	SQKM decimal(20,11),
	g geometry
);

LOAD DATA LOCAL INFILE 'countries.csv' INTO TABLE worldmap FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 1 LINES
(@WKT, OBJECTID, NAME, ISO3, ISO2, FIPS, COUNTRY, ENGLISH, FRENCH, SPANISH, LOCAL, FAO, WAS_ISO, SOVEREIGN, CONTINENT, UNREG1, UNREG2, EU, SQKM)
SET g = ST_GeomCollFromText(@WKT);

Now just select rows of interest in Workbench, click the Spatial View format button, and there's your world map.

You can run multiple selects (such as the citylot data from yesterday's post) to overlay on top of the world map.
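
For example (a minimal sketch; the CONTINENT value is an assumption about how the dataset labels continents), one select can draw a subset of the world map while a second select, run in the same Spatial View, overlays the city lots from yesterday's post:

SELECT NAME, g FROM worldmap WHERE CONTINENT = 'North America';
SELECT id, p FROM citylots;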

[Screenshot: city-lot data overlaid on the world map]



MySQL for Visual Studio 1.2.5 has been released


The MySQL Windows Experience Team is proud to announce the release of MySQL for Visual Studio 1.2.5. This is a maintenance release for 1.2.x. It can be used for production environments.

MySQL for Visual Studio is a product that includes all of the Visual Studio integration functionality to create and manage MySQL databases when developing .NET applications.

MySQL for Visual Studio is installed using the MySQL Installer for Windows which comes in 2 versions:

  • Full (150 MB) which includes a complete set of MySQL products with their binaries included in the downloaded bundle.
  • Web (1.5 MB – a network install) which will just pull MySQL for Visual Studio over the web and install it when run.

You can download MySQL Installer from our official Downloads page at http://dev.mysql.com/downloads/installer/.

MySQL for Visual Studio can also be downloaded by using the product standalone installer found at http://dev.mysql.com/downloads/windows/visualstudio/.

 

Changes in MySQL for Visual Studio 1.2.5 (2015-10-29)

This section documents all changes and bug fixes applied to MySQL for Visual Studio since the release of 1.2.4. Several new features were added to the 1.2.x branch; for more information see the section below called What Is New In MySQL for Visual Studio 1.2.

Known Limitations:

  • Item templates do not work correctly with MySQL Server 5.7.x, as it prevents the creation of an Entity Framework model.

Functionality Added or Changed

  • Added the Entity Framework option to the MySQL Website Configuration dialog for web projects, so Entity Framework version 5 or 6 can be used with a MySQL database provider. These automatically add the configuration/references needed to the web.config file and the project itself. Also, all available configuration options are now listed in the dialog.
  • Project Templates were replaced with Project Items. The Project Templates option was removed from the plugin toolbar, and from the Project menu, in order to add the Project Items feature with two options: MySQL New MVC Item and MySQL New Windows Form, which are available on the Add New Item dialog when adding a new Project Item. They add new windows forms or MVC controllers/views connected to MySQL.

Bugs Fixed

  • The Installer could not uninstall MySQL for Visual Studio if Visual Studio was uninstalled first.
    • Bug #21953055, Bug #71226
  • In v1.2.4, the Launch Workbench and Open MySQL Utilities Console toolbar buttons were disabled.
    • Bug #21495692
  • The Templates installer feature could not be uninstalled via Add/Remove Programs. Because Project Templates were replaced by Project Items, this is no longer a concern.
    • Bug #21488922, Bug #77802
  • The dataset designer wizard was not showing the stored procedure parameters when creating a “TableAdapter” using existing stored procedures for the “Select” command. Also, the stored procedure command had an “error” thus causing the dataset to not be created.
    • Bug #20233133, Bug #74195

 

What Is New In MySQL for Visual Studio 1.2

  • New MySQL Project Items for creating data views in Windows Forms and ASP.NET MVC web applications.
  • A new option in web configuration tool for the ASP.NET Personalization Provider (this feature requires MySQL Connector/NET 6.9 or newer).
  • A new option in web configuration tool for the ASP.NET Site Map Provider (this feature requires MySQL Connector/NET 6.9 or newer).
  • A new option for the MySQLSimpleMembership provider in the web configuration tool (this feature requires MySQL Connector/NET 6.9 or newer).

 

MySQL Windows Forms Project Item


This Project Item is available on the Add New Item dialog in Visual Studio when adding a new item to an existing project.

 

The dialog presented to create the MySQL Windows Forms Project Item automates the generation of a Windows Form, representing a view for MySQL data available through an existing Entity Framework’s model entity containing a MySQL table or view. Different view types are available to present the data:

  • Single-column: A form that contains one control for each existing column in the table, with navigation controls, and that allows CRUD operations. All controls can include validations for numeric and DateTime data types.
  • Grid: A form with a data grid view that contains navigation controls.
  • Master-detail: A form with a single control layout for the Parent table and a data grid view to navigate through child table’s data.

Supported with C# or Visual Basic language. This feature requires Connector/NET 6.7.5, 6.8.3 or 6.9.x.

For more details on the features included check the documentation at: https://dev.mysql.com/doc/connector-net/en/visual-studio-project-items-forms.html

 

MySQL ASP.NET MVC Project Item


This Project Item is available on the Add New Item dialog in Visual Studio when adding a new item to an existing project.

The dialog presented to create the MySQL ASP.NET MVC Item automates the generation of a controller and its corresponding view, representing a view for MySQL data available through an existing Entity Framework’s model entity containing a MySQL table or view. The MVC versions supported by this wizard are 3 when using Visual Studio 2010 or 2012, and 4 when using Visual Studio 2013 or greater.

The generation of the MVC items is done by creating an Entity Framework data model either with Entity Framework version 5 or 6 depending on the user’s selection.

Supported with C# or Visual Basic language. This feature requires Connector/NET 6.7.5, 6.8.3 or 6.9.x.

For more details on the features included check the documentation at: https://dev.mysql.com/doc/connector-net/en/idm139719963401984.html

 

New option in web configuration tool for the ASP.NET Personalization Provider


The personalization provider allows storing personalization state (data regarding the content and layout of Web Parts pages) generated by the Web Parts personalization service, using MySQL as a data source. This feature requires Connector/NET 6.9.x or greater.

http://dev.mysql.com/doc/connector-net/en/connector-net-website-config.html

 

New option in web configuration tool for the ASP.NET Site Map Provider


The site map provider allows showing a hierarchical list of links that describes the structure of a site. This feature requires Connector/NET 6.9.x or greater.

http://dev.mysql.com/doc/connector-net/en/connector-net-website-config.html

 

New option in web configuration tool for the ASP.NET Simple Membership provider


The latest provider added to handle web site membership tasks with ASP.NET. This feature requires Connector/Net 6.9.x or greater.

http://dev.mysql.com/doc/connector-net/en/connector-net-simple-membership-tutorial.html

 

 


Enjoy and thanks for the support!



Application Performance Issue Is A Business Issue [Infographic]


Application issues, downtimes and outages are the responsibility of IT to resolve. However, at the end of the day they are, essentially, business issues.
 
Businesses compete for customers online every day, and the latter are not a patient bunch. Every second of your web application's performance counts, building up or taking away both revenue and brand reputation. There's always a risk that even though your application may appear to be working fine, key parts of its flow, such as shopping carts or registration pages, may not be functioning properly. So it's critical to have full control over the entire application delivery chain. With Node.js, MySQL, Oracle, Java/JMX, Log, and Tomcat application monitors, you can track your app's performance bottom-up.
 
Check out the infographic below to see for yourself what it costs to run a sloppy application and how to take measures so that your app issues don't grow into business issues.
 
[Infographic: Application performance issue is a business issue]



s9s Tools and Resources: 'Become a MySQL DBA' series, ClusterControl 1.2.11 release, and more!


Check Out Our Latest Technical Resources for MySQL, MariaDB, PostgreSQL and MongoDB

This blog is packed with all the latest resources and tools we’ve recently published! Please do check it out and let us know if you have any comments or feedback.

Product Announcements & Resources


ClusterControl 1.2.11 Release
Last month, we were pleased to announce our best release yet for Postgres users, as well as to introduce key new features for our MySQL/MariaDB users, such as support for MaxScale, an open-source, database-centric proxy that works with MySQL, MariaDB and Percona Server.

View all the related resources here:

Support for MariaDB’s MaxScale
You can get started straight away with MaxScale by deploying and managing it with ClusterControl 1.2.11. This How-To blog shows you how, step-by-step.


Customer Case Studies

From small businesses to Fortune 500 companies, customers have chosen Severalnines to deploy and manage MySQL, MongoDB and PostgreSQL.  

View our Customer page to discover companies like yours who have found success with ClusterControl.


Technical Webinar - Replays

As you know, we run a monthly technical webinar cycle; these are the latest replays, which you can watch at your own leisure, all part of our ‘Become a MySQL DBA’ series:

View all our replays here

ClusterControl Blogs

We’ve started a new series of blogs focussing on how to use ClusterControl. Do check them out!

View all ClusterControl blogs here.


 

The MySQL DBA Blog Series

We’re on the 15th installment of our popular ‘Become a MySQL DBA’ series and you can view all of these blogs here. Here are the latest ones in the series:

View all the ‘Become a MySQL DBA’ blogs here

We trust these resources are useful. If you have any questions on them or on related topics, please do contact us!


Setting-up second mysql instance & replication on Linux in 10 steps

This is a quick, 10-step guide to installing and configuring a second MySQL instance on port 3307 and making it a slave of the MySQL instance running on port 3306.
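
As a minimal sketch of the final replication step (assuming binary logging is enabled on the 3306 instance, a replication user named repl has been created there, and the log file and position come from SHOW MASTER STATUS), connect to the new 3307 instance and run:

CHANGE MASTER TO
  MASTER_HOST = '127.0.0.1',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',      -- placeholder credentials
  MASTER_LOG_FILE = 'mysql-bin.000001',   -- from SHOW MASTER STATUS on 3306
  MASTER_LOG_POS = 4;                     -- from SHOW MASTER STATUS on 3306
START SLAVE;
SHOW SLAVE STATUS\G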