
Setting up an NDB Cluster in the Oracle Cloud using Auto Installer

In MySQL Cluster 8.0.18 we have developed the MySQL Cluster Auto Installer to
also support installing NDB :)

We have made it very easy to set up an NDB Cluster in the Oracle Cloud.
The Auto Installer takes care of installing the proper software, setting up
firewalls, and installing some supporting software. Most of the testing of this
software has been done against the Oracle Cloud using instances with Oracle
Linux 7.

I prepared two YouTube videos to show how it works. The first one gives some
insights into setting up the Compute Instances required in the Oracle Cloud.

Setup compute instances in Oracle Cloud for MySQL Cluster AutoInstaller

The second video uses these compute instances to set up an NDB Cluster.

Setting up an NDB Cluster in the Oracle Cloud using Auto Installer

MySQL Workbench Review

MySQL Workbench Review

MySQL Workbench is a great multi-purpose GUI tool for MySQL, which I think is not marketed enough by the MySQL team and is not appreciated enough by the community for what it can do.

MySQL Workbench Licensing

MySQL Workbench is similar to MySQL Server in that it is an Open-Core product. There is the Community Edition, which has GPL-licensed source code on GitHub, as well as the “MySQL Workbench Standard Edition (SE)” and “MySQL Workbench Enterprise Edition (EE)”, which are proprietary. The differences between the releases can be found in this document.

In this MySQL Workbench review, I focus on the MySQL Workbench Community Edition, often referred to as MySQL Workbench CE.

Downloading MySQL Workbench

You can download the current version of MySQL Workbench here.

Installing MySQL Workbench

Installation, of course, will be OS-dependent. I installed MySQL Workbench CE on Windows and it was quite uneventful.

Installing MySQL Workbench

 

Starting MySQL Workbench for the first time

If you go through the default MySQL Workbench install process, it will be started upon install completion.  And as it starts, it will check for MySQL Servers running locally, and because I do not have anything running locally, it won’t detect any servers.

Starting MySQL Workbench

 

You need to click on the little “+” sign near the “MySQL Connection” text to add a connection. I think a clearer “Add Connection” link next to “Rescan Servers” would be more helpful.

Connection options are great. Besides support for TCP/IP and Local Socket/Pipe, MySQL Workbench also has support for TCP/IP over SSH, which is fantastic if you want to connect to servers reachable via SSH but do not have the MySQL port open.

When you have the connection created, you can open the main screen of MySQL Workbench which looks like this:

main screen of MySQL Workbench

You can see there is a lot of stuff there!  Let’s look at some specific features.

MySQL Workbench Management Features

MySQL Workbench Management Features 

Server Status shows information about the running MySQL Server. Of course, being an Oracle product, it is not designed to detect alternative implementations; in this case, Percona Server has a Thread Pool feature, but it shows as N/A.

Server Performance information graphs are updated in real-time and provide some idea about server load.

Client Connections shows current connections to MySQL Server.   This view has some nice features, for example, you can hide sleeping connections and look at running queries only, you can set the view to automatically refresh, and kill some queries and connections. You can also run EXPLAIN for Connection to see the query execution plan.   

Client Connections MySQL Workbench

How EXPLAIN for Connection works is a bit complicated. When you click on EXPLAIN for Connection, the notebook containing the query opens up, but I would expect to see the explain output at this point:

EXPLAIN for Connection

You then need to click on the EXPLAIN icon to see the Query Explain output:

Query Explain Output

Note that you can get both EXPLAIN for the given query and EXPLAIN for CONNECTION, and the two can differ, especially in the case of this particular query, where the execution plan was abnormal.

There are multiple displays provided for EXPLAIN, including Tabular Explain and Raw JSON Explain if you want to see those, although Visual Explain is a unique MySQL Workbench feature.
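If you prefer to run these checks from the SQL editor yourself, the same two plans can be obtained with plain SQL. A minimal sketch, where the table and the connection id 42 are placeholders:

-- the plan the optimizer would choose for the statement as written
EXPLAIN FORMAT=JSON
SELECT * FROM world.city WHERE CountryCode = 'USA';

-- the plan of the statement currently executing in connection 42
-- (find the connection id with SHOW PROCESSLIST)
EXPLAIN FORMAT=JSON FOR CONNECTION 42;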

I also like that MySQL Workbench provides additional details on a connection, such as held locks and connection attributes, which can often help to identify which particular application instance a query comes from.
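Outside of Workbench, roughly the same connection details can be pulled from the Performance Schema. A small sketch, with the connection id 42 again a placeholder:

-- connection attributes (program name, host, pid, etc.) for one connection
SELECT ATTR_NAME, ATTR_VALUE
  FROM performance_schema.session_connect_attrs
 WHERE PROCESSLIST_ID = 42;

-- locks currently held or waited for across the server (MySQL 8.0)
SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS
  FROM performance_schema.data_locks;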

Users and Privileges

This MySQL Workbench functionality allows you to view and manage your users:

MySQL Workbench Users

It is not very advanced, but it covers the basic needs of viewing and understanding user privileges. It has built-in support for Administrative Roles, but there does not seem to be support for generic roles or some of the newer features, such as locking accounts or requiring a password change after a certain period of time.
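Those newer account features can still be managed from the SQL editor, of course. A rough sketch, with the role, user, and schema names made up for illustration:

-- a generic role plus an account that must rotate its password and starts locked
CREATE ROLE 'app_read';
GRANT SELECT ON mydb.* TO 'app_read';

CREATE USER 'report_user'@'%'
  IDENTIFIED BY 'choose-a-strong-password'
  PASSWORD EXPIRE INTERVAL 90 DAY
  ACCOUNT LOCK;
GRANT 'app_read' TO 'report_user'@'%';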

Status and System Variables

The Status and System Variables section in MySQL Workbench shows the formatted output of “SHOW GLOBAL STATUS” and “SHOW VARIABLES”:

Status and System Variables

I like the fact that the massive number of settings and variables is grouped into different categories and that some help is provided. However, the fact that all values are provided only as raw numbers, without any formatting and not normalized per second where appropriate, makes it hard to work with this information.
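For example, to turn one of these raw counters into a rough per-second figure yourself, you can divide it by the server uptime; note this gives an average since startup, not a current rate:

SELECT (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Questions')
     / (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Uptime')
       AS avg_questions_per_second;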

Data Export and Data Import/Restore

As you may expect, these provide the functionality to export and import schemas and, optionally, data. This is basically a GUI for mysqldump, which is a great help for more advanced use cases.

Data Export

Instance Management

This is interesting; even though I set up a connection using SSH, MySQL Workbench does not automatically use it for host access. It needs to be configured separately instead, by clicking the little Wrench icon.

Instance Management

If you’re using Linux for Remote Management, you will need to provide quite a lot of details about the Linux version, packaging type, and even init scripts you use, which can easily be overwhelming.

I do wonder why there is no auto-detection of the system type implemented here.

If you configure Remote Management in MySQL Workbench properly, you should, in theory, be able to start/stop the server, look at server logs, and view the options file. It did not work well in my case.

Remote Management in MySQL Workbench

Performance – Dashboard

The MySQL Workbench Performance Dashboard section shows a selection of Performance Graphs.  It is not very deep and only shows stats while MySQL Workbench is running, but it covers some good basics.

MySQL Workbench Performance Dashboard

Performance – Reports

The Performance Reports section in MySQL Workbench is pretty cool; it shows a lot of reports based on MySQL’s sys schema. 

Performance - Reports 

This is pretty handy, but I think it would benefit from better formatting (so I do not have to count digits to see how much memory is used), and numbers accumulated since the instance start often make little sense.
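The underlying sys schema views can also be queried directly, and many of them already come in a formatted flavor. A small sketch:

-- formatted view: current_alloc is returned as a human-readable size
SELECT event_name, current_alloc
  FROM sys.memory_global_by_current_bytes
 LIMIT 5;

-- the x$ counterpart returns raw bytes if you need to do math on them
SELECT event_name, current_alloc
  FROM sys.x$memory_global_by_current_bytes
 LIMIT 5;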

Performance Schema Setup

This is one of the hidden gems in MySQL Workbench.  Performance Schema configuration in MySQL can be rather complicated if you’re not familiar with it, and MySQL Workbench makes it a lot easier.   Its default Performance Schema controls are very basic.

Performance Schema Setup

However, if you enable the “Show Advanced” settings, it will give you this fantastic overview of Performance Schema: 

As well as allow you to modify the configuration in detail:

modify configuration
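Under the hood, these switches map to rows in the Performance Schema setup tables, so the same changes can be made (or scripted) in SQL. A rough sketch of what such toggles amount to:

-- enable and time all statement instruments
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'statement/%';

-- make sure the statement history consumers are actually collecting
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_statements_history%';

These settings are not persistent across a server restart unless you also put the corresponding options in the configuration file.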

Until this point, we have been operating in Administration View.  If you want to work with Database Schema, you want to switch MySQL Workbench to Schema View.

Schema View

This view allows you to work with tables and other database schema objects.  The contextual menu provides different functions for different objects.

contextual menu

 

MySQL Workbench Query Editor

Finally, let’s take a look at the MySQL Workbench Query Editor. It has a lot of advanced features.   

First, I like that it is a multi-tab editor, so you can have multiple queries open at once and switch between them easily. It also has support for helpful snippets – both a large library of built-in ones and ones created by the user. It also supports contextual help, which can be quite helpful for beginners.

I like the fact that MySQL Workbench adds LIMIT 1000 by default to the queries it runs, and it also allows you to easily and conveniently edit the stored data.

Examine Field Types:

View Query execution statistics:

Query execution statistics

Though in this case, it seems to only show information derived from SHOW SESSION STATUS and not the more advanced details available in the Performance Schema.
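If you want that Performance Schema level of detail for the statements you have been running, the statement digest summary is one place to look. A minimal sketch:

-- heaviest statement digests since the summary table was last truncated
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_ROWS_EXAMINED,
       SUM_CREATED_TMP_DISK_TABLES,
       ROUND(SUM_TIMER_WAIT/1e12, 3) AS total_latency_sec
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 5;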

Visual Explain is quite a gem of MySQL Workbench too, but we covered it already.

Summary

In general, I’m quite impressed with the functionality offered by MySQL Workbench CE (Community Edition). If you are looking for a simple, free GUI for MySQL to run queries and provide basic help with administration, you need look no further. If you have more advanced needs, particularly in the monitoring or management space, you should look elsewhere. Oracle has MySQL Enterprise Monitor for this purpose, a fully commercial product that comes with a MySQL Enterprise subscription. If you are looking for an open source, database-monitoring-focused product, consider Percona Monitoring and Management.

ProxySQL 2.0.7 and proxysql-admin Tool Now Available

ProxySQL

ProxySQL 2.0.7, released by ProxySQL, is now available for download in the Percona Repository along with Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

This release includes ProxySQL 2.0.7 which introduces many new features and enhancements and also fixes a number of bugs.

The proxysql-admin tool now supports MariaDB 10.4.

New Features

Improvements

  • PSQLADM-195: A new option --with-stats-reset has been added to the proxysql-status script to display the  *_reset tables from the stats database. If this option is not specified, these tables are not displayed by default.

Bugs Fixed

  • PSQLADM-157: In some cases, the proxysql-status script used the cat command to display a file without checking if the file existed and was readable.
  • PSQLADM-181: When run with --update-cluster --write-node=<node_name>, the proxysql-admin tool now verifies that the writer nodes are not read-only.

The ProxySQL 2.0.7 source and binary packages available from the Percona download page for ProxySQL include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 2.0.7 are available as well. You can download the original ProxySQL from GitHub. GitHub hosts the documentation in the wiki format.

ProxySQL is available under Open Source license GPLv3.

Managing Big Data with MySQL Can be Challenging


Author: Robert Agar

MySQL is an extremely popular open-source database platform, originally developed by MySQL AB and now owned by Oracle. It is currently the second most popular database management system in the world, trailing only Oracle’s proprietary offering. If you aim to be a professional database administrator, knowledge of MySQL is almost a prerequisite. It is an important part of the multi-platform database environment found in the majority of IT departments.

A recent addition that has added to the complexity of managing a MySQL environment is the introduction of big data. Big data is characterized by the volume, velocity, and variety of information that is gathered and which needs to be processed. It can be used to provide an organization with the business intelligence (BI) it needs to gain a competitive advantage and better understanding of its customers.

The size of big data sets and the diversity of their data formats can pose challenges to using the information effectively. Yet these same characteristics are what make big data useful in the first place: it is the convergence of large amounts of data from diverse sources that provides insight into business processes that is not apparent through traditional data processing.

Some examples of how big data can be beneficial to a business are:

  • Studying customer engagement as it relates to how a company’s products and services compare with its competitors;
  • Marketing analysis to fine-tune promotions for new offerings;
  • Analyzing customer satisfaction to identify areas in service delivery that can be improved;
  • Listening on social media to uncover trends and activity around specific sources that can be used to identify potential target audiences.

MySQL Limitations When Handling Big Data

MySQL was not designed with big data in mind. This does not mean that it cannot be used to process big data sets, but some factors must be considered when using MySQL databases in this way. Here are some MySQL limitations to keep in mind.

  • The lack of a memory-centered search engine can result in high overhead and performance bottlenecks.
  • Handling large data volumes requires techniques such as sharding and splitting data over multiple nodes to get around the single-node architecture of MySQL.
  • Processing volatile data can pose a problem in MySQL. This issue can be somewhat alleviated by proper data design.
  • The analytical capabilities of MySQL are stressed by the complicated queries necessary to draw value from big data resources.

These limitations require that additional emphasis be put on monitoring and optimizing the MySQL databases that are used to process an organization’s big data assets. It can make the difference in your ability to produce value from big data.

Optimizing the Performance of Your MySQL Databases

Managing a MySQL environment that is used, at least in part, to process big data demands a focus on optimizing the performance of each instance. SQL Diagnostic Manager for MySQL is a dedicated tool for MySQL monitoring that will help identify potential problems and allow you to take corrective action before your systems are negatively impacted. The tool helps teams cope with some of the limitations presented by MySQL when processing big data.

Some specific features of SQL Diagnostic Manager for MySQL that will assist with handling big data are:

  • Real-time query monitoring to find and resolve issues before they impact end-users;
  • Monitoring of long-running and locked queries that can result from the complexity of processing the volume of information in big data sets;
  • Creating custom dashboards and charts that focus on the particular aspects of your MySQL systems and help identify trends and patterns in system performance;
  • Employing over 600 built-in monitors that cover all areas of MySQL performance.

Neither big data nor MySQL is going away anytime soon. Getting them to play nicely together may require third-party tools and innovative techniques. SQL Diagnostic Manager for MySQL is one such tool that can be used to maintain the performance of your MySQL environment so it can help produce business value from big data.

The post Managing Big Data with MySQL Can be Challenging appeared first on Monyog Blog.

MySQL Document Store – a quick-guide to storing JSON documents in MySQL using JavaScript and Python (and even SQL!)


MySQL introduced a JSON data type in version 5.7, and expanded the functionality in version 8.0.
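As a quick refresher before we get to the document store, here is what the native JSON data type looks like with plain SQL; the table name and data below are made up for illustration:

CREATE TABLE app_events (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  doc JSON
);

INSERT INTO app_events (doc)
VALUES ('{"type": "login", "user": "fox"}');

-- the ->> operator extracts a value and unquotes it
SELECT doc->>'$.user' AS user_name
  FROM app_events
 WHERE doc->>'$.type' = 'login';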

Besides being able to store native JSON in MySQL, you can also use MySQL as a document store (doc store) to store JSON documents. And, you can use NoSQL CRUD (create, read, update and delete) commands in either JavaScript or Python.

This post will show you some of the basic commands for using MySQL as a document store.

Requirements

To use MySQL as a document store, you will need to use the following server features:

The X Plugin enables the MySQL Server to communicate with clients using the X Protocol, which is a prerequisite for using MySQL as a document store. The X Plugin is enabled by default in MySQL Server as of MySQL 8.0. For instructions to verify X Plugin installation and to configure and monitor X Plugin, see Section 20.5, “X Plugin”.

The X Protocol supports both CRUD and SQL operations, authentication via SASL, allows streaming (pipelining) of commands and is extensible on the protocol and the message layer. Clients compatible with X Protocol include MySQL Shell and MySQL 8.0 Connectors.

Clients that communicate with a MySQL Server using X Protocol can use X DevAPI to develop applications. X DevAPI offers a modern programming interface with a simple yet powerful design which provides support for established industry standard concepts. This chapter explains how to get started using either the JavaScript or Python implementation of X DevAPI in MySQL Shell as a client. See X DevAPI for in-depth tutorials on using X DevAPI.
(source: https://dev.mysql.com/doc/refman/8.0/en/document-store.html)

And, you will need to have MySQL version 8.0.x (or higher) installed, as well as the MySQL Shell version 8.0.x (or higher). For these examples, I am using version 8.0.17 of both. (You could use version 5.7.x, but I have not tested any of these commands in 5.7.x)

Starting MySQL Shell (mysqlsh)

When starting MySQL Shell (Shell), you have two session options. The default option is mysqlx (--mx), which allows the session to connect using the X Protocol. The other option when starting Shell is --mysql, which establishes a “Classic Session” and connects using the standard MySQL protocol. For this post, I am using the default option of mysqlx (--mx). There are other MySQL Shell command-line options for the X Protocol available, and there are specific X Protocol variables which may need to be set for the mysqlx connection.

Here is a list of all of the MySQL Shell commands and their shortcuts (for MySQL version 8.0).
(source: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-commands.html)

Command      Alias/Shortcut  Description
\help        \h or \?        Print help about MySQL Shell, or search the online help.
\quit        \q or \exit     Exit MySQL Shell.
\                            In SQL mode, begin multiple-line mode. Code is cached and executed when an empty line is entered.
\status      \s              Show the current MySQL Shell status.
\js                          Switch execution mode to JavaScript.
\py                          Switch execution mode to Python.
\sql                         Switch execution mode to SQL.
\connect     \c              Connect to a MySQL Server.
\reconnect                   Reconnect to the same MySQL Server.
\use         \u              Specify the schema to use.
\source      \.              Execute a script file using the active language.
\warnings    \W              Show any warnings generated by a statement.
\nowarnings  \w              Do not show any warnings generated by a statement.
\history                     View and edit command line history.
\rehash                      Manually update the autocomplete name cache.
\option                      Query and change MySQL Shell configuration options.
\show                        Run the specified report using the provided options and arguments.
\watch                       Run the specified report using the provided options and arguments, and refresh the results at regular intervals.
\edit        \e              Open a command in the default system editor then present it in MySQL Shell.
\system      \!              Run the specified operating system command and display the results in MySQL Shell.

To start the MySQL Shell (Shell), you simply execute the mysqlsh command from a terminal window. The default mode is JavaScript (as shown by the JS in the prompt).


$ mysqlsh
MySQL Shell 8.0.17-commercial

Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.


 MySQL  JS   

If you start Shell without any connection information, you will need to connect to a database instance. You do this with the \connect command, or you can use the \c shortcut. The syntax is \c user@ip_address:

 MySQL  JS    \c root@127.0.0.1
Creating a session to 'root@127.0.0.1'
Fetching schema names for autocompletion. . .  Press ^C to stop.
Your MySQL connection id is 16 (X protocol)
Server version: 8.0.17-commercial MySQL Enterprise Server – Commercial
No default schema selected; type \use to set one.
 MySQL  127.0.0.1:33060+ ssl  JS    

Or, to start Shell with the connection information, specify the user and host IP address. The syntax is mysqlsh user@ip_address:

$ mysqlsh root@127.0.0.1
MySQL Shell 8.0.17-commercial

Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
Creating a session to 'root@127.0.0.1'
Fetching schema names for autocompletion. . .  Press ^C to stop.
Your MySQL connection id is 18 (X protocol)
Server version: 8.0.17-commercial MySQL Enterprise Server – Commercial
No default schema selected; type \use to set one.
 MySQL  127.0.0.1:33060+ ssl  JS   

You may find a list of all of the command-line options at https://dev.mysql.com/doc/mysql-shell/8.0/en/mysqlsh.html.

The Shell prompt displays the connection information, whether or not you are using ssl, and your current mode (there are three modes – JavaScript, Python and SQL). In the earlier example, you are in the (default) JavaScript mode. You can also get your session information with the session command:

 MySQL  127.0.0.1:33060+ ssl  JS    session
<Session:root@127.0.0.1:33060>

All of these commands are case-sensitive, so if you type an incorrect command, you will see the following error:

 MySQL  127.0.0.1:33060+ ssl  JS    Session
ReferenceError: Session is not defined

Here is how you switch between the three modes: – JavaScript, Python and SQL.

 MySQL  127.0.0.1:33060+ ssl  JS    \sql
Switching to SQL mode. . .  Commands end with ;
 MySQL  127.0.0.1:33060+ ssl  SQL    \py
Switching to Python mode. . . 
 MySQL  127.0.0.1:33060+ ssl  Py    \js
Switching to JavaScript mode. . . 
 MySQL  127.0.0.1:33060+ ssl  JS    

There are also several different output formats. You may change the output using the shell.options.set command. The default is table. Here are examples of each one, using the same command:

table output format

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','table')

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('select user, host from mysql.user')
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐+
| user             | host      |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐+
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐+
4 rows in set (0.0005 sec)

JSON output format

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','json')

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('select user, host from mysql.user limit 2')
{
    "user": "mysql.infoschema",
    "host": "localhost"
}
{
    "user": "mysql.session",
    "host": "localhost"
}
2 rows in set (0.0005 sec)

tabbed format

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','tabbed')

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('select user, host from mysql.user')
user     host
mysql.infoschema     localhost
mysql.session     localhost
mysql.sys     localhost
root     localhost
4 rows in set (0.0004 sec)

vertical format

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','vertical')

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('select user, host from mysql.user order by user desc')
*************************** 1. row ***************************
user: mysql.infoschema
host: localhost
*************************** 2. row ***************************
user: mysql.session
host: localhost
*************************** 3. row ***************************
user: mysql.sys
host: localhost
*************************** 4. row ***************************
user: root
host: localhost
4 rows in set (0.0005 sec)

MySQL Shell – Create & Drop Schema

Note: With the MySQL Doc Store, the terms used to describe the database, tables, and rows are different. The database is called a schema (even though the doc store is “schema-less”). The tables are called collections, and the rows of data are called documents.

To create a schema named test1, use the createSchema command:

 MySQL  127.0.0.1:33060+ ssl  JS    session.createSchema("test1")
<Schema:test1>

To get a list of the current schemas, use the getSchemas command:

 MySQL  127.0.0.1:33060+ ssl  JS    session.getSchemas()
[
<Schema:information_schema>,
<Schema:mysql>,
<Schema:performance_schema>,
<Schema:sys>,
<Schema:test1>
]

Also, you can run SQL commands inside of the Doc Store – something you can't natively do with most other NoSQL databases. Instead of using the getSchemas command, you can issue a SHOW DATABASES SQL command using the runSql NoSQL command.

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('show databases')
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| Database           |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test1              |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
5 rows in set (0.0052 sec)

To drop a schema named test1, you use the dropSchema command:


 MySQL  127.0.0.1:33060+ ssl  JS    session.dropSchema("test1")

Just like with MySQL, you have to select the schema (database) you want to use. This shows you how to create a schema and set it as your default schema. With the \use command, you are really setting a variable named db to equal the default schema. So, after you have set your schema with the \use command, when you issue the db command, you can see the default schema.

 MySQL  127.0.0.1:33060+ ssl  JS    session.createSchema("workshop")

 MySQL  127.0.0.1:33060+ ssl  JS    \use workshop
Default schema `workshop` accessible through db.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db
<Schema:workshop>

To change the value of the variable db, you may use the \use command, or you may set the value of db using the var (variable) command. (The command session.getSchema will return a schema value, but it does not automatically set the db variable value.)

 MySQL  127.0.0.1:33060+ ssl  JS    \use workshop
Default schema `workshop` accessible through db.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db
<Schema:workshop>

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    session.getSchema('mysql');
<Schema:mysql>

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db
<Schema:workshop>

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    var db = session.getSchema('mysql');

 MySQL  127.0.0.1:33060+ ssl  mysql  JS    db
<Schema:mysql>

You can also create your own variables. Here is an example of setting the variable sdb to the result of the SQL command SHOW DATABASES:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    var sdb = session.runSql('show databases')

 MySQL  127.0.0.1:33060+ ssl  workshop  JS     sdb
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| Database           |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| workshop           |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
6 rows in set (0.0079 sec)

But the variables are only good for your current session. If you quit Shell, log back in, and re-run the variable, you will get an error stating the variable is not defined.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS     \q
Bye!

# mysqlsh root@127.0.0.1
. . . 

 MySQL  127.0.0.1:33060+ ssl  JS    sdb
ReferenceError: sdb is not defined

After you have created your schema (database) and selected it via the \use command, you can create a collection (table) and insert JSON documents into the collection. I will create a collection named test1.

 MySQL  127.0.0.1:33060+ ssl  JS    \use workshop

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.createCollection("test1")

You can get a list of collections by using the getCollections command, and you can also execute SQL commands with the session.runSql command. The getCollections command is the same as the SQL command SHOW TABLES.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.getCollections()
[
<Collection:test1>
]

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    session.runSql('SHOW TABLES')
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| Tables_in_workshop |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
| test1              |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+
1 row in set (0.0001 sec)

To drop a collection, use the dropCollection command. You can verify the collection was dropped by using the getCollections command again.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.dropCollection("test1")
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.getCollections()
[]

If you have not selected a default database, you may also specify the schema name prior to the collection name (the same type of syntax you use when creating a table inside a database in MySQL). You will notice the output is different: when you specify the schema name before the collection name, that schema name is returned as well.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.createCollection("foods")
<Collection:foods>

 MySQL  127.0.0.1:33060+ ssl  JS    db.createCollection("workshop.foods2")
<Collection:workshop.foods2>

To add a JSON document to a collection, you use the add command. You must specify a collection prior to using the add command. (You can't issue a command like workshop.foods.add.) You can add the document with or without the returns and extra spaces (or tabs). Here is the JSON document to add to the foods collection:

{
    Name_First: "Fox",
    Name_Last: "Mulder",
    favorite_food: {
        Breakfast: "eggs and bacon",
        Lunch: "pulled pork sandwich",
        Dinner: "steak and baked potato"
    }
}


Note: Some JSON formats require double quotes around the keys/strings. In the first example below, Name_First has double quotes, and in the second example it doesn’t. Both formats will work in Shell.

The -> in the examples below is added by Shell. You don’t need to type this on the command line.


 

 MySQL  127.0.0.1:33060+ ssl  project  JS    db.foods.add({
                                          -> "Name_First": "Steve"})
                                          ->
Query OK, 1 item affected (0.0141 sec)
 MySQL  127.0.0.1:33060+ ssl  project  JS    db.foods.add({
                                          -> Name_First: "Steve"})
                                          ->
Query OK, 1 item affected (0.0048 sec)


Here are examples of each method of adding a JSON document – one where all of the data in the JSON document is on one line, and another where the JSON document contains data on multiple lines with returns and spaces included.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.add({Name_First: "Fox", Name_Last: "Mulder", favorite_food: {Breakfast: "eggs and bacon", Lunch: "pulled pork sandwich", Dinner: "steak and baked potato"}})
Query OK, 1 item affected (0.0007 sec)

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.add( {
                                          ->  Name_First: "Fox",
                                          ->  Name_Last: "Mulder",
                                          ->  favorite_food: {
                                          ->      Breakfast: "eggs and bacon",
                                          ->      Lunch: "pulled pork sandwich",
                                          ->      Dinner: "steak and baked potato"
                                          ->  } } )
                                          -> 
Query OK, 1 item affected (0.0005 sec)

So far, I have created a schema, and a collection, and I have added one document to the collection. Next, I will show you how to perform searches using the find command.

Searching Documents

To find all records in a collection, use the find command without specifying any search criteria:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find()
{
    "_id": "00005d6fc3dc000000000027e065",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "pulled pork sandwich",
        "Dinner": "steak and baked potato",
        "Breakfast": "eggs and bacon"
    }
}
1 document in set (0.0002 sec)


Note: The _id key in the output above is automatically added to each document and the value of _id cannot be changed. Also, an index is automatically created on the _id column. You can check the index using the SQL SHOW INDEX command. (To make the output easier to view, I switched the format to vertical and I switched it back to table)
 

 MySQL  127.0.0.1:33060+ ssl  workshop  JS     shell.options.set('resultFormat','vertical')
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    session.runSql('SHOW INDEX FROM workshop.foods')
*************************** 1. row ***************************
        Table: foods
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: _id
    Collation: A
  Cardinality: 1
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.0003 sec)
 MySQL  127.0.0.1:33060+ ssl  workshop  JS     shell.options.set('resultFormat','table')


 
Here is how you perform a search by specifying the search criteria. This command will look for the first name of Fox via the Name_First key.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find('Name_First = "Fox"')
{
    "_id": "00005d6fc3dc000000000027e065",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "pulled pork sandwich",
        "Dinner": "steak and baked potato",
        "Breakfast": "eggs and bacon"
    }
}
1 document in set (0.0004 sec)

And, if your search result doesn't find any matching documents, you will get a return of Empty set and the time it took to run the query:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find('Name_First = "Jason"')
Empty set (0.0003 sec)

The earlier searches returned all of the keys in the document because I didn’t restrict the fields to return in the find command – db.foods.find(). This example shows how you retrieve just a single key inside a document. And notice that the key favorite_food contains a list of sub-keys inside of a key.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find('Name_First = "Fox"').fields('favorite_food')
{
    "favorite_food": {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        }
}
1 document in set (0.0004 sec)

To return multiple keys, add the additional keys inside the fields section, separated by a comma.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find('Name_First = "Fox"').fields("favorite_food", "Name_Last")
{
    "Name_Last": "Mulder",
    "favorite_food": {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        }
}
1 document in set (0.0003 sec)


Note: The fields are returned in the order they are stored in the document, and not in the order of the fields section. The query field order does not guarantee the same display order.


 
To return a sub-key from a list, you add the key prior to the sub-key value in the fields section. This search will return the value for the Breakfast sub-key in the favorite_food key.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find( 'Name_First = "Fox"' ).fields( "favorite_food.Breakfast" )
{
    "favorite_food.Breakfast": "eggs and bacon"
}
1 document in set (0.0002 sec)

Modifying Documents

To modify a document, you use the modify command, along with search criteria to find the document(s) you want to change. Here is the original JSON document.

{
    "_id": "00005d6fc3dc000000000027e065",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "pulled pork sandwich",
        "Dinner": "steak and baked potato",
        "Breakfast": "eggs and bacon"
    }
}

I am going to change Fox’s favorite_food sub-keys so that Lunch is “soup in a bread bowl” and Dinner is “steak and broccoli”.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.modify("Name_First = 'Fox'").set("favorite_food", {Lunch: "Soup in a bread bowl", Dinner: "steak and broccoli"})
Query OK, 0 items affected (0.0052 sec)

I can see the changes by issuing a find command, searching for Name_First equal to “Fox”.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find('Name_First = "Fox"')
{
    "_id": "00005d6fc3dc000000000027e065",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "Soup in a bread bowl",
        "Dinner": "steak and broccoli"
    }
}
1 document in set (0.0004 sec)

Notice that the Breakfast sub-key is no longer in the list. This is because I changed the favorite_food list to only include Lunch and Dinner.

To change only one sub-key under favorite_food, such as Dinner, I can do that with the set command:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.modify("Name_First = 'Fox'").set('favorite_food.Dinner', 'Pizza')
Query OK, 1 item affected (0.0038 sec)

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005d8122400000000000000002",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "Soup in a bread bowl",
        "Dinner": "Pizza",
    }
}

If I want to remove a single sub-key under favorite_food, I can use the unset command. I will remove the Dinner sub-key:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.modify("Name_First = 'Fox'").unset("favorite_food.Dinner")
Query OK, 1 item affected (0.0037 sec)

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005d8122400000000000000002",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": {
        "Lunch": "Soup in a bread bowl",
    }
}

The key favorite_food contained a list, but a key can also contain an array. The main formatting difference between an array and a list is that an array has opening and closing square brackets [ ]. The brackets go on the outside of the values of the array.

Here is the original document with a list for favorite_food:

db.foods.add( {
    Name_First: "Fox",
    Name_Last: "Mulder",
    favorite_food: {
        Breakfast: "eggs and bacon",
        Lunch: "pulled pork sandwich",
        Dinner: "steak and baked potato"
 } 
}
)

I am going to delete the document for Fox, confirm the deletion occurred and insert a new document – but this time, I am going to use an array for favorite_food (and notice the square brackets).

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.remove('Name_First = "Fox"')
Query OK, 1 item affected (0.0093 sec)
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
Empty set (0.0003 sec)

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.add( {
                                         ->     Name_First: "Fox",
                                         ->     Name_Last: "Mulder",
                                         ->     favorite_food: [ { 
                                         ->        Breakfast: "eggs and bacon",
                                         ->        Lunch: "pulled pork sandwich",
                                         ->        Dinner: "steak and baked potato"
                                         ->   } ] 
                                         -> } 
                                         -> )
                                         -> 
Query OK, 1 item affected (0.0078 sec)
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005da09064000000000027e068",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": [
        {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        }
    ]
}
1 document in set (0.0004 sec)

When dealing with an array, I can modify each element in the array, or I can add another array element. I am going to add Fox's favorite Snack to the array with the arrayAppend command:

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.modify("Name_First = 'Fox'").arrayAppend("favorite_food", {Snack: "Sunflower seeds"})
Query OK, 1 item affected (0.0048 sec)

Rows matched: 1 Changed: 1 Warnings: 0
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005da09064000000000027e068",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": [
        {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        },
        {
            "Snack": "Sunflower seeds"
        }
    ]
}
1 document in set (0.0004 sec)

The key favorite_food now contains an array with two separate values in it. Array position favorite_food[0] contains the Lunch, Breakfast, and Dinner sub-keys, while array position favorite_food[1] contains only the Snack value.


Note: If you aren't familiar with arrays, an array is like a list that contains elements. Elements of the array are contained in memory locations relative to the beginning of the array. The first element in the array is actually zero (0) elements away from the beginning of the array. So, the placement of the first element is denoted as array position zero (0) and this is designated by [0]. Most programming languages have been designed this way, so indexing from zero (0) is pretty much inherent to most languages.


 
I can now delete an element in an array with the arrayDelete command. I am going to remove the Snack array member, which is favorite_food[1].

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.modify("Name_First = 'Fox'").arrayDelete("$.favorite_food[1]")
Query OK, 1 item affected (0.0035 sec)

Rows matched: 1 Changed: 1 Warnings: 0
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005da09064000000000027e068",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "favorite_food": [
        {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        }
    ]
}
1 document in set (0.0004 sec)

Modifying a document – adding a key

If I want to add an extra key, I need to define a few more variables and their values – one for the schema and one for the collection. Then I can use the patch command to add a key to an existing document.

I am going to add a middle name to Fox's document.

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    var schema = session.getSchema('workshop')

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    var collection = schema.getCollection('foods')

 MySQL  127.0.0.1:33060+ ssl  workshop  JS    collection.modify("Name_First = 'Fox'").patch({ Name_Middle: 'William' })
Query OK, 1 item affected (0.0020 sec)

Rows matched: 1 Changed: 1 Warnings: 0
 MySQL  127.0.0.1:33060+ ssl  workshop  JS    db.foods.find("Name_First='Fox'")
{
    "_id": "00005da09064000000000027e068",
    "Name_Last": "Mulder",
    "Name_First": "Fox",
    "Name_Middle": "William",
    "favorite_food": [
        {
            "Lunch": "pulled pork sandwich",
            "Dinner": "steak and baked potato",
            "Breakfast": "eggs and bacon"
        }
    ]
}
1 document in set (0.0004 sec)

Importing Large Data Sets

In order to demonstrate indexes, I will need to import a large amount of data. Before I import the data, I need to be sure that I have a variable set for the mysqlx client protocol to allow larger client packets. Specifically, I need to set the mysqlx_max_allowed_packet variable to its largest allowable size of 1GB. You can set this variable in the MySQL configuration file (my.cnf or my.ini) and reboot the MySQL instance, or you can set it for your session.

I can check the value of the mysqlx_max_allowed_packet variable from within Shell, and if it isn't set to 1GB, I will modify it for this session. (I can see the value of mysqlx_max_allowed_packet is currently set to 104857600 bytes, or 100MB.)

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SHOW VARIABLES like "%packet%"')
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
| Variable_name             | Value      |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
| max_allowed_packet        | 943718400  |
| mysqlx_max_allowed_packet | 104857600  |
| slave_max_allowed_packet  | 1073741824 |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
3 rows in set (0.0021 sec)

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SET @@GLOBAL.mysqlx_max_allowed_packet = 1073741824')
Query OK, 0 rows affected (0.0004 sec)

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SHOW VARIABLES like "%packet%"')
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
| Variable_name             | Value      |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
| max_allowed_packet        | 943718400  |
| mysqlx_max_allowed_packet | 1073741824 |
| slave_max_allowed_packet  | 1073741824 |
+‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐+‐‐‐‐‐‐‐‐‐‐‐‐+
3 rows in set (0.0021 sec)

I don't want to write this large import to the binary log (I have binlog enabled), so I first run the SQL command SET SQL_LOG_BIN = 0. I am going to create a new schema named project, and then import a 400+ megabyte JSON file into a new collection named GoverningPersons.

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SET SQL_LOG_BIN=0')
Query OK, 0 rows affected (0.0464 sec)
 MySQL  127.0.0.1:33060+ ssl  JS    session.createSchema("project")

 MySQL  127.0.0.1:33060+ ssl  JS    \use project
Default schema `project` accessible through db.
 MySQL  127.0.0.1:33060+ ssl  JS     util.importJson("./workshop/Doc_Store_Demo_File.json", {schema: "project", collection: "GoverningPersons"})
Importing from file "./workshop/Doc_Store_Demo_File.json" to collection `project`.`GoverningPersons` in MySQL Server at 127.0.0.1:33060

.. 2613346.. 2613346
Processed 415.71 MB in 2613346 documents in 2 min 51.0696 sec (15.28K documents/s)
Total successfully imported documents 2613346 (15.28K documents/s)

As you can see from the output above, I imported 2,613,346 documents.
Note: This imported JSON document contains public data regarding businesses in the State of Washington.

Now that I have my 2.6 million documents in the database, I will do a quick search and limit the results to one record by using the limit command. This will show you the keys in the document.

 MySQL  127.0.0.1:33060+ ssl  JS    \use project
Default schema `project` accessible through db.

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find().limit(1)
{
    "Ubi": "601544680",
    "Zip": "99205",
    "_id": "00005d6fc3dc000000000027e067",
    "City": "SPOKANE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "RT 5",
    "LastName": "FRISCH",
    "FirstName": "BOB",
    "MiddleName": ""
}
1 document in set (0.0022 sec)

Again – notice an index for the key _id has automatically been added:

 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SHOW INDEX FROM project.GoverningPersons')
*************************** 1. row ***************************
Table: GoverningPersons
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: _id
Collation: A
Cardinality: 2481169
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.0137 sec)

To demonstrate indexes on a large dataset, I will perform two searches against all documents in the project collection – where LastName is equal to FRISCH and where LastName is equal to VARNELL. Then, I will create the index on LastName, and re-run the two find queries. Note: I am not displaying all of the returned documents to save space.

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find('LastName = "FRISCH"')
{
    "Ubi": "601544680",
    "Zip": "99205",
    "_id": "00005da09064000000000027e069",
    "City": "SPOKANE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "RT 5",
    "LastName": "FRISCH",
    "FirstName": "BOB",
    "MiddleName": ""
}
. . .
82 documents in set (1.3559 sec)

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find('LastName = "VARNELL"')
{
    "Ubi": "602268651",
    "Zip": "98166",
    "_id": "00005da09064000000000028ffdc",
    "City": "SEATTLE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "18150 MARINE VIEW DR SW",
    "LastName": "VARNELL",
    "FirstName": "JAMES",
    "MiddleName": ""
}
. . .
33 documents in set (1.0854 sec)

The searches took 1.3559 and 1.0854 seconds, respectively. I can now create an index on LastName. When I create an index, I have to specify the data type for that particular key, and for a text key, I have to specify how many characters of that key I want to include in the index (note the TEXT(20) in the command below).

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.createIndex("i_Name_Last", {fields: [{field: "$.LastName", type: "TEXT(20)"}]})
Query OK, 0 rows affected (8.0653 sec)

Now I will re-run the same two queries where LastName equals FRISCH and where LastName equals VARNELL.

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find('LastName = "FRISCH"')
{
    "Ubi": "601544680",
    "Zip": "99205",
    "_id": "00005da09064000000000027e069",
    "City": "SPOKANE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "RT 5",
    "LastName": "FRISCH",
    "FirstName": "BOB",
    "MiddleName": ""
}
. . .
82 documents in set (0.0097 sec)

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find('LastName = "VARNELL"')
{
    "Ubi": "602268651",
    "Zip": "98166",
    "_id": "00005da09064000000000028ffdc",
    "City": "SEATTLE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "18150 MARINE VIEW DR SW",
    "LastName": "VARNELL",
    "FirstName": "JAMES",
    "MiddleName": ""
}
. . .
33 documents in set (0.0008 sec)

The queries ran much faster with an index – 0.0097 seconds and 0.0008 seconds. Not bad for searching 2.6 million records.


Note: The computer I was using for this post is a Hackintosh, running Mac OS 10.13.6, with an Intel i7-8700K 3.7GHz processor with six cores, 32GB of 2666MHz DDR4 RAM, and a SATA III 6 Gb/s, M.2 2280 SSD. Your performance results may vary.


 
And I can take a look at the index for LastName:

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','vertical')
 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SHOW INDEX FROM project.GoverningPersons')
*************************** 1. row ***************************
Table: GoverningPersons
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: _id
Collation: A
Cardinality: 2481169
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 2. row ***************************
Table: GoverningPersons
Non_unique: 1
Key_name: i_Name_Last
Seq_in_index: 1
Column_name: $ix_t20_F1A785D3F25567CD94716D955607AADB04BB3C0E
Collation: A
Cardinality: 300159
Sub_part: 20
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
2 rows in set (0.0059 sec)
 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','table')

I can create an index on multiple columns as well. Here is an index created on the State and Zip fields.

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.createIndex('i_state_zip', {fields: [ {field: '$.State', type: 'TEXT(2)'}, {field: '$.Zip', type: 'TEXT(10)'}]})
Query OK, 0 rows affected (10.4536 sec)

I can take a look at the index as well (the other index results were removed to save space).

 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','vertical')
 MySQL  127.0.0.1:33060+ ssl  JS    session.runSql('SHOW INDEX FROM project.GoverningPersons')
. . .
*************************** 3. row ***************************
Table: GoverningPersons
Non_unique: 1
Key_name: i_state_zip
Seq_in_index: 1
Column_name: $ix_t2_00FFBF570DC47A52910DDA38C0C1FB1361F0426A
Collation: A
Cardinality: 864
Sub_part: 2
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 4. row ***************************
Table: GoverningPersons
Non_unique: 1
Key_name: i_state_zip
Seq_in_index: 2
Column_name: $ix_t10_18619E3AC96C74FECCF6B622D9DB0864C2938AB6
Collation: A
Cardinality: 215626
Sub_part: 10
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
4 rows in set (0.0066 sec)
 MySQL  127.0.0.1:33060+ ssl  JS    shell.options.set('resultFormat','table')

I will run a query looking for the first entry, FRISCH, based upon his State (WA) and Zip (99205). The query result time is still pretty good, even though I am also searching on his LastName.

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.find("State='WA' AND Zip = '99205' AND LastName = 'FRISCH'")
{
    "Ubi": "601544680",
    "Zip": "99205",
    "_id": "00005da09064000000000027e069",
    "City": "SPOKANE",
    "State": "WA",
    "Title": "GOVERNOR",
    "Address": "RT 5",
    "LastName": "FRISCH",
    "FirstName": "BOB",
    "MiddleName": ""
}
1 document in set (0.0011 sec)

To drop an index, use the dropIndex command:

 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.dropIndex("i_Name_Last")

NoSQL and SQL in the same database

The great advantage of using MySQL to store JSON documents is that the data is stored in the InnoDB storage engine. So, the NoSQL document store database has the features and benefits of InnoDB – such as transactions and ACID compliance. This also means that you can use MySQL features like replication, group replication, transparent data encryption, etc. And if you need to restore a backup and play back the binlog files for a point-in-time recovery, you can do that as well, as all of the NoSQL transactions are written to the MySQL binary log. Here is what a NoSQL transaction looks like in the MySQL binary log:

NoSQL:

collection.modify("Name_First = 'Selena'").patch({ Name_Middle: 'Kitty' })

MySQL Binlog:

# at 3102
#191018 11:17:41 server id 3451 end_log_pos 3384 CRC32 0xd0c12cca Query thread_id=15 exec_time=0 error_code=0
SET TIMESTAMP=1571411861/*!*/;
UPDATE `workshop`.`foods` SET doc=JSON_SET(JSON_MERGE_PATCH(doc, JSON_OBJECT('Name_Middle','Kitty')),'$._id',JSON_EXTRACT(`doc`,'$._id')) WHERE (JSON_EXTRACT(doc,'$.Name_First') = 'Selena')
/*!*/;
# at 3384
#191018 11:17:41 server id 3451 end_log_pos 3415 CRC32 0xe0eaf4ef Xid = 246
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;

Finally, here is an example of how to do a transaction:

 MySQL  127.0.0.1:33060+ ssl  JS    session.startTransaction()
Query OK, 0 rows affected (0.0013 sec)
 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.modify("Ubi = '601544680'").set("MiddleName", "PETER")
Query OK, 1 item affected (0.0089 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 MySQL  127.0.0.1:33060+ ssl  JS    session.rollback()
Query OK, 0 rows affected (0.0007 sec)
 MySQL  127.0.0.1:33060+ ssl  JS    db.GoverningPersons.modify("Ubi = '601544680'").set("MiddleName", "STEVEN")
Query OK, 1 item affected (0.0021 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 MySQL  127.0.0.1:33060+ ssl  JS    session.commit()
Query OK, 0 rows affected (0.0002 sec)
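All of this works because, under the hood, a collection is simply an InnoDB table with a JSON column and a generated _id primary key. You can confirm this with plain SQL; the commented output below is a rough, abbreviated sketch, and the exact column definitions can vary a little between 8.0 releases:

SHOW CREATE TABLE workshop.foods;
-- roughly:
-- CREATE TABLE `foods` (
--   `doc` json DEFAULT NULL,
--   `_id` varbinary(32) GENERATED ALWAYS AS
--         (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
--   PRIMARY KEY (`_id`)
-- ) ENGINE=InnoDB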

JSON functions

Since the MySQL Document Store utilizes the MySQL database server, and since the documents are stored in InnoDB, you can also use the MySQL SQL JSON functions to manipulate the data stored in either a JSON document or a JSON data type column. Here is a list of the JSON functions available. While JSON functions were introduced in 5.7, not all of these functions are available in 5.7 – but they are all available in version 8.0.

JSON_ARRAY JSON_ARRAY_APPEND JSON_ARRAY_INSERT JSON_CONTAINS
JSON_CONTAINS_PATH JSON_DEPTH JSON_EXTRACT JSON_INSERT
JSON_KEYS JSON_LENGTH JSON_MERGE_PATCH JSON_MERGE_PRESERVE
JSON_OBJECT JSON_OVERLAPS JSON_PRETTY JSON_QUOTE
JSON_REMOVE JSON_REPLACE JSON_SCHEMA_VALID JSON_SCHEMA_VALIDATION_REPORT
JSON_SEARCH JSON_SET JSON_STORAGE_FREE JSON_STORAGE_SIZE
JSON_TABLE JSON_TYPE JSON_UNQUOTE JSON_VALID
MEMBER OF
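Here is a quick sketch of a few of these functions applied to the documents created earlier in this post (the paths assume the array form of the favorite_food document shown above):

-- pretty-print a stored document
SELECT JSON_PRETTY(doc) FROM workshop.foods LIMIT 1;

-- extract individual values
SELECT doc->>'$.Name_Last' AS last_name,
       JSON_EXTRACT(doc, '$.favorite_food[0].Lunch') AS lunch
  FROM workshop.foods;

-- modify a document directly with SQL
UPDATE workshop.foods
   SET doc = JSON_SET(doc, '$.Name_Middle', 'William')
 WHERE doc->>'$.Name_First' = 'Fox';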

But you already use Mongo? And you have MySQL as well?

If you already use Mongo and MySQL, and you want to switch to MySQL, or if your DBAs already know Mongo, then moving to MySQL or using the MySQL Document Store is pretty easy. The commands used in Mongo are very similar to the ones used in the MySQL Document Store. The login command is similar to the MySQL "regular" client command in that you specify the user and password with the -u and -p options.

In MySQL Shell, you put the username followed by the @ and the IP address. And you can specify which database or schema you want to use upon login. Mongo does have a few shortcuts with the show command, as in show dbs, show schemas or show collections. But you can do almost the same in MySQL Shell by setting variables to equal certain functions or commands (like I did earlier.)

To create a schema/database in Mongo, you simply execute the use database name command – and it creates the schema/database if it doesn't exist. The other commands having to do with collections are almost the same.

Where there are differences, they are relatively small. Mongo uses the command insert and MySQL uses add when inserting documents. But then, other commands such as the remove document command are the same.

As for the last item in the table below, native Mongo can't run any SQL commands (without using a third-party GUI tool) – so again, if you have MySQL DBAs on staff, the learning curve can be lower because they can use SQL commands if they don't remember the NoSQL commands.

Command Mongo MySQL Document Store (via Shell)
Login mongo ‐u ‐p ‐‐database mysqlsh root@127.0.0.1 ‐‐database
Select schema use database_name \use database_name
Show schemas/dbs show dbs session.getSchemas()
Create schema use database_name session.createSchema()
Show collections show collections or db.getCollectionNames(); db.getCollections()
Create collection db.createCollection("collectionName"); db.createCollection("collectionName");
Insert document db.insert({field1: "value", field2: "value"}) db.add({field1: "value", field2: "value"})
Remove document db.remove() db.remove()
Run SQL command session.runSql(SQL_command)

So, with MySQL, you have a hybrid solution where you can use both SQL and NoSQL at the same time. And, since you are storing the JSON documents inside MySQL, you can also use the MySQL JSON functions to search, update, and delete the same records.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots 
Visit http://2044thebook.com for more information.
Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition 
Visit https://amzn.to/2oPFLI0 for more information.

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost


AWS is the #1 cloud provider for open-source database hosting, and the go-to cloud for MySQL deployments. As organizations continue to migrate to the cloud, it’s important to get in front of performance issues, such as high latency, low throughput, and replication lag caused by greater distances between your users and your cloud infrastructure. While many AWS users default to their managed database solution, Amazon RDS, there are alternatives available that can improve your MySQL performance on AWS through advanced customization options and unlimited EC2 instance type support. ScaleGrid offers a compelling alternative for hosting MySQL on AWS that provides better performance, more control, and no cloud vendor lock-in, at the same price as Amazon RDS. In this post, we compare the performance of MySQL on Amazon RDS vs. MySQL Hosting at ScaleGrid on AWS High Performance instances.

TLDR

ScaleGrid’s MySQL on AWS High Performance deployment can provide 2x-3x the throughput at half the latency of Amazon RDS for MySQL, with the added advantage of having 2 read replicas as compared to 1 in RDS.

MySQL on AWS Performance Test

ScaleGrid Amazon RDS
Instance Type AWS High Performance XLarge (see system details below) DB Instance r4.xlarge (Multi-AZ)
Deployment Type 3 Node Master-Slave Set with Semisynchronous Replication Multi-AZ Deployment with 1 Read Replica
SSD Disk Local SSD & General Purpose – 2TB General Purpose – 2TB
Monthly Cost (USD) $1,798 $1,789

 

As you can see from the above table, MySQL RDS pricing is within $10 of ScaleGrid’s fully managed MySQL hosting solution.

What are ScaleGrid’s High Performance Replica Sets?

The ScaleGrid MySQL on AWS High Performance replica set uses a hybrid of local SSD and EBS disk to achieve both high performance and high reliability. A typical configuration is deployed using a 3-node replica set:

  • The Master and Slave-1 use local SSD disks.
  • Slave-2 uses an EBS disk (can be General Purpose or a Provisioned IOPS disk).

High Performance MySQL AWS 3-Node Replica Set - ScaleGrid DBaaS

What does this mean? Since the Master and the Slave-1 are running on local SSD, you get the best possible disk performance from your AWS machines. No more network-based EBS, just blazing-fast local SSD. Reads and writes to your Primary, and even reads from Slave-1 will work at SSD speed. Slave-2 uses an EBS data disk, and you can configure the amount of IOPS required for your cluster. This configuration provides complete safety for your data, even in the event you lose the local SSD disks.

ScaleGrid’s MySQL AWS High Performance XLarge replica set uses i3.xlarge (30.5 GB RAM) instances with local SSD for the Master and Slave-1, and an i3.2xlarge (61 GB RAM) instance for Slave-2.

MySQL Configuration

A similar MySQL configuration is used on both ScaleGrid and RDS deployments:

Configuration Value
version 5.7.25 community edition
innodb_buffer_pool_size 25G
innodb_log_file_size 1G
innodb_flush_log_at_trx_commit 1
sync_binlog 1
innodb_io_capacity 3000
innodb_io_capacity_max 6000
slave_parallel_workers 30
slave_parallel_type LOGICAL_CLOCK
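
For reference, here is a sketch of those settings as a my.cnf fragment (the section header is assumed; the actual configuration files used in the benchmark were not published):

[mysqld]
innodb_buffer_pool_size        = 25G
innodb_log_file_size           = 1G
innodb_flush_log_at_trx_commit = 1
sync_binlog                    = 1
innodb_io_capacity             = 3000
innodb_io_capacity_max         = 6000
slave_parallel_workers         = 30
slave_parallel_type            = LOGICAL_CLOCK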

MySQL Performance Benchmark Configuration

Configuration Details
Tool Sysbench version 1.0.17
Host 1 r4.xlarge located in the same AWS datacenter as the Master MySQL
# Tables 100
# Rows per table 5,000,000
Workload generating script oltp_read_write.lua
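
As a rough sketch, a single sysbench run matching the table above might be invoked as follows (host, credentials, thread count, and duration are placeholders; the exact command line and the per-scenario read/write ratio options were not published, and the prepare step is not shown):

sysbench oltp_read_write \
  --mysql-host=<master-host> --mysql-user=sbtest --mysql-password=<password> \
  --tables=100 --table-size=5000000 \
  --threads=50 --time=600 --report-interval=10 \
  run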

 

MySQL Performance Test Scenarios and Results

To ensure we provide informative results for all MySQL AWS workload types, we have broken down our tests into these three scenarios so you can evaluate based on your read/write workload intensity:

  1. Read-Intensive Workload: 80% Reads and 20% Writes
  2. Balanced Workload: 50% Reads and 50% Writes
  3. Write-Intensive Workload: 20% Reads and 80% Writes

Each scenario is run with varying number of sysbench client threads ranging from 50 to 400, and each test is run for a duration of 10 minutes. We measure throughput in terms of Queries Per Second (QPS) and 95th Percentile latency, and ensure that the max replication lag on the slaves does not cross 30s. For some of the tests on the ScaleGrid deployment, MySQL configuration binlog_group_commit_sync_delay is tuned so that the slave replication lag does not go beyond 30s. This technique is referred to as ‘slowing down the master to speed up the slaves’ and is explained in J-F Gagne’s blog.
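
As a minimal sketch, that tuning can be applied at runtime; the actual value used in these tests was not published and is workload-dependent:

SET GLOBAL binlog_group_commit_sync_delay = 10000;  -- delay in microseconds before the binary log group commit sync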


Scenario-1: Read-Intensive Workload with 80% Reads and 20% Writes

ScaleGrid vs Amazon RDS: MySQL Throughput Performance Test - 80 Percent Read 20 Percent Write

As we can see from the read-intensive workload tests, ScaleGrid high performance MySQL instances on AWS are able to consistently handle around 27,800 QPS anywhere from 50 up to 400 threads. This is almost a 200% increase over MySQL RDS performance which averages only 9,411 QPS across the same range of threads.

ScaleGrid vs Amazon RDS: MySQL Latency Performance Test - 80 Percent Read 20 Percent Write

ScaleGrid also maintains 53% lower latency on average throughout the entire MySQL AWS performance tests. Both Amazon RDS and ScaleGrid latency increase steadily as the number of threads grows, where ScaleGrid maxes out at 383ms for 400 threads while Amazon RDS is at 831ms at the same level.

Scenario-2: Balanced Workload with 50% Reads and 50% Writes

ScaleGrid vs Amazon RDS: MySQL Throughput Performance Test - 50 Percent Read 50 Percent Write

In our balanced workload performance tests, ScaleGrid’s MySQL High Performance deployment on AWS outperforms again with an average of 20,605 QPS on threads ranging from 50 to 400. Amazon RDS only averaged 8,296 for the same thread count, resulting in a 148% improvement with ScaleGrid.

ScaleGrid vs Amazon RDS: MySQL Latency Performance Test - 50 Percent Read 50 Percent Write

Both ScaleGrid and Amazon RDS latencies decreased significantly in the balanced workload tests compared to the read-intensive tests covered above. Amazon RDS averaged 258ms latency in the balanced workload tests, where ScaleGrid only averaged 125ms, achieving nearly a 52% reduction in latency over MySQL on Amazon RDS.

Scenario-3: Write-Intensive Workload with 20% Reads and 80% Writes

ScaleGrid vs Amazon RDS: MySQL Throughput Performance Test - 20 Percent Read 80 Percent Write

In our final write-intensive MySQL AWS workload scenario, ScaleGrid achieved significantly higher throughput with an average of 17,007 QPS over the range of 50 to 400 threads. This is a 123% improvement over Amazon RDS, which only achieved 7,638 QPS over the same number of threads.

ScaleGrid vs Amazon RDS: MySQL Latency Performance Test - 20 Percent Read 80 Percent Write

The 95th percentile latency tests also produced significantly lower latency for ScaleGrid at an average of 114ms over 50 to 400 threads. Amazon RDS achieved an average of 247ms in their latency tests, resulting in a 54% average reduction in latency when deploying ScaleGrid’s High Performance MySQL on AWS services over Amazon RDS.

Analysis

As we observed from the test results, read-intensive workloads resulted in both higher throughput and latency over balanced workloads and write-intensive workloads, regardless of how MySQL was deployed on AWS:

MySQL on AWS Throughput Performance Test Averages ScaleGrid Amazon RDS ScaleGrid Improvement
Read-Intensive Throughput 27,795 9,411 195.4%
Balanced Workload Throughput 20,605 8,296 148.4%
Write-Intensive Throughput 17,007 7,638 122.7%

 

MySQL on AWS Latency Performance Test Averages ScaleGrid Amazon RDS ScaleGrid Improvement
Read-Intensive Latency 206ms 439ms -53.0%
Balanced Workload Latency 125ms 258ms -51.6%
Write-Intensive Latency 114ms 247ms -53.8%

Explanation of Results

  • We see that the ScaleGrid MySQL on AWS deployment provided close to 3x better throughput for the read-intensive workload compared to the RDS deployment.
  • As the write load increased, though the absolute throughput decreased, ScaleGrid still provided close to 2.5x better throughput performance.
  • For write-intensive workloads, we found that replication lag started kicking in for the EBS slave on the ScaleGrid deployment. Since our objective was to keep the slave replication lag within 30s for our runs, we introduced binlog_group_commit_sync_delay to ensure that the slave could achieve better parallel execution. This controlled the delay and resulted in lower absolute throughput on the ScaleGrid deployment, but we could still see 2.2x better throughput compared to the RDS deployment.
  • For all of the read-intensive, write-intensive, and balanced workload scenarios, ScaleGrid offered roughly 50% lower latency compared to RDS.

ScaleGrid ‘High Performance’ deployment can provide 2x-3x the throughput at half the latency of RDS with an added advantage of having 2 read replicas as compared to 1 in RDS. To learn more about ScaleGrid’s MySQL hosting advantages over Amazon RDS for MySQL, check out our Compare MySQL Providers page or start a free 30-day trial to explore the fully managed DBaaS platform.

Learn More About MySQL Hosting

A Guide to MySQL Galera Cluster Streaming Replication: Part One


Streaming Replication is a new feature which was introduced with the 4.0 release of Galera Cluster. Galera uses replication synchronously across the entire cluster, but before this release write-sets greater than 2GB were not supported. Streaming Replication allows you to now replicate large write-sets, which is perfect for bulk inserts or loading data to your database.

In a previous blog we wrote about Handling Large Transactions with Streaming Replication and MariaDB 10.4, but as of writing this blog Codership had not yet released their version of the new Galera Cluster. Percona has, however, released their experimental binary version of Percona XtraDB Cluster 8.0 which highlights the following features...

  • Streaming Replication supporting large transactions

  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)

  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.

  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.

  • Multiple system tables help you find out more about the state of the cluster.

  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

What's New With Galera Cluster 4.0?

The New Streaming Replication Feature

With Streaming Replication, transactions are replicated gradually in small fragments during transaction processing (i.e. before actual commit, we replicate a number of small size fragments). Replicated fragments are then applied in slave threads, preserving the transaction’s state in all cluster nodes. Fragments hold locks in all nodes and cannot be conflicted later.

Galera SystemTables 

Database Administrators and clients with access to the MySQL database may read these tables, but they cannot modify them as the database itself will make any modifications needed. If your server doesn’t have these tables, it may be that your server is using an older version of Galera Cluster.

#> show tables from mysql like 'wsrep%';

+--------------------------+

| Tables_in_mysql (wsrep%) |

+--------------------------+

| wsrep_cluster            |

| wsrep_cluster_members    |

| wsrep_streaming_log      |

+--------------------------+

3 rows in set (0.12 sec)

New Synchronization Functions 

This version introduces a series of SQL functions for use in wsrep synchronization operations. You can use them to obtain the Global Transaction ID which is based on either the last write or last seen transaction. You can also set the node to wait for a specific GTID to replicate and apply, before initiating the next transaction.
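
A minimal sketch of using these functions from a client session (function names as documented for Galera 4; the GTID argument is a placeholder):

SELECT WSREP_LAST_WRITTEN_GTID();            -- GTID of the last write made in this session
SELECT WSREP_LAST_SEEN_GTID();               -- GTID of the last write-set seen by this node
SELECT WSREP_SYNC_WAIT_UPTO_GTID('<gtid>');  -- block until the given GTID has been applied locally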

Intelligent Donor Selection

Some understated features that have been present since Galera 3.x include intelligent donor selection and cluster crash recovery. These were originally planned for Galera 4, but made it into earlier releases largely due to customer requirements. When it comes to donor node selection in Galera 3, the State Snapshot Transfer (SST) donor was selected at random. However with Galera 4, you get a much more intelligent choice when it comes to choosing a donor, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. As a Database Administrator, you can force this via setting wsrep_sst_donor.
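
As a sketch, a preferred donor can be forced in the configuration (node names are placeholders; per the Galera documentation, a trailing comma lets the node fall back to any other donor if the listed ones are unavailable):

[mysqld]
wsrep_sst_donor = node2,node3,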

Why Use MySQL Galera Cluster Streaming Replication?

Long-Running Transactions

Galera's problems and limitations have always revolved around how it handles long-running transactions, which oftentimes cause the entire cluster to slow down due to large write-sets being replicated. Its flow control often goes high, causing writes to slow down or even terminating the process in order to revert the cluster back to its normal state. This is a pretty common issue with previous versions of Galera Cluster.

Codership advises to use Streaming Replication for your long-running transactions to mitigate these situations. Once the node replicates and certifies a fragment, it is no longer possible for other transactions to abort it.

Large Transactions

This is very helpful when loading data into your reporting or analytics database. Bulk inserts, deletes, updates, or using the LOAD DATA statement to load a large quantity of data can fall into this category, although it depends on how you manage your data for retrieval or storage. You must take into account that Streaming Replication has its limitations, such as the fact that certification keys are generated from record locks. 

Without Streaming Replication, updating a large number of records would result in a conflict and the whole transaction would have to be rolled back. Slaves replicating large transactions are also subject to flow control: once the receive queue hits its threshold, the cluster slows down incoming writes from the synchronous replication until the write-set becomes manageable and replication can continue normally. Check this external blog by Percona to help you understand more about flow control within Galera.

With Streaming Replication, the node begins to replicate the data with each transaction fragment, rather than waiting for the commit. This means that there's no way for any conflicting transactions running within the other nodes to abort since this simply affirms that the cluster has certified the write-set for this particular fragment. It’s free to apply and commit other concurrent transactions without blocking and process large transaction with a minimal impact on the cluster.

Hot Records/Hot Spots

Hot records or rows are rows in your table that are constantly updated. This is often the most visited data and receives the highest traffic in your entire database (e.g. news feeds, counters such as the number of visits, or logs). With Streaming Replication, you can force critical updates to the entire cluster. 

As noted by the Galera Team at Codership

“Running a transaction in this way effectively locks the hot record on all nodes, preventing other transactions from modifying the row. It also increases the chances that the transaction will commit successfully and that the client in turn will receive the desired outcome.”

This comes with limitations, as it does not guarantee that every commit will succeed. Without Streaming Replication, there is a much higher chance of rollbacks, which adds overhead and a worse experience for the end user from the application's perspective.

Things to Consider When Using Streaming Replication

  • Certification keys are generated from record locks, therefore they don’t cover gap locks or next-key locks. If the transaction takes a gap lock, it is possible that a transaction executed on another node will apply a write-set which encounters the gap lock and will abort the streaming transaction.
  • When enabling Streaming Replication, write-set logs are written to the wsrep_streaming_log table in the mysql system database to preserve persistence in case a crash occurs, so this table is used during recovery. In the case of excessive logging and elevated replication overhead, Streaming Replication will degrade transaction throughput. This could be a performance bottleneck when a high peak load is reached. As such, it’s recommended that you only enable Streaming Replication at a session level, and then only for transactions that would not run correctly without it.
  • Best use case is to use streaming replication for cutting large transactions
  • Set fragment size to ~10K rows
  • Fragment variables are session variables and can be dynamically set
  • An intelligent application can turn Streaming Replication on and off as needed (a short sketch follows this list)
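
Here is a minimal sketch of enabling it for a single session only, using the ~10K-row fragment size suggested above:

SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;
-- ... run the large transaction here ...
SET SESSION wsrep_trx_fragment_size = 0;   -- turn Streaming Replication back off for this session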

Conclusion

Thanks for reading, in part two we will discuss how to enable Galera Cluster Streaming Replication and what the results could look like for your setup.

 

The Slow Software Movement


Those who build SaaS know that you’re actually providing an experience. You’re building long-term relationships and aiming for recurring revenue. Therefore, your product must be flexible, resilient, and consistent.

Many companies find it hard to take time, acquire the right ingredients, and design truly great software from the bottom-up. This is understandable given how fast the SaaS industry moves.

But firstly, who am I to be talking about this subject? My education was in Economics and Complex Systems, and I’ve been fortunate to gain exposure to various IT trends and tools working almost a decade mostly in Tech. I started interviewing customers at Continuent (majority SaaS) over five years ago, and I have listened to a number of DevOps, IT, Systems Engineers and DBAs talk about their goals and limitations.

With so many self-serve, grab-and-go tools such as AWS Aurora, CloudSQL and instant-downloads to DIY, it can be tempting to point-and-click because these methods sound like they should get you up quickly, easily, and cheaply. Especially without knowledgeable and trusted collaborators, it is challenging and not always part of your mission to build a scalable and reliable system or achieve continuous operations.

In the age of fast food, fast fashion, fast relationships, do we really want to go down the path of fast software? Certainly not past a certain scale. Jerome H. Saltzer and M. Frans Kaashoek illustrate in their book, Principles of Computer Systems Design:

The captain of a modern oil supertanker finds that the ship is so massive that when underway at full speed it takes 12 miles to bring it to a straight line stop—but 12 miles is beyond the horizon as viewed from the ship’s bridge.

Partly because not all elements of a system scale the same way, it can be pretty important for a growing business to design architecture with veterans; those who have been building and maintaining large-scale geographically distributed systems for decades have an irreplaceable vantage point.

There’s a lot to be said for UX and UI to keep a relationship going, but what’s under the hood is non-negotiable. There are enormous costs to correct problems in foundational architecture. Some SaaS companies are taking pleasure in the process of building and providing the highest quality digital experiences – and their approach is slow, not fast.

There is no quick and easy solution for anything great. This is the reason SaaS brands run their operations on hardened, open source databases like MySQL and MariaDB. Backing up mysqld with certain guarantees of reliability and resilience such as zero downtime maintenance empowers organizations to achieve more with less.

Architecture in Greece, still standing after thousands of years. Photographer: Sofiwa

If you are creating a digital experience people love, reach out, the experts who specialize in continuous database operations would love to speak with you. My colleagues love helping customers build software that is flexible, resilient, and consistent from the bottom-up.

Since 2004, Continuent has been providing solutions that help organizations leverage MySQL for geographically distributed business-critical applications. Some 70% of Continuent’s customers are SaaS providers, including companies such as Adobe, F-Secure, Marketo, NewVoiceMedia, running multi-billion-dollar businesses on proven solutions for continuous MySQL availability.

Tungsten creates clusters at your database layer so you may reap long-term benefits of reliability, performance, and efficiency from MySQL, MariaDB, or Percona Server. It requires no application changes, works with on-prem, hybrid, cloud, and multi-cloud environments and includes 24/7 enterprise-level support.

This is not the point-and-click approach – you must discuss and design with our highly experienced engineers. However, from what I’ve seen, you will be glad you did so.


MySQL Connector/C++ 1.1.13 has been released


Dear MySQL Users,

MySQL Connector/C++ 1.1.13 is a new release version of the MySQL
Connector/C++ 1.1 series.

Connector/C++ 1.1 can be used to access MySQL using an API
based on JDBC4.

For information on using Connector/C++ 1.1, see

https://dev.mysql.com/doc/connector-cpp/1.1/en/

To download MySQL Connector/C++ 1.1.13, see the “General
Availability (GA) Releases” tab at

http://dev.mysql.com/downloads/connector/cpp/1.1.html

MySQL Connector C++ (Commercial) will be available for download on the
My Oracle Support (MOS) website. This release will be available on
eDelivery (OSDC) in next month’s upload cycle.

Connector/C++ 1.1 offers an easy to use API derived from JDBC 4.0.

Enjoy!

Changes in MySQL Connector/C++ 1.1.13 (2019-10-25, General
Availability)

Character Set Support

     * Connector/C++ now supports the utf8mb4_0900_bin collation
       added for the utf8mb4 Unicode character set in MySQL
       8.0.17. For more information about this collation, see
       Unicode Character Sets
(https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html).

Bugs Fixed

     * Connector/C++ set the transaction isolation level to
       REPEATABLE READ at connect time, regardless of the
       current server setting. (Bug #22038313, Bug #78852, Bug
       #30294415)

On Behalf of MySQL/ORACLE RE Team
Gipson Pulla

Best Practices to Secure Your MySQL Databases


Author: Robert Agar

MySQL is one of the most popular database platforms in the world. It is widely used to power eCommerce sites and web applications that are essential components of many companies’ business strategies. MySQL databases are often the repository for sensitive customer data gathered while conducting business as well as information regarding internal processes and personnel.

An organization’s databases are responsible for storing and manipulating the information required to keep it operating and competing effectively in its market. They are critically important to a company’s success and need to be guarded and kept secure. The database team forms an enterprise’s first line of defense and is responsible for implementing security policies and standards that minimize the chances of the systems being accessed by unauthorized users or exposed to malware.

One of the challenges facing DBAs is the proliferation of multiple-platform environments that they are expected to successfully manage. While some security measures are transferable between competing databases, there are variations that can make securing their systems challenging when bouncing from platform to platform. Let’s take a look at some of the steps a MySQL DBA should be taking to ensure the security of their databases.

Hardening the Database

Hardening a computer system is a means of protecting it from risks and threats with a multi-layered approach. Each layer, such as the host, application, user, and physical layer, requires different methods to ensure security. Here are some specific steps to take that will harden your MySQL server. When modifying the my.cnf file, you will need to restart MySQL for the changes to be implemented.

Encrypting connections – MySQL network connections are not encrypted by default. This needlessly exposes your data and should be addressed by enforcing network connection encryption.

Setting a connection error limit – Multiple unsuccessful authentications may indicate an attack by unauthorized users. Allowing for a reasonable number of incorrect attempts can be done with the max_connect_errors parameter in the my.cnf file.

Disable the SHOW DATABASES command – This command can be used by attackers to identify the available databases in preparation for an attack. You can disable the command by inserting the skip-show-database line into the mysqld section of the configuration file.

Run the MySQL hardening script – Running the hardening script included with MySQL will lead you through a series of questions that help you set the level of hardening for your system. Execute the script with the mysql_secure_installation command.
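
Putting the configuration-level steps above together, a hedged my.cnf sketch might look like the following (values are illustrative; require_secure_transport is one way to enforce encrypted connections and additionally requires TLS certificates to be configured on the server):

[mysqld]
require_secure_transport = ON    # reject unencrypted client connections
max_connect_errors       = 10    # limit successive connection errors per host
skip-show-database               # require the SHOW DATABASES privilege to list databases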

Additional Security Measures

While the steps outlined above to harden your MySQL systems are an important start, there are many other ways to strengthen the security of your databases. Here are some of them.

Change the default port and account – By default, MySQL runs on port 3306 using the superuser “root” account. The default port should be changed in the configuration file to make your database less susceptible to random cyberattacks. The recommended method of dealing with the root account is to create a new superuser account and to eliminate all root@ accounts.
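
A minimal SQL sketch of the account portion (the replacement account name and password are placeholders):

CREATE USER 'dbadmin'@'localhost' IDENTIFIED BY '<strong-password>';
GRANT ALL PRIVILEGES ON *.* TO 'dbadmin'@'localhost' WITH GRANT OPTION;
DROP USER 'root'@'localhost';   -- only after verifying the new account works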

Tightly control database access – Fewer is better when it comes to users permitted to access a database. Permissions should be managed using groups or roles and the minimum privileges required to do a job should be granted to any individual.

Auditing and monitoring database activity – This is an essential task that can alert you to a number of security violations or flaws. Monitoring can help identify compromised accounts or those conducting suspicious activities. Periodically auditing your systems will show if accounts have been created outside of the normal workflow, perhaps by a hacker.

Maintaining Secure MySQL Instances

SQL Diagnostic Manager for MySQL offers a platform from which you can monitor and audit your database instances to ensure they remain secure. It provides over 600 pre-built monitors with the ability to give you real-time insight into your system’s performance. Monitored data can be displayed in customizable dashboards making it easy to identify and address problems promptly.

Managing user access and finding problematic trends can be accomplished by focusing on the information available through the tool’s audit log. It lets you quickly see failed logins and events as well as changes made to the database. The data gained from studying these logs can help you discover potential security flaws and better protect your systems and the critical data they contain. It’s a powerful tool that should be part of every MySQL DBA’s software repertoire.

The post Best Practices to Secure Your MySQL Databases appeared first on Monyog Blog.

A Guide to MySQL Galera Cluster Streaming Replication: Part Two


In the first part of this blog we provided an overview of the new Streaming Replication feature in MySQL Galera Cluster. In this blog we will show you how to enable it and take a look at the results.

Enabling Streaming Replication

It is highly recommended that you enable Streaming Replication at a session-level for the specific transactions that interact with your application/client. 

As stated in the previous blog, Galera logs its write-sets to the wsrep_streaming_log table in the mysql database. This has the potential to create a performance bottleneck, especially when a rollback is needed. This doesn't mean that you can’t use Streaming Replication, it just means you need to design your application client efficiently when using it so you’ll get better performance. Still, it's best to have Streaming Replication for dealing with and cutting down large transactions.

Enabling Streaming Replication requires you to define the replication unit and number of units to use in forming the transaction fragments. Two parameters control these variables: wsrep_trx_fragment_unit and wsrep_trx_fragment_size.

Below is an example of how to set these two parameters:

SET SESSION wsrep_trx_fragment_unit='statements';

SET SESSION wsrep_trx_fragment_size=3;

In this example, the fragment is set to three statements. For every three statements from a transaction, the node will generate, replicate, and certify a fragment.

You can choose between a few replication units when forming fragments:

  • bytes - This defines the fragment size in bytes.
  • rows - This defines the fragment size as the number of rows the fragment updates.
  • statements - This defines the fragment size as the number of statements in a fragment.

Choose the replication unit and fragment size that best suits the specific operation you want to run.

Streaming Replication In Action

As discussed in our other blog on handling large transactions in MariaDB 10.4, we tested how Streaming Replication performed when enabled, based on these criteria...

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

And the results are:

Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)

Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)

Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)

Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)

For this example we're using Percona XtraDB Cluster 8.0.15 straight from their testing branch using the Percona-XtraDB-Cluster_8.0.15.5-27dev.4.2_Linux.x86_64.ssl102.tar.gz build. 

We then tried a 3-node Galera cluster with hosts info below:

testnode11 = 192.168.10.110

testnode12 = 192.168.10.120

testnode13 = 192.168.10.130

We pre-populated a table from my sysbench database and tried to delete a very large number of rows. 

root@testnode11[sbtest]#> select count(*) from sbtest1;

+----------+

| count(*) |

+----------+

| 12608218 |

+----------+

1 row in set (25.55 sec)

At first, running without Streaming Replication,

root@testnode12[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size,  @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| bytes                     | 0 |                         50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)

Then run,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

However, we ended up getting a rollback...

---TRANSACTION 648910, ACTIVE 573 sec rollback

mysql tables in use 1, locked 1

ROLLING BACK 164858 lock struct(s), heap size 18637008, 12199395 row lock(s), undo log entries 11961589

MySQL thread id 183, OS thread handle 140041167468288, query id 79286 localhost 127.0.0.1 root wsrep: replicating and certifying write set(-1)

delete from sbtest1 where id >= 2000000

Using the ClusterControl Dashboards to look for any indication of flow control: since the transaction runs solely on the master (active-writer) node until commit time, there's no indication of any flow control activity:

ClusterControl Galera Cluster Overview

In case you’re wondering, the current version of ClusterControl does not yet have direct support for PXC 8.0 with Galera Cluster 4 (as it is still experimental). You can, however, try to import it... but it needs minor tweaks to make your Dashboards work correctly. 

Back to the query process. It failed as it rolled back!

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

ERROR 1180 (HY000): Got error 5 - 'Transaction size exceed set threshold' during COMMIT

regardless of the wsrep_max_ws_rows or wsrep_max_ws_size,

root@testnode11[sbtest]#> select @@global.wsrep_max_ws_rows, @@global.wsrep_max_ws_size/(1024*1024*1024);

+----------------------------+---------------------------------------------+

| @@global.wsrep_max_ws_rows | @@global.wsrep_max_ws_size/(1024*1024*1024) |

+----------------------------+---------------------------------------------+

|                          0 |               2.0000 |

+----------------------------+---------------------------------------------+

1 row in set (0.00 sec)

It did, eventually, reach the threshold.

During this time the system table mysql.wsrep_streaming_log is empty, which indicates that Streaming Replication is not happening or enabled,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.01 sec)



root@testnode13[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.00 sec)

and that is verified on the other 2 nodes (testnode12 and testnode13).

Now, let's try again with Streaming Replication enabled,

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| bytes                     | 0 |                      50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)



root@testnode11[sbtest]#> set wsrep_trx_fragment_unit='rows'; set wsrep_trx_fragment_size=100; 

Query OK, 0 rows affected (0.00 sec)



Query OK, 0 rows affected (0.00 sec)



root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;

+---------------------------+---------------------------+----------------------------+

| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |

+---------------------------+---------------------------+----------------------------+

| rows                      | 100 |                      50000 |

+---------------------------+---------------------------+----------------------------+

1 row in set (0.00 sec)

What to Expect When Galera Cluster Streaming Replication is Enabled? 

When the query has been run on testnode11,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

What happens is that it fragments the transaction piece by piece depending on the set value of variable wsrep_trx_fragment_size. Let's check this in the other nodes:

Host testnode12

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''

TRANSACTIONS

------------

Trx id counter 567148

Purge done for trx's n:o < 566636 undo n:o < 0 state: running but idle

History list length 44

LIST OF TRANSACTIONS FOR EACH SESSION:

..

...

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE 190 sec

18393 lock struct(s), heap size 2089168, 1342600 row lock(s), undo log entries 1342600

MySQL thread id 898, OS thread handle 140266050008832, query id 216824 wsrep: applied write set (-1)

--------

FILE I/O

1 row in set (0.08 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 211197844753 |

| wsrep_flow_control_paused        | 0.133786 |

| wsrep_flow_control_sent          | 633 |

| wsrep_flow_control_recv          | 878 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.00 sec)



+----------+

| count(*) |

+----------+

|    13429 |

+----------+

1 row in set (0.04 sec)

 

Host testnode13

root@testnode13[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''

TRANSACTIONS

------------

Trx id counter 568523

Purge done for trx's n:o < 567824 undo n:o < 0 state: running but idle

History list length 23

LIST OF TRANSACTIONS FOR EACH SESSION:

..

...

---TRANSACTION 552701, ACTIVE 216 sec

21587 lock struct(s), heap size 2449616, 1575700 row lock(s), undo log entries 1575700

MySQL thread id 936, OS thread handle 140188019226368, query id 600980 wsrep: applied write set (-1)

--------

FILE I/O

1 row in set (0.28 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 210755642443 |

| wsrep_flow_control_paused        | 0.0231273 |

| wsrep_flow_control_sent          | 1653 |

| wsrep_flow_control_recv          | 3857 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.01 sec)



+----------+

| count(*) |

+----------+

|    15758 |

+----------+

1 row in set (0.03 sec)

Noticeably, the flow control just kicked in!

ClusterControl Galera Cluster Overview

And the WSREP send/receive queues have been kicking in as well:

 
ClusterControl Galera Overview
Host testnode12 (192.168.10.120)
ClusterControl Galera Overview
 Host testnode13 (192.168.10.130)

Now, let's look more closely at the result from the mysql.wsrep_streaming_log table,

root@testnode11[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

MySQL thread id 134822, OS thread handle 140041167468288, query id 0 System lock

---TRANSACTION 649008, ACTIVE 481 sec

mysql tables in use 1, locked 1

53104 lock struct(s), heap size 6004944, 3929602 row lock(s), undo log entries 3876500

MySQL thread id 183, OS thread handle 140041167468288, query id 105367 localhost 127.0.0.1 root updating

delete from sbtest1 where id >= 2000000

--------

FILE I/O

1 row in set (0.01 sec)

then taking the result of,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;

+----------+

| count(*) |

+----------+

|    38899 |

+----------+

1 row in set (0.40 sec)

It tells us how many fragments have been replicated using Streaming Replication. Now, let's do some basic math:

root@testnode12[sbtest]#> select 3876500/38899.0;

+-----------------+

| 3876500/38899.0 |

+-----------------+

|         99.6555 |

+-----------------+

1 row in set (0.03 sec)

I'm taking the undo log entries from the SHOW ENGINE INNODB STATUS\G result and dividing them by the total count of mysql.wsrep_streaming_log records. As I set it earlier, I defined wsrep_trx_fragment_size=100. The result shows that each fragment currently being processed by Galera covers roughly 100 rows, matching the configured fragment size.

It’s important to take note at what Streaming Replication is trying to achieve... "the node breaks the transaction into fragments, then certifies and replicates them on the slaves while the transaction is still in progress. Once certified, the fragment can no longer be aborted by conflicting transactions."

The fragments are treated as transactions that are passed to the remaining nodes within the cluster, which certify the fragmented transaction and then apply the write-sets. This means that once your large transaction has been certified or prioritized, all incoming connections that could possibly cause a deadlock will need to wait until the transaction finishes.

Now, what's the verdict on deleting a huge number of rows? 

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

Query OK, 12034538 rows affected (30 min 36.96 sec)

It finishes successfully without any failure!

What does it look like on the other nodes? On testnode12,

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE (PREPARED) 2050 sec

165631 lock struct(s), heap size 18735312, 12154883 row lock(s), undo log entries 12154883

MySQL thread id 898, OS thread handle 140266050008832, query id 341835 wsrep: preparing to commit write set(215510)

--------

FILE I/O

1 row in set (0.46 sec)



PAGER set to stdout

+----------------------------------+--------------+

| Variable_name                    | Value |

+----------------------------------+--------------+

| wsrep_flow_control_paused_ns     | 290832524304 |

| wsrep_flow_control_paused        | 0 |

| wsrep_flow_control_sent          | 0 |

| wsrep_flow_control_recv          | 0 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |

+----------------------------------+--------------+

8 rows in set (0.53 sec)



+----------+

| count(*) |

+----------+

|   120345 |

+----------+

1 row in set (0.88 sec)

It stops at a total of 120345 fragments, and if we do the math again on the last captured undo log entries (undo logs are the same from the master as well),

root@testnode12[sbtest]#> select 12154883/120345.0;

+-------------------+

| 12154883/120345.0 |

+-------------------+

|          101.0003 |

+-------------------+

1 row in set (0.00 sec)

So the transaction was split into a total of 120345 fragments in order to delete 12034538 rows.

Once you're done using Streaming Replication, do not forget to disable it, as it will otherwise keep logging huge transactions and add a lot of performance overhead to your cluster. To disable it, just run

root@testnode11[sbtest]#> set wsrep_trx_fragment_size=0;

Query OK, 0 rows affected (0.04 sec)

Conclusion

With Streaming Replication enabled, it's important that you are able to identify how large your fragment size can be and what unit you have to choose (bytes, rows, statements). 

It is also very important that you need to run it at session-level and of course identify when you only need to use Streaming Replication. 

While performing these tests, deleting a large number of rows from a huge table with Streaming Replication enabled noticeably caused a high peak in disk and CPU utilization. RAM usage was more stable, but this could be because the statement we performed did not create much memory contention. 

It’s safe to say that Streaming Replication can cause performance bottlenecks when dealing with large records, so it should be used with proper planning and care. 

Lastly, if you are using Streaming Replication, do not forget to always disable this once done on that current session to avoid unwanted problems.

 

Using Explain Analyze in MySQL 8

Explain Analyze in MySQL

In MySQL 8.0.18 there is a new feature called Explain Analyze, whereas for many years we mostly had only the traditional Explain. I know there are different formats, but those are based on the same information and just show it in a different format with some extra details.

But Explain Analyze is a different concept. It is actually going to run the query and measure execution time by using the new iterator executor for each step. That topic itself deserves its own blog post on how the new iterator executor works, and I will write a post about that as well. But if you cannot wait and you would like to read up, here are some links to some additional information: Iterator executor analytics queries, Volcano iterator design, and Volcano iterator semijoin.

In this post, we are going to focus on what Explain Analyze can give us.

As always let’s start with some testing. I have my test server with a sbtest1 table:

CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=262125 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

There are almost 1 million rows in it:

mysql> select count(*) from sbtest1;

+----------+
| count(*) |
+----------+
| 999999 |
+----------+
1 row in set (0.32 sec)

Traditional Explain

mysql> explain select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: range
possible_keys: idx3
key: idx3
key_len: 4
ref: NULL
rows: 493204
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Hopefully, most of you already know this output; we can see which table and index will be used, and we can also see it is a range query that has to read approximately 493204 rows. With InnoDB we know that is just an estimation, it is not the real number. In the past, we had two options: either run the query and see the real number, or run the query and check the handler statistics to get even more detailed information.

Now I am going to run the query and see the real row count:

mysql> select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
count(*): 625262
1 row in set (0.10 sec)

So the query has to read 625262 rows and takes 0.10s.  Let’s have a look at Explain Analyze.

Explain Analyze

mysql> explain analyze select count(*) from sbtest1 where k > 500000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=178.225..178.225 rows=1 loops=1)
-> Filter: (sbtest1.k > 500000) (cost=98896.53 rows=493204) (actual time=0.022..147.502 rows=625262 loops=1)
-> Index range scan on sbtest1 using idx3 (cost=98896.53 rows=493204) (actual time=0.021..96.488 rows=625262 loops=1)

1 row in set (0.18 sec)

It always uses the tree format and it took me some time to actually understand what all this information means. Unfortunately, the manual page does not really explain it.

Let me try to fill the gaps:

Index range scan on sbtest1 using idx3 –  this part is quite trivial, it says it is going to use a range scan on sbtest1 table and will use the idx3 index.

cost=98896.53 rows=493204 –  this is the cost and the same row number that traditional Explain gives us.

actual time=0.021..96.488 rows=625262 loops=1:

  • 0.021 – The time to return first row (Init + first Read) in milliseconds.
  • 96.488 – The time to return all rows (Init + all Read calls) in milliseconds.
  • rows=625262 – The number of rows returned by this iterator. Finally, it is the exact number.
  • loops=1 – The number of loops (number of Init calls).

The source for this information is here: Implement EXPLAIN ANALYZE.

We can see all this information with each step, which gives us great help and insights for query optimization.

But it is important to notice in the last step that the time is 178ms and it also reports 1 row in set (0.18 sec). So it says this query takes 0.18s! But the original query took only 0.10s.

The manual says:

This has naturally some overhead, but it seems this is only around 5–6% for an instrumented query
(entirely free for a non-instrumented query).

But with this very simple query, the overhead is already 80%, so we cannot really trust those numbers.

Let’s have a look on another query which takes more time, and see how this difference behaves:

mysql> select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k;
+----------+
| count(*) |
+----------+
| 77902613 |
+----------+
1 row in set (15.42 sec)

mysql> explain analyze select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=26501.232..26501.233 rows=1 loops=1)
-> Nested loop left join (cost=810799.35 rows=4550475) (actual time=0.036..22572.530 rows=77902616 loops=1)
-> Index scan on t1 using idx3 (cost=108280.56 rows=986408) (actual time=0.026..176.645 rows=999999 loops=1)
-> Index lookup on t2 using idx3 (k=t1.k) (cost=0.25 rows=5) (actual time=0.001..0.017 rows=78 loops=999999)

1 row in set (26.50 sec)

It reports that the query is roughly 70% slower than it is in real life.

Just out of curiosity I tried Explain Analyze with a Stored Procedure, but it didn’t work. One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going on under the hood.  (In Percona Server there is log_slow_sp_statements, which allows you to log the individual queries to the slow query log.)
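
A minimal sketch of turning that on in Percona Server (not available in upstream MySQL; it also requires the slow query log to be enabled):

SET GLOBAL slow_query_log = ON;
SET GLOBAL log_slow_sp_statements = ON;   -- log the individual statements run inside stored procedures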

I would love to see if Explain Analyze would run the procedure and then display the actual execution plan in a tree format. But maybe I am believing in a perfect world and I am too naive.

I also noticed some interesting behavior:

mysql> explain analyze select count(*) from sbtest1\G
*************************** 1. row ***************************
EXPLAIN: -> Count rows in sbtest1

1 row in set (0.13 sec)

If there isn’t a where condition, a group by, or anything else, just a simple count query, it does not give us the time for that step, how many rows were read, or which index was used. I think it is a bit too minimalistic here, and some basic information would be good to see.

Conclusion

I think Explain Analyze is a great step in the right direction, and I am excited to see if in the future it gets even more features, but it should report more realistic times for execution time.

Setup 2 MySQL InnoDB Clusters on 2 DCs and link them for DR


This article is an update of a previous post explaining how to set up a second cluster on a second data center to be used as disaster recovery (or to run some off-site queries, like long reports, etc..).

This new article also covers the CLONE plugin. Before you ask, the CLONE plugin and Replication Channel Based Filters are only available in MySQL 8.0! It’s time to upgrade, MySQL 8 is great!

Also, for DR only, a single MySQL instance acting as an asynchronous replica is enough. But if for any reason you want to also have an HA cluster in the second data center, not using replication channel based filters is not the recommended solution and requires hacking with multiple MySQL Shell versions. You should forget about it.

So, if you plan to have 2 clusters, the best option is to set up the asynchronous link between the two clusters and use replication channel based filters. Please be warned that there is NO conflict detection between the two clusters, and if you write conflicting data on both at the same time, you are looking for trouble!

So we have in DC1 the following cluster:

MySQL  mysql-dc1-1:33060+ ssl  JS > cluster.status()
 {
     "clusterName": "clusterDC1", 
     "defaultReplicaSet": {
         "name": "default", 
         "primary": "mysql-dc1-1:3306", 
         "ssl": "REQUIRED", 
         "status": "OK", 
         "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
         "topology": {
             "mysql-dc1-1:3306": {
                 "address": "mysql-dc1-1:3306", 
                 "mode": "R/W", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }, 
             "mysql-dc1-2:3306": {
                 "address": "mysql-dc1-2:3306", 
                 "mode": "R/O", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }, 
             "mysql-dc1-3:3306": {
                 "address": "mysql-dc1-3:3306", 
                 "mode": "R/O", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }
         }, 
         "topologyMode": "Single-Primary"
     }, 
     "groupInformationSourceMember": "mysql-dc1-1:3306"

And we would like to have a similar cluster on the second DC:

  • clusterDC2
    • mysql-dc2-1
    • mysql-dc2-2
    • mysql-dc2-3

The first thing we will do is to CLONE the dataset from clusterDC1 to mysql-dc2-1. To do so, we will create a user with the required privileges on the Primary-Master in DC1:

SQL > CREATE USER 'repl'@'%' IDENTIFIED BY 'password' REQUIRE SSL;
SQL > GRANT REPLICATION SLAVE, BACKUP_ADMIN, CLONE_ADMIN 
      ON *.* TO 'repl'@'%';

I explained this process in this post.

Now, on mysql-dc2-1, we will create a cluster after having run dba.configureInstance(…) (I usually create a dedicated user to manage the cluster):

MySQL  mysql-dc2-1:33060+ ssl  JS > cluster2=dba.createCluster('clusterDC2')

For the cluster admin user we created, we need to grant the privilege required by CLONE and set the list of valid donors:

SQL > GRANT CLONE_ADMIN ON *.*  TO clusteradmin;
SQL > SET GLOBAL clone_valid_donor_list='192.168.222.3:3306';

We can now stop Group Replication and start the CLONE process:

MySQL  mysql-dc2-1:33060+ ssl  SQL > stop group_replication;
MySQL  mysql-dc2-1:33060+ ssl  SQL > set global super_read_only=0;

MySQL  127.0.0.1:3306 ssl  SQL > CLONE INSTANCE FROM
                   repl@192.168.222.3:3306 IDENTIFIED BY 'password';

Now we have the same data on mysql-dc2-1, and we can recreate the cluster, as its metadata was overwritten by the CLONE process. After that, we will be able to join the other two servers of DC2:

MySQL  mysql-dc2-1:33060+ ssl  JS > cluster=dba.createCluster('clusterDC2')
MySQL  mysql-dc2-1:33060+ ssl  JS > cluster.addInstance('clusteradmin@mysql-dc2-2')
MySQL  mysql-dc2-1:33060+ ssl  JS > cluster.addInstance('clusteradmin@mysql-dc2-3')

We finally have our second cluster in our Disaster Recover Data Center:

MySQL  mysql-dc2-1:33060+ ssl  JS > cluster.status()
 {
     "clusterName": "clusterDC2", 
     "defaultReplicaSet": {
         "name": "default", 
         "primary": "mysql-dc2-1:3306", 
         "ssl": "REQUIRED", 
         "status": "OK", 
         "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
         "topology": {
             "mysql-dc2-1:3306": {
                 "address": "mysql-dc2-1:3306", 
                 "mode": "R/W", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }, 
             "mysql-dc2-2:3306": {
                 "address": "mysql-dc2-2:3306", 
                 "mode": "R/O", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }, 
             "mysql-dc2-3:3306": {
                 "address": "mysql-dc2-3:3306", 
                 "mode": "R/O", 
                 "readReplicas": {}, 
                 "replicationLag": null, 
                 "role": "HA", 
                 "status": "ONLINE", 
                 "version": "8.0.18"
             }
         }, 
         "topologyMode": "Single-Primary"
     }, 
     "groupInformationSourceMember": "mysql-dc2-1:3306"
 }

Now, before creating the asynchronous link, we have to bootstrap the MySQL Router.

We have two options: we can run the router that connects to DC1 inside DC2, or leave it in DC1. This is an illustration of both options:

[Illustration: Option 1]
[Illustration: Option 2]

I have chosen option 2. On mysql-dc2-1, the Primary-Master of DC2, I bootstrap the router that connects to DC1:

[root@mysql-dc2-1 ~]# mysqlrouter --bootstrap clusteradmin@mysql-dc1-1 \
                      --user=mysqlrouter --conf-use-gr-notifications
 Please enter MySQL password for clusteradmin: 
 Bootstrapping system MySQL Router instance…
 Checking for old Router accounts
 No prior Router accounts found
 Creating mysql account 'mysql_router1_abv0q7dfatf6'@'%' for cluster management
 Storing account in keyring
 Adjusting permissions of generated files
 Creating configuration /etc/mysqlrouter/mysqlrouter.conf 
 MySQL Router configured for the InnoDB cluster 'clusterDC1'
 After this MySQL Router has been started with the generated configuration
 $ /etc/init.d/mysqlrouter restart
 or
     $ systemctl start mysqlrouter
 or
     $ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf
 the cluster 'clusterDC1' can be reached by connecting to:
 MySQL Classic protocol
 Read/Write Connections: localhost:6446
 Read/Only Connections:  localhost:6447 
 MySQL X protocol
 Read/Write Connections: localhost:64460
 Read/Only Connections:  localhost:64470 

And we start MySQL Router:

[root@mysql-dc2-1 ~]# systemctl start mysqlrouter

Now we can set up the asynchronous link between DC1 and DC2, where DC2 is the replica of DC1.

This asynchronous replication link will use a dedicated channel and filter out the metadata of the cluster:

SQL> CHANGE MASTER TO MASTER_HOST='localhost', MASTER_PORT=6446,
     MASTER_USER='repl', MASTER_PASSWORD='password',
     MASTER_AUTO_POSITION=1, MASTER_SSL=1, GET_MASTER_PUBLIC_KEY=1  
     FOR CHANNEL 'repl_from_dc1';

SQL> CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB=(mysql_innodb_cluster_metadata) 
     FOR CHANNEL 'repl_from_dc1';

SQL> START SLAVE FOR CHANNEL 'repl_from_dc1';

And we can verify it:

MySQL  mysql-dc2-1:33060+ ssl  SQL > show slave status for channel 'repl_from_dc1'\G
 *************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: localhost
                   Master_User: repl
                   Master_Port: 6446
                 Connect_Retry: 60
               Master_Log_File: binlog.000003
           Read_Master_Log_Pos: 26261
                Relay_Log_File: mysql-dc2-1-relay-bin-repl_from_dc1.000002
                 Relay_Log_Pos: 2107
         Relay_Master_Log_File: binlog.000003
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: mysql_innodb_cluster_metadata
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 26261
               Relay_Log_Space: 2327
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: Yes
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 3939750055
                   Master_UUID: b06dff8e-f9b1-11e9-8c43-080027b4d0f3
              Master_Info_File: mysql.slave_master_info
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
            Master_Retry_Count: 86400
                   Master_Bind: 
       Last_IO_Error_Timestamp: 
      Last_SQL_Error_Timestamp: 
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
            Retrieved_Gtid_Set: 5765e9ea-f9b4-11e9-9f0e-080027b4d0f3:39-43
             Executed_Gtid_Set: 5765e9ea-f9b4-11e9-9f0e-080027b4d0f3:1-43,
 ae94a19d-f9b8-11e9-aaf5-080027b4d0f3:1-36
                 Auto_Position: 1
          Replicate_Rewrite_DB: 
                  Channel_Name: repl_from_dc1
            Master_TLS_Version: 
        Master_public_key_path: 
         Get_master_public_key: 1
             Network_Namespace: 

Now, if the Primary-Master in DC1 changes, the asynchronous replication link will keep working on DC2, as it uses MySQL Router to point to the new Primary-Master. However, if the Primary-Master on DC2 dies, you will have to manually configure and start asynchronous replication on the new master. Such an operation can be performed automatically, but you need an external tool, as explained in this post.

You can already bootstrap MySQL Router on DC1 and prepare the asynchronous replication in the other direction, but I would recommend not starting it, and using it only if DC2 receives the writes in case of disaster.
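
For reference, here is a minimal sketch of what that reverse channel could look like on the DC1 Primary-Master, assuming a router bootstrapped there against clusterDC2 and listening locally on port 6446 (the channel name is arbitrary):

SQL> CHANGE MASTER TO MASTER_HOST='localhost', MASTER_PORT=6446,
     MASTER_USER='repl', MASTER_PASSWORD='password',
     MASTER_AUTO_POSITION=1, MASTER_SSL=1, GET_MASTER_PUBLIC_KEY=1
     FOR CHANNEL 'repl_from_dc2';

SQL> CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB=(mysql_innodb_cluster_metadata)
     FOR CHANNEL 'repl_from_dc2';

-- Only after a real disaster, once DC2 is taking the writes:
-- SQL> START SLAVE FOR CHANNEL 'repl_from_dc2';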

Galera Cluster for MySQL 5.6.46 and MySQL 5.7.28 is GA

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.46 (release notes, download) and MySQL-wsrep 5.7.28 (release notes, download). There is no Galera replication library release this time, so please continue using the 3.28 version, implementing wsrep API version 25.

This release incorporates all changes to MySQL 5.6.46 and 5.7.28 respectively and can be considered an updated rebased version. It is worth noting that we will have some platforms reach end of life (EOL) status, notably OpenSUSE 13.2 and Ubuntu Trusty 14.04.

You can get the latest release of Galera Cluster from https://www.galeracluster.com. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

A beginner’s guide to database deadlock

Introduction In this article, we are going to see how a deadlock can occur in a relational database system, and how Oracle, SQL Server, PostgreSQL, or MySQL recover from a deadlock situation. Database locking Relational database systems use various locks to guarantee transaction ACID properties. For instance, no matter what relational database system you are using, locks will always be acquired when modifying (e.g., UPDATE or DELETE) a certain table record. Without locking a row that was modified by a currently running transaction, Atomicity would be compromised. Using locking for controlling access... Read More

The post A beginner’s guide to database deadlock appeared first on Vlad Mihalcea.


Column Histograms on Percona Server and MySQL 8.0

From time to time you may have experienced that MySQL was not able to find the best execution plan for a query. You felt the query should have been faster. You felt that something didn’t work, but you didn’t realize exactly what.

Maybe some of you did tests and discovered there was a better execution plan that MySQL wasn’t able to find (forcing the order of the tables with STRAIGHT_JOIN for example).

In this article, we’ll see a new interesting feature available on MySQL 8.0 as well as Percona Server for MySQL 8.0: the histogram-based statistics.

Today, we’ll see what a histogram is, how you can create and manage it, and how MySQL’s optimizer can use it.

Just for completeness, histogram statistics have been available on MariaDB since version 10.0.2, with a slightly different implementation. Anyway, what we’ll see here is related to Percona Server and MySQL 8.0 only.

 

What is a histogram

We can define a histogram as a good approximation of the data distribution of the values in a column.

Histogram-based statistics were introduced to give the optimizer more execution plans to investigate when resolving a query. Until then, in some cases, the optimizer was not able to find the best possible execution plan because non-indexed columns were ignored.

With histogram statistics, the optimizer now has more options, because non-indexed columns can also be taken into account. In some specific cases, a query can run faster than it otherwise would.

Let’s consider the following table to store departing times of the trains:

CREATE TABLE train_schedule(
id INT PRIMARY KEY,
train_code VARCHAR(10),
departure_station VARCHAR(100),
departure_time TIME);

We can assume that during peak hours, from 7 AM until 9 AM, there are more rows, and during the night hours we have very few rows.

Let’s take a look at the following two queries:

SELECT * FROM train_schedule WHERE departure_time BETWEEN '07:30:00' AND '09:15:00';
SELECT * FROM train_schedule WHERE departure_time BETWEEN '01:00:00' AND '03:00:00';

Without any kind of statistics, the optimizer assumes by default that the values in the departure_time column are evenly distributed, but they aren’t. In reality, the first query returns far more rows than the second, something the optimizer cannot predict under that assumption.

Histograms were invented to provide the optimizer with a good estimation of the number of rows returned. This seems trivial for the simple queries we have seen so far. But let’s now think about the same table being involved in JOINs with other tables. In such a case, the number of rows returned can be very important for the optimizer when deciding the order in which to consider the tables in the execution plan.

A good estimation of the rows returned gives the optimizer the ability to place a table in the early stages of the plan when it returns few rows. This minimizes the total number of rows in the final cartesian product, and the query can then run faster.
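
As a teaser for what is shown in detail below, fixing the estimation for this example takes a single statement; a sketch, assuming the default of 100 buckets is enough for the departure_time values:

ANALYZE TABLE train_schedule UPDATE HISTOGRAM ON departure_time WITH 100 BUCKETS;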

MySQL supports two different types of histograms: “singleton” and “equi-height”. Common to both histogram types is that they split the data set into a set of “buckets”; MySQL automatically distributes the values into the buckets and also automatically decides which type of histogram to create.

Singleton histogram

  • one value per bucket
  • each bucket stores
    • value
    • cumulative frequency
  • well suited for equality and range conditions

Equi-height histogram

  • multiple values per bucket
  • each bucket stores
    • minimum value
    • maximum value
    • cumulative frequency
    • number of distinct values
  • not really equi-height: frequent values are placed in separate buckets
  • well suited for range conditions

How to use histograms

The histogram feature is available and enabled on the server, but no histogram exists until you create one explicitly. Without that explicit creation, the optimizer works the same as usual and cannot get any benefit from histogram-based statistics.

There is some manual operation to do. Let’s see.

In the next examples, we’ll use the world sample database you can download from here: https://dev.mysql.com/doc/index-other.html

Let’s start by executing a query that joins two tables to find all the languages spoken in the largest cities of the world, those with more than 10 million people.

mysql> select city.name, countrylanguage.language from city join countrylanguage using(countrycode) where population>10000000; 
+-----------------+-----------+ 
| name            | language  | 
+-----------------+-----------+ 
| Mumbai (Bombay) | Asami     | 
| Mumbai (Bombay) | Bengali   | 
| Mumbai (Bombay) | Gujarati  | 
| Mumbai (Bombay) | Hindi     | 
| Mumbai (Bombay) | Kannada   | 
| Mumbai (Bombay) | Malajalam | 
| Mumbai (Bombay) | Marathi   | 
| Mumbai (Bombay) | Orija     | 
| Mumbai (Bombay) | Punjabi   | 
| Mumbai (Bombay) | Tamil     | 
| Mumbai (Bombay) | Telugu    | 
| Mumbai (Bombay) | Urdu      | 
+-----------------+-----------+ 
12 rows in set (0.04 sec)

The query takes 0.04 seconds. It’s not a lot, but consider that the database is very small. Use the BENCHMARK function to have more relevant response times if you like.

Let’s see the EXPLAIN:

mysql> explain select city.name, countrylanguage.language from city join countrylanguage using(countrycode) where population>10000000; 
+----+-------------+-----------------+------------+-------+---------------------+-------------+---------+-----------------------------------+------+----------+-------------+ 
| id | select_type | table           | partitions | type  | possible_keys       | key         | key_len | ref                               | rows | filtered | Extra       | 
+----+-------------+-----------------+------------+-------+---------------------+-------------+---------+-----------------------------------+------+----------+-------------+ 
| 1  | SIMPLE      | countrylanguage | NULL       | index | PRIMARY,CountryCode | CountryCode | 3       | NULL                              | 984  | 100.00   | Using index | 
| 1  | SIMPLE      | city            | NULL       | ref   | CountryCode         | CountryCode | 3       | world.countrylanguage.CountryCode | 18   | 33.33    | Using where | 
+----+-------------+-----------------+------------+-------+---------------------+-------------+---------+-----------------------------------+------+----------+-------------+

Indexes are used for both tables, and the estimated cartesian product has 984 * 18 = 17,712 rows.

Now generate the histogram on the Population column. It’s the only column used for filtering the data and it’s not indexed.

For that, we have to use the ANALYZE command:

mysql> ANALYZE TABLE city UPDATE HISTOGRAM ON population WITH 1024 BUCKETS; 
+------------+-----------+----------+-------------------------------------------------------+ 
| Table      | Op        | Msg_type | Msg_text                                              | 
+------------+-----------+----------+-------------------------------------------------------+ 
| world.city | histogram | status   | Histogram statistics created for column 'Population'. | 
+------------+-----------+----------+-------------------------------------------------------+

We have created a histogram using 1024 buckets. The number of buckets is not mandatory, and it can be any number from 1 to 1024. If omitted, the default value is 100.

The number of buckets affects the reliability of the statistics. The more distinct values you have, the more buckets you need.

Let’s have a look now at the execution plan and execute the query again.

mysql> explain select city.name, countrylanguage.language from city join countrylanguage using(countrycode) where population>10000000;
+----+-------------+-----------------+------------+------+---------------------+-------------+---------+------------------------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys       | key         | key_len | ref                    | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------------+-------------+---------+------------------------+------+----------+-------------+
|  1 | SIMPLE      | city            | NULL       | ALL  | CountryCode         | NULL        | NULL    | NULL                   | 4188 |     0.06 | Using where |
|  1 | SIMPLE      | countrylanguage | NULL       | ref  | PRIMARY,CountryCode | CountryCode | 3       | world.city.CountryCode |  984 |   100.00 | Using index |
+----+-------------+-----------------+------------+------+---------------------+-------------+---------+------------------------+------+----------+-------------+

mysql> select city.name, countrylanguage.language from city join countrylanguage using(countrycode) where population>10000000;
+-----------------+-----------+
| name            | language  |
+-----------------+-----------+
| Mumbai (Bombay) | Asami     |
| Mumbai (Bombay) | Bengali   |
| Mumbai (Bombay) | Gujarati  |
| Mumbai (Bombay) | Hindi     |
| Mumbai (Bombay) | Kannada   |
| Mumbai (Bombay) | Malajalam |
| Mumbai (Bombay) | Marathi   |
| Mumbai (Bombay) | Orija     |
| Mumbai (Bombay) | Punjabi   |
| Mumbai (Bombay) | Tamil     |
| Mumbai (Bombay) | Telugu    |
| Mumbai (Bombay) | Urdu      |
+-----------------+-----------+
12 rows in set (0.00 sec)

The execution plan is different, and the query runs faster.

We can notice that the order of the tables is the opposite of before. Even though it requires a full scan, the city table is in the first stage. That’s because of the filtered value, which is only 0.06: it means that only 0.06% of the rows returned by the full scan will be joined with the following table, so roughly 4188 * 0.06% = 2.5 rows. In total, the estimated cartesian product is about 2.5 * 984 = 2,460 rows. This is significantly lower than in the previous execution and explains why the query is faster.

What we have seen sounds a little counterintuitive, doesn’t it? In fact, until MySQL 5.7, we were used to considering full scans as very bad in most cases. In our case, instead, choosing a full scan thanks to a histogram statistic on a non-indexed column lets the query get optimized. Awesome.

 

Where are the histogram statistics

Histogram statistics are stored in the column_statistics table in the data dictionary and are not directly accessible by the users. Instead the INFORMATION_SCHEMA.COLUMN_STATISTICS table, which is implemented as a view of the data dictionary, can be used for the same purpose.

Let’s see the statistics for our table.

mysql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM)  
    -> FROM information_schema.column_statistics  
    -> WHERE COLUMN_NAME = 'population'\G
*************************** 1. row ***************************
           SCHEMA_NAME: world
            TABLE_NAME: city
           COLUMN_NAME: Population
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      42,
      455,
      0.000980632507967639,
      4
    ],
    [
      503,
      682,
      0.001961265015935278,
      4
    ],
    [
      700,
      1137,
      0.0029418975239029173,
      4
    ],
...
...
    [
      8591309,
      9604900,
      0.9990193674920324,
      4
    ],
    [
      9696300,
      10500000,
      1.0,
      4
    ]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2019-10-14 22:24:58.232254",
  "sampling-rate": 1.0,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 1024
}

We can see, for each bucket, the minimum and maximum values, the cumulative frequency, and the number of distinct values. Also, we can see that MySQL decided to use an equi-height histogram.
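
If you prefer a compact summary instead of reading the whole JSON document, the type and bucket counts can be extracted directly; a small sketch using standard JSON functions (nothing here is specific to Percona Server):

mysql> SELECT TABLE_NAME, COLUMN_NAME,
    ->        HISTOGRAM->>'$."histogram-type"'              AS histogram_type,
    ->        HISTOGRAM->>'$."number-of-buckets-specified"' AS buckets_requested,
    ->        JSON_LENGTH(HISTOGRAM->'$.buckets')           AS buckets_used
    -> FROM information_schema.column_statistics
    -> WHERE SCHEMA_NAME = 'world';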

Let’s try to generate a histogram on another table and column.

mysql> ANALYZE TABLE country UPDATE HISTOGRAM ON Region;
+---------------+-----------+----------+---------------------------------------------------+
| Table         | Op        | Msg_type | Msg_text                                          |
+---------------+-----------+----------+---------------------------------------------------+
| world.country | histogram | status   | Histogram statistics created for column 'Region'. |
+---------------+-----------+----------+---------------------------------------------------+
1 row in set (0.01 sec)

mysql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM)  FROM information_schema.column_statistics  WHERE COLUMN_NAME = 'Region'\G
*************************** 1. row ***************************
           SCHEMA_NAME: world
            TABLE_NAME: country
           COLUMN_NAME: Region
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      "base64:type254:QW50YXJjdGljYQ==",
      0.02092050209205021
    ],
    [
      "base64:type254:QXVzdHJhbGlhIGFuZCBOZXcgWmVhbGFuZA==",
      0.04184100418410042
    ],
    [
      "base64:type254:QmFsdGljIENvdW50cmllcw==",
      0.05439330543933054
    ],
    [
      "base64:type254:QnJpdGlzaCBJc2xhbmRz",
      0.06276150627615062
    ],
    [
      "base64:type254:Q2FyaWJiZWFu",
      0.1631799163179916
    ],
    [
      "base64:type254:Q2VudHJhbCBBZnJpY2E=",
      0.20083682008368198
    ],
    [
      "base64:type254:Q2VudHJhbCBBbWVyaWNh",
      0.23430962343096232
    ],
    [
      "base64:type254:RWFzdGVybiBBZnJpY2E=",
      0.3179916317991631
    ],
    [
      "base64:type254:RWFzdGVybiBBc2lh",
      0.35146443514644343
    ],
    [
      "base64:type254:RWFzdGVybiBFdXJvcGU=",
      0.39330543933054385
    ],
    [
      "base64:type254:TWVsYW5lc2lh",
      0.41422594142259406
    ],
    [
      "base64:type254:TWljcm9uZXNpYQ==",
      0.44351464435146437
    ],
    [
      "base64:type254:TWljcm9uZXNpYS9DYXJpYmJlYW4=",
      0.4476987447698744
    ],
    [
      "base64:type254:TWlkZGxlIEVhc3Q=",
      0.5230125523012552
    ],
    [
      "base64:type254:Tm9yZGljIENvdW50cmllcw==",
      0.5523012552301255
    ],
    [
      "base64:type254:Tm9ydGggQW1lcmljYQ==",
      0.5732217573221757
    ],
    [
      "base64:type254:Tm9ydGhlcm4gQWZyaWNh",
      0.602510460251046
    ],
    [
      "base64:type254:UG9seW5lc2lh",
      0.6443514644351465
    ],
    [
      "base64:type254:U291dGggQW1lcmljYQ==",
      0.7029288702928871
    ],
    [
      "base64:type254:U291dGhlYXN0IEFzaWE=",
      0.7489539748953975
    ],
    [
      "base64:type254:U291dGhlcm4gQWZyaWNh",
      0.7698744769874477
    ],
    [
      "base64:type254:U291dGhlcm4gYW5kIENlbnRyYWwgQXNpYQ==",
      0.8284518828451883
    ],
    [
      "base64:type254:U291dGhlcm4gRXVyb3Bl",
      0.891213389121339
    ],
    [
      "base64:type254:V2VzdGVybiBBZnJpY2E=",
      0.9623430962343097
    ],
    [
      "base64:type254:V2VzdGVybiBFdXJvcGU=",
      1.0
    ]
  ],
  "data-type": "string",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2019-10-14 22:29:13.418582",
  "sampling-rate": 1.0,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 100
}

In this case, a singleton histogram was generated.

Using the following query we can see more human-readable statistics.

mysql> SELECT SUBSTRING_INDEX(v, ':', -1) value, concat(round(c*100,1),'%') cumulfreq,         
    -> CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq   
    -> FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets','$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist  
    -> WHERE schema_name  = 'world' and table_name = 'country' and column_name = 'region';
+---------------------------+-----------+-------+
| value                     | cumulfreq | freq  |
+---------------------------+-----------+-------+
| Antarctica                | 2.1%      | 2.1%  |
| Australia and New Zealand | 4.2%      | 2.1%  |
| Baltic Countries          | 5.4%      | 1.3%  |
| British Islands           | 6.3%      | 0.8%  |
| Caribbean                 | 16.3%     | 10.0% |
| Central Africa            | 20.1%     | 3.8%  |
| Central America           | 23.4%     | 3.3%  |
| Eastern Africa            | 31.8%     | 8.4%  |
| Eastern Asia              | 35.1%     | 3.3%  |
| Eastern Europe            | 39.3%     | 4.2%  |
| Melanesia                 | 41.4%     | 2.1%  |
| Micronesia                | 44.4%     | 2.9%  |
| Micronesia/Caribbean      | 44.8%     | 0.4%  |
| Middle East               | 52.3%     | 7.5%  |
| Nordic Countries          | 55.2%     | 2.9%  |
| North America             | 57.3%     | 2.1%  |
| Northern Africa           | 60.3%     | 2.9%  |
| Polynesia                 | 64.4%     | 4.2%  |
| South America             | 70.3%     | 5.9%  |
| Southeast Asia            | 74.9%     | 4.6%  |
| Southern Africa           | 77.0%     | 2.1%  |
| Southern and Central Asia | 82.8%     | 5.9%  |
| Southern Europe           | 89.1%     | 6.3%  |
| Western Africa            | 96.2%     | 7.1%  |
| Western Europe            | 100.0%    | 3.8%  |
+---------------------------+-----------+-------+

 

Histogram maintenance

Histogram statistics are not automatically recalculated. If you have a table that is very frequently updated with a lot of INSERTs, UPDATEs, and DELETEs, the statistics can run out of date very soon. Having unreliable histograms can lead the optimizer to the wrong choice.

When you find a histogram was useful to optimize a query, you need to also have a scheduled plan to refresh the statistics from time to time, in particular after doing massive modifications to the table.

To refresh a histogram you just need to run the same ANALYZE command we have seen before.
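
For example, to refresh the histogram we created earlier on world.city (a sketch, reusing the same bucket count as before):

ANALYZE TABLE city UPDATE HISTOGRAM ON population WITH 1024 BUCKETS;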

To completely drop a histogram you may run the following:

ANALYZE TABLE city DROP HISTOGRAM ON population;

 

Sampling

The histogram_generation_max_mem_size system variable controls the maximum amount of memory available for histogram generation. The global and session values may be set at runtime.

If the estimated amount of data to be read into memory for histogram generation exceeds the limit defined by the variable, MySQL samples the data rather than reading all of it into memory. Sampling is evenly distributed over the entire table.

The default value is 20000000 (bytes), but you can increase it in the case of a large column if you want more accurate statistics. For very large columns, pay attention not to raise the threshold above the available memory, in order to avoid excessive overhead or an outage.
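
A minimal sketch of raising the limit before regenerating a histogram (the 64 MB value is arbitrary, picked only for illustration):

SET SESSION histogram_generation_max_mem_size = 67108864;  -- 64 MB, used by ANALYZE in this session
ANALYZE TABLE city UPDATE HISTOGRAM ON population WITH 1024 BUCKETS;
-- The "sampling-rate" field in INFORMATION_SCHEMA.COLUMN_STATISTICS tells you
-- whether the whole column was read (1.0) or only a sample of it.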

 

Conclusion

Histogram statistics are particularly useful for non-indexed columns, as shown in the example.

Execution plans that can rely on indexes are usually the best, but histograms can help in some edge cases or when creating a new index is a bad idea.

Since this is not an automatic feature, some manual testing is required to investigate if you really can get the benefit of a histogram. Also, the maintenance requires some scheduled and manual activity.

Use histograms if you really need them, but don’t abuse them since histograms on very large tables can consume a lot of memory.

Usually, the best candidates for a histogram are the columns with:

  • values that do not change much over time
  • low cardinality values
  • uneven distribution

Install Percona Server 8.0, test and enjoy the histograms.

 

Further reading on the same topic: Billion Goods in Few Categories – How Histograms Save a Life?

 

MySQL is Ready for Fedora 31

Fedora 31 is out today; another rev on one of the most popular community Linux distros out there. As usual, we support the latest Fedora from day one, and we have added the following MySQL products to our official MySQL yum repos: MySQL Server (8.0.18 and 5.7.28) Connector C++ 8.0.18 Connector Python 8.0.18 Connector ODBC […]

5th Mydbops Database Meetup (Saturday, 2nd of Nov. 2019)

We are fueled and energized by the inquisitiveness and eagerness of the Database Administrator community, which reminds us to conduct the next edition of the Mydbops Database Meetup. Regular attendees are now looking forward to this quarterly meetup, organized by Mydbops IT Solutions for the benefit of its participants, with the latest hands-on knowledge shared by the practitioners themselves.

This time we are back at Diamond District as the venue for the 5th Mydbops Database Meetup, although the meeting will take place in a different tower within the same campus: Gojek Tech, Diamond District, 4th Floor, Tower ‘B’, HAL Road, Bangalore – 560 008.

We have two eminent external speakers this time. The first is Mr. Shiv Iyer, a longtime (approx. 17 years) MySQL expert with core expertise in performance, scalability, high availability, and database reliability engineering. Mr. Shiv Iyer is currently the founder and principal of MinervaDB, a boutique private-label consulting, support, remote DBA, and education services provider for MySQL, MariaDB, Percona Server, MyRocks, and ClickHouse, with over 100 customers worldwide. In the past, Mr. Shiv Iyer worked for companies like MySQL AB, Sun Microsystems, AOL, eBay, PayPal, PalominoDB, and Percona. Shiv is also a frequent speaker at open source conferences worldwide.

The next external speaker for the 5th edition of the Mydbops Database Meetup is Mr. Gopinath Karangula, a MySQL DBA (RHCSA, RHCE) and ERP production support engineer with experience in Subversion administration, configuration management, ANT build and release, QA, and sanity testing of Opentaps ERP application releases and their administration. He is part of Amadeus Labs in Bangalore.

From Mydbops, we have its star speakers for this community session. Mr. Vinoth Kanna, who recently gave an impressive and informative talk to the open source community, will be sharing his thoughts along with Mr. Manosh Malai, who is an avid tech speaker and an open source enthusiast.

[Photo, from left to right: Mr. Shiv Iyer, Mr. Manosh Malai, Mr. Vinoth Kanna, Mr. Gopinath Karangula]

The meetup begins at 9 AM at the above-mentioned venue with a registration process for all pre-registrants as well as walk-ins. We also have a networking session in between, to ensure the community truly gets to know each other personally through this platform, the Mydbops Database Meetup (MDM).

MDM sessions are highly interactive, with scheduled Q&A sessions after each speaker finishes their knowledge sharing.

Since this is a community program, we gather valuable feedback from its participants in order to keep improving the platform for everyone’s benefit.

Register yourself for free by visiting us at – https://bit.ly/2q1tcNO

You are invited to this session if you are a database enthusiast interested in gaining first-hand, industry-relevant information on database administration as a technology.

Killing certain connections to a MySQL database

Maintenance of databases or servers is quite often performed by database administrators at night. But these routines sometimes get blocked by long-running queries or applications that hang onto locks much longer than expected. Regularly, priority is given to the application and maintenance routines are often canceled in order not to interfere with the application. But […]
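
As a hedged sketch of the kind of approach the post describes (the user name and the 600-second threshold below are made-up values for illustration), one common pattern is to generate KILL statements from the process list and review them before running them:

SELECT CONCAT('KILL ', id, ';') AS kill_statement
FROM information_schema.processlist
WHERE user = 'app_user'       -- hypothetical application account
  AND command <> 'Sleep'
  AND time > 600;             -- connections busy for more than 10 minutes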

Understanding Hash Joins in MySQL 8

In MySQL 8.0.18 there is a new feature called Hash Joins, and I wanted to see how it works and in which situations it can help us. Here you can find a nice detailed explanation about how it works under the hood.

The high-level basics are the following: if there is a join, it will create an in-memory hash table based on one of the tables and will read the other table row by row, calculate a hash, and do a lookup on the in-memory hash table.

Great, but does this give us any performance benefits?

First of all, this only works on fields that are not indexed, so it implies a full table scan, and we usually do not recommend doing joins without indexes because they are slow. This is where Hash Joins in MySQL can help, because they use an in-memory hash table instead of a Nested Loop.

Let’s do some tests and see. First I created the following tables:

CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

I have inserted 131072 random rows into both tables.
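
The post does not show how the rows were generated; a minimal sketch of one way to do it, assuming plain random integers are good enough (the value ranges are arbitrary and will not reproduce the exact counts above):

-- Seed one row, double t1 until it reaches 2^17 = 131072 rows
-- (run the second statement 17 times), then fill t2 from t1.
INSERT INTO t1 (c1, c2) VALUES (FLOOR(RAND()*100000), FLOOR(RAND()*100000));
INSERT INTO t1 (c1, c2) SELECT FLOOR(RAND()*100000), FLOOR(RAND()*100000) FROM t1;
INSERT INTO t2 (c1, c2) SELECT FLOOR(RAND()*100000), FLOOR(RAND()*100000) FROM t1;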

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 131072   |
+----------+

First test – Hash Joins

Run a join based on c2 which is not indexed.

mysql> explain format=tree select count(*) from t1 join t2 on t1.c2 = t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=1728488704)
-> Table scan on t2 (cost=0.01 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

We have to use explain format=tree to see whether Hash Join will be used or not, as the traditional explain still says it is going to be a Nested Loop, which I think is very misleading. I have already filed a bug report about this, and in the ticket you can see some comments from developers saying:

The solution is to stop using traditional EXPLAIN (it will eventually go away).

So this is not going to be fixed in traditional explain and we should start using the new way.

Back to the query; we can see it is going to use Hash Join for this query, but how fast is it?

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (0.73 sec)

0.73s for a join that produces more than 17 million rows. Looks promising.

Second Test – Non-Hash Joins

We can disable it with an optimizer switch or optimizer hint.
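
The switch-based form would look roughly like this (a sketch; it assumes the hash_join optimizer_switch flag as it exists in 8.0.18):

SET SESSION optimizer_switch = 'hash_join=off';   -- 'hash_join=on' turns it back on

The hint-based form is shown below.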

mysql> select /*+ NO_HASH_JOIN (t1,t2) */ count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (13 min 36.36 sec)

Now the same query takes more than 13 minutes. That is a huge difference and we can see Hash Join helps a lot here.

Third Test – Joins Based on Indexes

Let’s create indexes and see how fast a join based on indexes is.

create index idx_c2 on t1(c2);
create index idx_c2 on t2(c2);

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (2.63 sec)

2.6s. Hash Join is even faster than the index-based join in this case.

However, I was able to force the optimizer to use Hash Joins even if an index is available by using ignore index:

mysql> explain format=tree select count(*) from t1 ignore index (idx_c2) join t2 ignore index (idx_c2) on t1.c2 = t2.c2 where t1.c2=t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=17336898)
-> Table scan on t2 (cost=0.00 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

I still think it would be nice if I could tell the optimizer with a hint to use Hash Joins even when an index is available, so we would not have to ignore indexes on all the tables. I have created a feature request for this.

However, if you read my first bug report carefully you can see a comment from a MySQL developer which indicates this might not be necessary:

BNL (Block Nested-Loop) will also go away entirely at some point, at which point this hint will be ignored.

That could mean they are planning to remove BNL joins in the future and maybe replace them with Hash Joins.

Limitations

We can see Hash Join can be powerful, but there are limitations:

  • As I mentioned it only works on columns that do not have indexes (or you have to ignore them).
  • It only works with equi-join conditions.
  • It does not work with LEFT or RIGHT JOIN.

I would also like to see a status metric to monitor how many times Hash Join was used, and for this, I filed another feature request.

Conclusion

Hash Join looks like a very powerful new join option, and we should keep an eye on it, because I would not be surprised if we get more related features in the future. In theory, it could handle LEFT and RIGHT joins as well, and as we can see in the comments on the bug report, Oracle has plans for it in the future.
