
How to Deploy and Use MySQL InnoDB ReplicaSet in Production?



Before I talk about the deployment process of MySQL InnoDB ReplicaSet, it is important to know the following:
  • What is MySQL InnoDB ReplicaSet?
  • What are the prerequisites and limitations of using MySQL ReplicaSet?
  • In what kind of scenarios is MySQL ReplicaSet not recommended?
  • How to configure and deploy MySQL ReplicaSet (step-by-step guide)
  • How to use InnoDB ReplicaSet?
  • What if the primary goes down? Are SELECT queries re-routed to another server?
  • What if a secondary goes down while executing SELECT queries?


I will answer all of these questions in this blog.

What is a ReplicaSet?

MySQL InnoDB ReplicaSet provides a quick and easy way to set up MySQL replication (master-slave), making it well suited to scaling out reads, and it provides manual failover capabilities in use cases that do not require the high availability offered by MySQL InnoDB Cluster.

Suppose you have a single server running your workload and you have to bring high availability in place for the application. The basics say that to achieve high availability in MySQL you need at least two MySQL servers running on two different hosts.

Until recently, setting up the link between these two hosts meant preparing and qualifying each server to be part of the HA setup, which required knowing the basics of MySQL. From MySQL 8.0.19 onwards you no longer have to spend time on preparation, qualification, and configuration-level changes: MySQL InnoDB ReplicaSet automates the job.


MySQL ReplicaSet is a set of three components:

  • MySQL Shell

  • MySQL Router

  • A set of MySQL Servers (minimum two)

It works only with a single primary and multiple secondary servers, replicating in asynchronous mode.





MySQL Shell includes AdminAPI, which enables you to easily configure, administer, and deploy a group of MySQL Servers.

MySQL Router, which is part of the ReplicaSet, is a lightweight middleware that provides transparent routing between your application and the back-end MySQL Servers. Its purpose is to serve R/W requests to the primary instance through port 6446 and R/O requests to the secondary instances through port 6447.

It is always recommended to install MySQL Router on the application server, for the reasons below (see the example that follows):

  • The application is the one that sends the requests: Application -> Router -> List of MySQL Servers.

  • It decreases network latency.
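For illustration, once the Router has been bootstrapped (as shown later in Step 3), the application only has to pick the right port. The two commands below are a minimal sketch, assuming the Router runs locally with its default ports and that root connections are allowed:

#R/W traffic goes through port 6446 and is routed to the primary
mysql -h127.0.0.1 -P6446 -uroot -p -e "SELECT @@hostname, @@read_only;"

#R/O traffic goes through port 6447 and is routed to a secondary
mysql -h127.0.0.1 -P6447 -uroot -p -e "SELECT @@hostname, @@read_only;"

On the primary, @@read_only returns 0; on a secondary of a ReplicaSet it should return 1, which makes it easy to confirm where each port lands.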

What are the prerequisites and limitations of using MySQL ReplicaSet?

  • Failover is manual.

  • No multi-primary topology.

  • All secondary members replicate from the primary.

  • GTID-based replication only.

  • All MySQL Servers must run version 8.0.19 or higher.

  • Only row-based replication is supported.

  • Replication filters are not supported.

  • The ReplicaSet must be managed by MySQL Shell.

  • Prefer MySQL Clone over incremental recovery as the recovery method (see the example after this list).
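A minimal sketch of the last point above, using the AdminAPI option that selects the recovery method explicitly (the instance address is simply the second machine from this tutorial):

 MySQL  10.0.10.33:3306 ssl  JS > rs.addInstance("10.0.10.38:3306", {recoveryMethod: "clone"});

Passing recoveryMethod: "incremental" instead keeps the incremental behaviour, while omitting the option lets MySQL Shell pick or prompt for a method on its own.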

Further limitations are documented in the MySQL Reference Manual.

In what kind of scenarios is MySQL ReplicaSet recommended?

Below are the top features that make a DBA's life simpler:

  • Scaling read workloads.

  • Manual failover in the event the primary node goes down.

  • Useful where RPO/RTO targets can be relaxed.

  • MySQL Shell automatically configures users and replication.

  • Easy to deploy without editing the my.cnf/my.ini file.

  • No need to spend time on backup and restore to provision a new node: the built-in MySQL CLONE feature saves a lot of time when bringing up another server for replication. More on cloning: https://mysqlserverteam.com/clone-create-mysql-instance-replica/

  • Integrated MySQL Router load balancing.

  • An easy way to get started with MySQL high availability for all tiers of applications.


How to configure and deploy MySQL ReplicaSet

Step-by-step guide to deploy MySQL ReplicaSet in production.

In this tutorial I will use two machines where MySQL is running:

Machine 01: 10.0.10.33

Machine 02: 10.0.10.38

Make sure the following software is installed:

1. MySQL Server 8.0.19

2. MySQL Shell

3. MySQL Router (it can be installed on either the MySQL server or, as recommended, the application server).


Step 1: Configure each machine to participate in the InnoDB ReplicaSet

##In Machine 01

mysqlsh

shell.connect("root@10.0.10.33:3306");

Creating a session to 'root@10.0.10.33:3306'

Please provide the password for 'root@10.0.10.33:3306': ********

Save password for 'root@10.0.10.33:3306'? [Y]es/[N]o/Ne[v]er (default No): Y

Fetching schema names for autocompletion... Press ^C to stop.

Your MySQL connection id is 13

Server version: 8.0.19-commercial MySQL Enterprise Server - Commercial

No default schema selected; type \use <schema> to set one.

<ClassicSession:root@10.0.10.33:3306>

 MySQL  10.0.10.33:3306 ssl  JS >


dba.configureReplicaSetInstance("root@10.0.10.33:3306",{clusterAdmin: "'rsadmin'@'10.0.10.33%'"});

Configuring local MySQL instance listening at port 3306 for use in an InnoDB ReplicaSet...


This instance reports its own address as Workshop-33:3306

Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.

Password for new account: ********

Confirm password: ********


NOTE: Some configuration options need to be fixed:

+--------------------------+---------------+----------------+--------------------------------------------------+

| Variable                 | Current Value | Required Value | Note                                             |

+--------------------------+---------------+----------------+--------------------------------------------------+

| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |

| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |

| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |

+--------------------------+---------------+----------------+--------------------------------------------------+


Some variables need to be changed, but cannot be done dynamically on the server.

Do you want to perform the required configuration changes? [y/n]: y

Do you want to restart the instance after configuring it? [y/n]: y

Cluster admin user 'rsadmin'@'10.0.10.33%' created.

Configuring instance...

The instance 'Workshop-33:3306' was configured to be used in an InnoDB ReplicaSet.

Restarting MySQL...

NOTE: MySQL server at Workshop-33:3306 was restarted.


##In Machine 2

mysqlsh

shell.connect("root@10.0.10.38:3306");


Creating a session to 'root@10.0.10.38:3306'

Please provide the password for 'root@10.0.10.38:3306': ********

Save password for 'root@10.0.10.38:3306'? [Y]es/[N]o/Ne[v]er (default No): Y

Fetching schema names for autocompletion... Press ^C to stop.

Your MySQL connection id is 10

Server version: 8.0.19-commercial MySQL Enterprise Server - Commercial

No default schema selected; type \use <schema> to set one.

<ClassicSession:root@10.0.10.38:3306>

dba.configureReplicaSetInstance("root@10.0.10.38:3306",{clusterAdmin: "'rsadmin'@'10.0.10.38%'"});

Configuring local MySQL instance listening at port 3306 for use in an InnoDB ReplicaSet...


This instance reports its own address as Workshop-38:3306

Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.

Password for new account: ********

Confirm password: ********


NOTE: Some configuration options need to be fixed:

+--------------------------+---------------+----------------+--------------------------------------------------+

| Variable                 | Current Value | Required Value | Note                                             |

+--------------------------+---------------+----------------+--------------------------------------------------+

| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |

| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |

| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |

+--------------------------+---------------+----------------+--------------------------------------------------+


Some variables need to be changed, but cannot be done dynamically on the server.

Do you want to perform the required configuration changes? [y/n]: y

Do you want to restart the instance after configuring it? [y/n]: y

Cluster admin user 'rsadmin'@'10.0.10.38%' created.

Configuring instance...

The instance 'Workshop-38:3306' was configured to be used in an InnoDB ReplicaSet.

Restarting MySQL...

NOTE: MySQL server at Workshop-38:3306 was restarted.

 MySQL  10.0.10.38:3306 ssl  JS >



Step 2: Create the ReplicaSet and add a database node to it.

##Connect to Machine 01 :-

mysqlsh

shell.connect("root@10.0.10.33:3306");

var rs = dba.createReplicaSet("MyReplicatSet")

A new replicaset with instance 'Workshop-33:3306' will be created.


* Checking MySQL instance at Workshop-33:3306


This instance reports its own address as Workshop-33:3306

Workshop-33:3306: Instance configuration is suitable.


* Updating metadata...


ReplicaSet object successfully created for Workshop-33:3306.

Use rs.addInstance() to add more asynchronously replicated instances to this replicaset and rs.status() to check its status.


 MySQL  10.0.10.33:3306 ssl  JS > rs.addInstance("10.0.10.38:3306");
Adding instance to the replicaset...


* Performing validation checks


This instance reports its own address as Workshop-38:3306

Workshop-38:3306: Instance configuration is suitable.


* Checking async replication topology...


* Checking transaction state of the instance...

The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'Workshop-38:3306' with a physical snapshot from an existing replicaset member. To use this method by default, set the 'recoveryMethod' option to 'clone'.


WARNING: It should be safe to rely on replication to incrementally recover the state of the new instance if you are sure all updates ever executed in the replicaset were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the replicaset or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.


Incremental state recovery was selected because it seems to be safely usable.


* Updating topology

** Configuring Workshop-38:3306 to replicate from Workshop-33:3306

** Waiting for new instance to synchronize with PRIMARY...


The instance 'Workshop-38:3306' was added to the replicaset and is replicating from Workshop-33:3306.


 MySQL  10.0.10.33:3306 ssl  JS >


rs.status();

{

    "replicaSet": {

        "name": "ReplicatSet",

        "primary": "Workshop-38:3306",

        "status": "AVAILABLE",

        "statusText": "All instances available.",

        "topology": {

            "10.0.10.39:3306": {

                "address": "10.0.10.39:3306",

                "instanceRole": "SECONDARY",

                "mode": "R/O",

                "replication": {

                    "applierStatus": "APPLIED_ALL",

                    "applierThreadState": "Slave has read all relay log; waiting for more updates",

                    "receiverStatus": "ON",

                    "receiverThreadState": "Waiting for master to send event",

                    "replicationLag": null

                },

                "status": "ONLINE"

            },

            "Workshop-38:3306": {

                "address": "Workshop-38:3306",

                "instanceRole": "PRIMARY",

                "mode": "R/W",

                "status": "ONLINE"

            }

        },

        "type": "ASYNC"

    }

}


Step 3: Configure MySQL Router so the application can talk to the ReplicaSet.


mysqlrouter --force  --user=root --bootstrap root@10.0.10.38:3306 --directory myrouter

#In case the Router is bootstrapped from a remote machine (cluster in 10.0.10.14):

mysqlrouter --bootstrap root@10.0.10.14:3310 --directory myrouter



Step 4: Start Router

myrouter/start.sh

 

Step 5: Using Replica Set
mysqlsh


MySQL JS>

shell.connect("root@127.0.0.1:6446");

\sql

SQL>SELECT * FROM performance_schema.replication_group_members;

CREATE DATABASE sales;USE sales;

CREATE TABLE if not exists sales.employee(empid int primary key auto_increment,empname varchar(100),salary int,deptid int);

INSERT sales.employee(empname,salary,deptid) values('Ram',1000,10);

INSERT sales.employee(empname,salary,deptid) values('Raja',2000,10);

INSERT sales.employee(empname,salary,deptid) values('Sita',3000,20);

SELECT * FROM  sales.employee;


Connect through the Router's read-only port to verify the changes on the other machine.

mysqlsh


JS>shell.connect("root@127.0.0.1:6447");

\sql

SQL>SELECT * FROM sales.employee;

INSERT sales.employee values(100,'Ram',1000,10);

<Error, because this connection is routed to a read-only instance, which is not allowed to execute DML/DDL statements.>


##Create Disaster

#service mysqld stop


RS1= dba.getReplicaSet()

RS1.status();

MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

ReplicaSet.status: Failed to execute query on Metadata server 10.0.10.38:3306: Lost connection to MySQL server during query (MySQL Error 2013)

 MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

ReplicaSet.status: The Metadata is inaccessible (MetadataError)

 MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

ReplicaSet.status: The Metadata is inaccessible (MetadataError)

 MySQL  10.0.10.38:3306 ssl  JS >

MySQL-JS>shell.connect("root@localhost:6446");

Creating a session to 'root@10.0.10.38:6446'

Please provide the password for 'root@10.0.10.38:6446': ********

Shell.connect: Can't connect to remote MySQL server for client connected to '0.0.0.0:6446' (MySQL Error 2003)


#service mysqld start

MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

ReplicaSet.status: The Metadata is inaccessible (MetadataError)

 MySQL  10.0.10.38:3306 ssl  JS > RS1=dba.getReplicaSet()

You are connected to a member of replicaset 'ReplicatSet'.

<ReplicaSet:ReplicatSet>

RS1=dba.getReplicaSet()

RS1.status()

##Connect to the Router again to send the traffic

mysqlsh


shell.connect("root@localhost:6447");

\sql

SQL>SELECT * FROM sales.employee;




Scenario #1: Assume the primary goes down, and you run:

MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

Error: ReplicaSet.status: The Metadata is inaccessible (MetadataError)

 MySQL  10.0.10.38:3306 ssl  JS >

Now the primary machine is back up, and you run:

MySQL  10.0.10.38:3306 ssl  JS > RS1.status()

ReplicaSet.status: The Metadata is inaccessible (MetadataError)

>> The status does not get refreshed.


Fix: re-fetch the ReplicaSet object:
RS1= dba.getReplicaSet()

RS1.status();



Scenario #2

Create disaster: what if the primary node fails while the application is executing the query below?

while [ 1 ]; do sleep 1; mysql -h127.0.0.1 -uroot -p123456  -P6446 -e " INSERT sales.employee(empname,salary,deptid) values('Ram',1000,10); select count(*) from sales.employee"; done


\JS


#Stop Primary MySQL Instance

service mysqld stop


You can see that the INSERT query stops working and ends with an error.

Now let's execute only SELECT queries and see what happens. Since the primary node is down, MySQL Router stops sending any queries to port 6446, BUT the Router has another port open for SELECT-only traffic: it uses port 6447 to route SELECT queries.
See below.



Let's re-execute the same loop with only a SELECT query, connecting to the R/O port 6447:
while [ 1 ]; do sleep 1; mysql -h127.0.0.1 -uroot -p123456  -P6447 -e " Select count(*) from sales.employee"; done

You are now reaching the other machine, which is the replica (10.0.10.38).

Now, let's reconnect to the primary node (10.0.10.33). What will happen? Will it work or not?

This shows that even if the primary node goes down and the secondary replicas are alive, SELECT queries will keep working.

select @@hostname; --> 10.0.10.38


Scenario #3

Create disaster: what if the secondary node fails?


#Stop MySQL Instance

10.0.10.38$service mysqld stop


The primary still works even though the secondary node goes down; that is by design of MySQL replication.



Now that the secondary node is down, let's connect to port 6447 and send only SELECT queries:

while [ 1 ]; do sleep 1; mysql -h127.0.0.1 -uroot -p123456  -P6447 -e " select count(*) from sales.employee"; done


What will happen?

Even though the secondary node is down, MySQL Router re-routes the connections to the primary server and returns results.

Re-confirm:

Did you notice one important thing? Why is port 6447 executing R/W queries? When we execute R/W and R/O statements on port 6447, the Router routes them to the primary node (the instance normally served through port 6446).

Because, as per the documentation, when you use MySQL Router with a ReplicaSet, be aware that:

  • The read-write port of MySQL Router directs client connections to the primary instance of the ReplicaSet.

  • The read-only port of MySQL Router directs client connections to a secondary instance of the ReplicaSet, although it could also direct them to the primary.


Please try this brand new feature to set up MySQL replication with the help of MySQL Shell.

Want to know more?





ProxySQL Config file creation | Backup solution


We are well aware that ProxySQL is one of the most powerful SQL-aware proxies for MySQL. The ProxySQL configuration is flexible, and most of the configuration can be done with the ProxySQL client itself.

The latest ProxySQL release (2.0.9) has a few impressive features such as the SQL injection engine, firewall whitelist, and config file generation. In this blog I am going to explain how to generate the ProxySQL config file using the ProxySQL client.

Why a configuration file?

  • Backup solution
  • Helpful for Ansible deployments in multiple environments

There are two important commands involved in ProxySQL config file generation:

  • Print the config file text in the ProxySQL client itself (like query output)
  • Export the configuration to a separate file

Print the config file text in the ProxySQL client (like query output):

cmd : SELECT CONFIG FILE;

Export the configuration to a separate file:

cmd : SELECT CONFIG INTO OUTFILE /path/config

Below is a bash script which helps back up the ProxySQL configuration. It can be scheduled in cron at a convenient time (see the cron example after the script).

Script :

[root@ip-172-31-8-156 ProxySQL]# cat backup.sh
#!/bin/bash

#variable

backup_path="/var/lib/ProxySQL_backup/data"
user=admin
pass=admin
port=6032
host=127.0.0.1
back_name=ProxySQL_backup_$(date -u +%Y-%m-%dT%H-%M-%S)
log_path="/var/lib/ProxySQL_backup/log"

#live_check

ProxySQL_livecheck() {
if [[ $(pgrep proxysql) ]]; then 
      ProxySQL_Backup 
else 
      echo "Backup ( $back_name ) failed" >> /var/lib/ProxySQL_backup/log/backup.err 
      exit 
fi
}

#backup

ProxySQL_Backup() {
mysql -u$user -p$pass -P$port -h$host -e "select config into outfile $backup_path/$back_name" 

echo "Backup ( $back_name ) completed" >> /var/lib/ProxySQL_backup/log/backup.log
}

#call
ProxySQL_livecheck
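For example, the script above can be scheduled from cron; the path and timing below are assumptions, so adjust them to wherever you saved the script:

#Run the ProxySQL config backup every day at 02:00
0 2 * * * /var/lib/ProxySQL_backup/backup.sh >> /var/lib/ProxySQL_backup/log/cron.log 2>&1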

Thanks !!!

The state of Orchestrator, 2020 (spoiler: healthy)

Webinar 2/26: Building a Kubernetes Operator for Percona XtraDB Cluster

Building a Kubernetes Operator for Percona XtraDB Cluster

This talk covers some of the challenges we sought to address by creating a Kubernetes Operator for Percona XtraDB Cluster, as well as a look into the current state of the Operator, a brief demonstration of its capabilities, and a preview of the roadmap for the remainder of the year. Find out how you can deploy a 3-node PXC cluster in under five minutes and handle providing self-service databases on the cloud in a cloud-vendor agnostic way. You’ll have the opportunity to ask the Product Manager questions and provide feedback on what challenges you’d like us to solve in the Kubernetes landscape.

Please join Percona Product Manager Tyler Duzan on Wednesday, February 26, 2020, at 1 pm EST for his webinar “Building a Kubernetes Operator for Percona XtraDB Cluster”.

Register Now

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

MySQL Encryption: How Master Key Rotation Works

MySQL: How Master Key Rotation Works

In the last blog post of this series, we discussed in detail how Master Key encryption works. In this post, based on what we already know about Master Key encryption, we look into how Master Key rotation works.

The idea behind Master Key rotation is that we want to generate a new Master Key and use this new Master Key to re-encrypt the tablespace key (stored in tablespace’s header).

Let's remind ourselves what a Master Key encryption header looks like (it is located in the tablespace's header).

From the previous blog post, we know that when a server starts it goes through all encrypted tablespaces’ encryption headers. During that, it remembers the highest KEY ID it read from all the encrypted tablespaces. For instance, if we have three tables with KEY_ID = 3 and one table with KEY ID = 4, it means that the highest key ID we found in the server is 4. Let’s call this highest KEY ID – MAX KEY ID.

How Master Key Rotation Works, Step by Step:

1. The user issues ALTER INSTANCE ROTATE INNODB MASTER KEY;

2. The server asks keyring to generate a new Master Key with server’s UUID and KEY_ID being MAX KEY ID incremented by one. So we get INNODB_KEY-UUID-(MAX_KEY_ID+1). On successful Master Key generation, the MAX KEY ID is incremented by one (i.e. MAX_KEY_ID = MAX_KEY_ID + 1).

3. The server goes through all the Master Key encrypted tablespaces in the server and for each tablespace:

– encrypts tablespace key with the new Master Key
– updates key id to the new MAX KEY ID
– if UUID is different than the server’s UUID it gets set to the server’s UUID
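A minimal illustration of the rotation statement and a follow-up check (assuming MySQL 8.0 with a keyring component/plugin already configured and a user holding the ENCRYPTION_KEY_ADMIN privilege):

mysql> ALTER INSTANCE ROTATE INNODB MASTER KEY;

mysql> SELECT NAME, ENCRYPTION FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE ENCRYPTION = 'Y';

The second query only lists which tablespaces are encrypted; the re-encryption of their tablespace keys with the new Master Key happens transparently in the tablespace headers.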

As we know, the Master Key ID used to decrypt table is built of UUID and KEY ID read from the tablespace’s header. What we are doing now is updating this information in the tablespace’s encryption header, so the server would retrieve the correct Master Key when trying to decrypt the tablespace.

If we happen to have tablespaces coming from different places – like, for instance, retrieved from different backups – those tablespaces may be using different Master Keys. All those Master Keys would need to be retrieved from keyring on server startup. This might make the server’s startup slow, especially if we are using server-based keyring. With Master Key rotation, we re-encrypt tablespace keys with one – the same for all tablespaces – Master Key. Now the server needs to retrieve only one Master Key from Key server (for server-based keyring) on startup.

This is, of course, only a nice side effect – the main purpose why we do Master Key rotation is to make our server more secure. In case Master Key was somehow stolen from the keyring (for instance, from Vault Server) we can generate a new Master Key and re-encrypt the tablespaces keys, making the stolen key no longer valid. We are safe … almost.

In the previous blog post, I explained that once a decrypted tablespace key is stolen, a third-party can keep using it to decrypt our data – given that they have access to our disk. In case Master Key was stolen, and if the third-party had access to our encrypted data, they could use the stolen Master Key to decrypt the tablespace key and thus be able to decrypt the data. As we can see, Master Key rotation will not help us in that case. We will re-encrypt the tablespace key with the new Master Key, but the actual tablespace key used to encrypt/decrypt tablespace will remain the same; so “a hacker” can keep using it to decrypt the data. I previously hinted that Percona Server for MySQL has a way of doing actual re-encryption of tablespaces instead of just re-encrypting tablespace key. The feature is called encryption threads, however, at this point in time, it is still an experimental feature.

A case where Master Key rotation is helpful is when Master Key is stolen, but the attacker did not have a chance to use it and decrypt our tablespace keys.

Make It Smarter: Tuning MySQL Client Request Routing for Tungsten Connector


Overview

The Skinny

In this blog post we explore various options for tuning MySQL traffic routing in the Tungsten Connector for better control of the distribution.

A Tungsten Cluster relies upon the Tungsten Connector to route client requests to the master node or optionally to the slaves. The Connector makes decisions about where to route requests based on a number of factors.

This blog post will focus on the Load Balancer algorithms available via configuration that allow you to adjust the routing behavior of the Connector, along with ways to debug the Connector Load Balancer’s routing decisions.


The Question

Recently, a customer asked us:

How do I know which load balancer algorithm is in use by the Connector? And how do we enable debug logging for the Connector load balancer?


The Answers

Grep and Ye Shall Find

Let’s start off with the discovery process – how do I know which load balancer is in use for which QoS?

First, you may simply grep for the values in the router.properties file:

shell> grep dataSourceLoadBalancer $CONTINUENT_ROOT/tungsten/cluster-home/conf/router.properties

dataSourceLoadBalancer_RO_RELAXED=com.continuent.tungsten.router.resource.loadbalancer.MostAdvancedSlaveLoadBalancer
dataSourceLoadBalancer_RW_STRICT=com.continuent.tungsten.router.resource.loadbalancer.DefaultLoadBalancer
dataSourceLoadBalancer_RW_SESSION=com.continuent.tungsten.router.resource.loadbalancer.HighWaterSlaveLoadBalancer

You can also locate the maxAppliedLatency the same way, just use grep:

shell> grep 'maxAppliedLatency=' $CONTINUENT_ROOT/tungsten/cluster-home/conf/router.properties
maxAppliedLatency=-1

If your Connector is running in Proxy mode, you can also get the information via the Connector’s enhanced MySQL CLI:

shell> tpm connector
mysql> tungsten show variables like '%dataSourceLoadBalancer%';
+-------------------+-----------------------------------+-------------------------------+
| Variable_Type     | Variable_name                     | Value                         |
+-------------------+-----------------------------------+-------------------------------+
| router.properties | dataSourceLoadBalancer_RO_RELAXED | MostAdvancedSlaveLoadBalancer |
| router.properties | dataSourceLoadBalancer_RW_SESSION | HighWaterSlaveLoadBalancer    |
| router.properties | dataSourceLoadBalancer_RW_STRICT  | DefaultLoadBalancer           |
+-------------------+-----------------------------------+-------------------------------+
3 rows in set (0.01 sec)


Singing a Better Tune

How to Change the Connector’s Load Balancer Algorithm

The default Connector Load balancer in v6.x is the MostAdvancedSlaveLoadBalancer which selects the slave that has replicated the most events, by comparing data sources “high water” marks. If no slave is available, the master will be returned.

This means that even if a slave is behind the master, this Load Balancer will select the most advanced slave, while still eliminating any slave that is more than maxAppliedLatency seconds latent.

You may wish to use one of the other algorithms to adjust the behavior, such as round-robin or lowest-latency slave:

Load balancers and their default QoS:

DefaultLoadBalancer (default QoS: RW_STRICT): Always selects the master data source.

MostAdvancedSlaveLoadBalancer (default QoS: RO_RELAXED): Selects the slave that has replicated the most events, by comparing data sources' "high water" marks. If no slave is available, the master will be returned.

LowestLatencySlaveLoadBalancer: Selects the slave data source that has the lowest replication lag, or appliedLatency in ls -l within cctrl output. If no slave data source is eligible, the master data source will be selected.

RoundRobinSlaveLoadBalancer: Selects a slave in a round-robin manner, by iterating through them using an internal index. Returns the master if no slave is found online.

HighWaterSlaveLoadBalancer (default QoS: RW_SESSION): Given a session high water (usually the high water mark of the update event), selects the first slave that has a higher or equal high water, or the master if no slave is online or has replicated the given session event. This is the default used when SmartScale is enabled.

To change the Connector load balancer, specify the property in the configuration, i.e to use Round Robin:


shell> vi /etc/tungsten/tungsten.ini

[alpha]
property=dataSourceLoadBalancer_RO_RELAXED=com.continuent.tungsten.router.resource.loadbalancer.RoundRobinSlaveLoadBalancer
...

shell> tpm update

Here are the docs for a closer look –
http://docs.continuent.com/tungsten-clustering-6.1/connector-routing-loadbalancers.html


The Nitty-Gritty

Debug Logging to the Rescue!

For in-depth debugging, enable trace logging for the Connector load balancer. Perform the procedure on one or more Connector nodes to see what the load balancer algorithm is thinking.

Warning
Enabling Connector debug logging will decrease performance dramatically. Disk writes will increase as will disk space consumption. Do not use in production environments unless instructed to do so by Continuent support. In any case, run in this mode for as short a period of time as possible – just long enough to gather the needed debug information. After that is done, disable debug logging.

To enable trace logging for the load balancer, edit the Connector configuration file tungsten-connector/conf/log4j.properties and uncomment two lines. For example:

shell> su - tungsten
shell> vi /opt/continuent/tungsten/tungsten-connector/conf/log4j.properties

Uncomment these two lines:

#log4j.logger.com.continuent.tungsten.router.resource.loadbalancer=debug, stdout
#log4j.additivity.com.continuent.tungsten.router.resource.loadbalancer=false

so they look like this:

log4j.logger.com.continuent.tungsten.router.resource.loadbalancer=debug, stdout
log4j.additivity.com.continuent.tungsten.router.resource.loadbalancer=false

Then signal the Connector to reread the config files:
shell> connector reconfigure

Warning
When disabling debug logging, DO NOT comment the lines out! Instead replace debug with info.

To disable debug logging, edit the Connector configuration file tungsten-connector/conf/log4j.properties and change the keyword debug to info.

For example, this is how it should look when the edit to disable is completed:

shell> vi /opt/continuent/tungsten/tungsten-connector/conf/log4j.properties

log4j.logger.com.continuent.tungsten.router.resource.loadbalancer=info, stdout
log4j.additivity.com.continuent.tungsten.router.resource.loadbalancer=false

shell> connector reconfigure


The Library

Please read the docs!

This is another Blog post about load balancing: How to use Round-Robin Load Balancing with the Tungsten Connector

Here are the Connector Load Balancer docs for a closer look –
http://docs.continuent.com/tungsten-clustering-6.1/connector-routing-loadbalancers.html

For additional understanding of maxAppliedLatency, please visit here: http://docs.continuent.com/tungsten-clustering-6.1/connector-routing-latency.html

For more information about the generalized debug procedures:
https://docs.continuent.com/tungsten-clustering-6.1/troubleshooting-support.html#troubleshooting-support-advanced
Scroll down to “Configuring Connector Debug Logging”

For more technical information about Tungsten clusters, please visit https://docs.continuent.com


Summary

The Wrap-Up

Clearly, there are many, many things to think about when it comes to Tungsten Connector tuning – this blog post barely scratches the surface of the subject. Remember, tuning is all about iterative effort over time!

Tungsten Clustering is the most flexible, performant global database layer available today – use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit https://www.continuent.com/solutions

Want to learn more or run a POC? Contact us.

What to Check if MySQL Memory Utilisation is High


One of the key factors of a performant MySQL database server is good memory allocation and utilization, especially when running it in a production environment. But how can you determine if MySQL's memory utilization is optimized? Is it reasonable to have high memory utilization, or does it require fine tuning? What if you come up against a memory leak?

Let's cover these topics and show the things you can check in MySQL to determine traces of high memory utilization.

Memory Allocation in MySQL

Before we delve into the specific subject, I'll give some short information about how MySQL uses memory. Memory is a significant resource for speed and efficiency when handling concurrent transactions and running big queries. Each thread in MySQL demands memory which is used to manage client connections, and these threads share the same base memory. Variables like thread_stack (stack for threads), net_buffer_length (for the connection buffer and result buffer), or max_allowed_packet (up to which the connection and result buffers dynamically enlarge when needed) are variables that affect memory utilization. When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache; in that case, the memory remains allocated. Query joins, the query cache, sorting, the table cache, and table definitions also require memory in MySQL, but these are governed by system variables that you can configure and set.

In most cases, the memory-specific variables are targeted at a specific storage engine, such as MyISAM or InnoDB. When a mysqld instance spawns within the host system, MySQL allocates buffers and caches to improve performance of database operations based on the values set in its configuration. For example, the most common variables every DBA will set for InnoDB are innodb_buffer_pool_size and innodb_buffer_pool_instances, which are both related to the buffer pool memory allocation that holds cached data for InnoDB tables. If you have a lot of memory and are expecting to handle big transactions, setting innodb_buffer_pool_instances improves concurrency by dividing the buffer pool into multiple buffer pool instances.
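For example, a quick way to see how the buffer pool is currently configured and how much of it is actually in use (variable names are standard; the output will of course differ per server):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';

Comparing Innodb_buffer_pool_pages_free against the total number of buffer pool pages gives a rough idea of whether the configured pool size matches the working set.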

For MyISAM, you have to deal with key_buffer_size to control the amount of memory the key buffer will use. MyISAM also allocates a buffer for every concurrent thread, which contains the table structure, column structures for each column, and a buffer of size 3 * N (where N is the maximum row length, not counting BLOB columns). MyISAM also maintains one extra row buffer for internal use.

MySQL also allocates memory for temporary tables unless a table becomes too large (determined by tmp_table_size and max_heap_table_size). If you are using MEMORY tables and the max_heap_table_size variable is set very high, this can also take a large amount of memory, since max_heap_table_size determines how large such a table can grow, and there is no conversion to on-disk format.

MySQL also has a Performance Schema which is a feature for monitoring MySQL activities at a low level. Once this is enabled, it dynamically allocates memory incrementally, scaling its memory use to actual server load, instead of allocating required memory during server startup. Once memory is allocated, it is not freed until the server is restarted. 

MySQL can also be configured to allocate large areas of memory for its buffer pool when running on Linux with kernel support for large pages enabled, i.e. using HugePages.

What To Check Once MySQL Memory is High

Check Running Queries

It's very common for MySQL DBAs to first check what's going on with the running MySQL server. The most basic procedures are checking the processlist, checking the server status, and checking the storage engine status. To do these things, you basically just have to run a series of queries after logging in to MySQL. See below:

To view the running queries,

mysql> SHOW [FULL] PROCESSLIST;

Viewing the current processlist reveals queries that are running actively, as well as idle or sleeping processes. It is very important, and a significant routine, to keep a record of the queries that are running. As noted in how MySQL allocates memory, running queries consume memory and can cause drastic performance issues if not monitored.

View the MySQL server status variables,

mysql> SHOW GLOBAL STATUS\G

or filter specific variables like

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN ('var1', 'var2', ...);

MySQL's status variables serve as statistical information that lets you grab metric data and determine how your MySQL server performs, by observing the counters given by the status values. There are certain values here which give you an at-a-glance view of what impacts memory utilization. For example, check the number of threads, the number of table caches, or the buffer pool usage:

...

| Created_tmp_disk_tables                 | 24240 |

| Created_tmp_tables                      | 334999 |

…

| Innodb_buffer_pool_pages_data           | 754         |

| Innodb_buffer_pool_bytes_data           | 12353536         |

...

| Innodb_buffer_pool_pages_dirty          | 6         |

| Innodb_buffer_pool_bytes_dirty          | 98304         |

| Innodb_buffer_pool_pages_flushed        | 30383         |

| Innodb_buffer_pool_pages_free           | 130289         |

…

| Open_table_definitions                  | 540 |

| Open_tables                             | 1024 |

| Opened_table_definitions                | 540 |

| Opened_tables                           | 700887 |

...

| Threads_connected                             | 5 |

...

| Threads_cached    | 2 |

| Threads_connected | 5     |

| Threads_created   | 7 |

| Threads_running   | 1 |

View the engine's monitor status, for example, InnoDB status

mysql> SHOW ENGINE INNODB STATUS\G

The InnoDB status also reveals the current status of transactions that the storage engine is processing. It gives you the heap size of a transaction, adaptive hash indexes revealing its buffer usage, or shows you the innodb buffer pool information just like the example below:

---TRANSACTION 10798819, ACTIVE 0 sec inserting, thread declared inside InnoDB 1201

mysql tables in use 1, locked 1

1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 8801

MySQL thread id 68481, OS thread handle 139953970235136, query id 681821 localhost root copy to tmp table

ALTER TABLE NewAddressCode2_2 ENGINE=INNODB



…

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

Ibuf: size 528, free list len 43894, seg size 44423, 1773 merges

merged operations:

 insert 63140, delete mark 0, delete 0

discarded operations:

 insert 0, delete mark 0, delete 0

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 637 buffer(s)

Hash table size 553193, node heap has 772 buffer(s)

Hash table size 553193, node heap has 1239 buffer(s)

Hash table size 553193, node heap has 2 buffer(s)

Hash table size 553193, node heap has 0 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

115320.41 hash searches/s, 10292.51 non-hash searches/s

...

----------------------

BUFFER POOL AND MEMORY

----------------------

Total large memory allocated 2235564032

Dictionary memory allocated 3227698

Internal hash tables (constant factor + variable factor)

    Adaptive hash index 78904768        (35404352 + 43500416)

    Page hash           277384 (buffer pool 0 only)

    Dictionary cache    12078786 (8851088 + 3227698)

    File system         1091824 (812272 + 279552)

    Lock system         5322504 (5313416 + 9088)

    Recovery system     0 (0 + 0)

Buffer pool size   131056

Buffer pool size, bytes 2147221504

Free buffers       8303

Database pages     120100

Old database pages 44172

Modified db pages  108784

Pending reads      0

Pending writes: LRU 2, flush list 342, single page 0

Pages made young 533709, not young 181962

3823.06 youngs/s, 1706.01 non-youngs/s

Pages read 4104, created 236572, written 441223

38.09 reads/s, 339.46 creates/s, 1805.87 writes/s

Buffer pool hit rate 1000 / 1000, young-making rate 12 / 1000 not 5 / 1000

Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s

LRU len: 120100, unzip_LRU len: 0

I/O sum[754560]:cur[8096], unzip sum[0]:cur[0]

…

Another thing to add: you can also use the Performance Schema and the sys schema for monitoring memory consumption and utilization by your MySQL server. Depending on the version, many memory instruments are disabled by default, so some manual steps may be required to use this (see the sketch below).
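As a hedged sketch (on MySQL 8.0 the memory instruments are already enabled by default; on 5.7 they have to be switched on first, and only allocations made after enabling are counted):

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';

mysql> SELECT event_name, current_alloc FROM sys.memory_global_by_current_bytes LIMIT 10;

mysql> SELECT * FROM sys.memory_global_total;

The two sys schema views show, respectively, the biggest current memory consumers by instrument and the total memory currently tracked by the Performance Schema.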

Check for Swappiness

Either way, it's possible that MySQL is swapping out its memory to disk. This is a very common situation, especially when the MySQL server and the underlying hardware are not sized in line with the expected requirements. There are cases where the demand of traffic has not been anticipated; memory usage can grow increasingly, especially when bad queries are run that consume a lot of memory, degrading performance as data is picked up from disk instead of from the buffer. To check for swapping, just run the free -m command or vmstat, as below:

[root@node1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3790        2754         121         202         915         584
Swap:          1535          39        1496

[root@node1 ~]# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0  40232 124100      0 937072    2    3   194  1029  477  313  7  2 91  1  0
 0  0  40232 123912      0 937228    0    0     0    49 1247  704 13  3 84  0  0
 1  0  40232 124184      0 937212    0    0     0    35  751  478  6  1 93  0  0
 0  0  40232 123688      0 937228    0    0     0    15  736  487  5  1 94  0  0
 0  0  40232 123912      0 937220    0    0     3    74 1065  729  8  2 89  0  0

You may also check using procfs and gather information such as going to /proc/vmstat or /proc/meminfo.
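For instance, a quick check straight from procfs (these files exist on any modern Linux kernel):

grep -E 'MemAvailable|SwapTotal|SwapFree' /proc/meminfo

cat /proc/sys/vm/swappiness   #how aggressively the kernel swaps, 0-100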

Using Perf, gdb, and Valgrind with Massif

Using tools like perf, gdb, and valgrind with massif helps you dig into more advanced methods of determining MySQL memory utilization. There are times when an interesting outcome becomes a mystery of memory consumption that leads to bewilderment. This calls for more skepticism, and using these tools helps you investigate how MySQL handles memory, from allocating it to utilizing it for processing transactions or other work. This is useful, for example, if you observe MySQL behaving abnormally, which might be caused by bad configuration or could lead to findings of memory leaks.
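Before a report like the one below can be produced, a profile has to be captured against the running mysqld first. A minimal sketch (the 30-second sampling window is arbitrary):

#Sample the running mysqld with call graphs for 30 seconds, then build the report
perf record -p $(pgrep -x mysqld) -g -- sleep 30

perf report --input perf.data --stdio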

For example, using perf in MySQL reveals more information in a system level report:

[root@testnode5 ~]# perf report --input perf.data --stdio

# To display the perf.data header info, please use --header/--header-only options.

#

#

# Total Lost Samples: 0

#

# Samples: 54K of event 'cpu-clock'

# Event count (approx.): 13702000000

#

# Overhead  Command Shared Object        Symbol                                                                                                                                                                                             

# ........  ....... ...................  ...................................................................................................................................................................................................

#

    60.66%  mysqld [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore

     2.79%  mysqld   libc-2.17.so         [.] __memcpy_ssse3

     2.54%  mysqld   mysqld             [.] ha_key_cmp

     1.89%  mysqld   [vdso]             [.] __vdso_clock_gettime

     1.05%  mysqld   mysqld             [.] rec_get_offsets_func

     1.03%  mysqld   mysqld             [.] row_sel_field_store_in_mysql_format_func

     0.92%  mysqld   mysqld             [.] _mi_rec_pack

     0.91%  mysqld   [kernel.kallsyms]    [k] finish_task_switch

     0.90%  mysqld   mysqld             [.] row_search_mvcc

     0.86%  mysqld   mysqld             [.] decimal2bin

     0.83%  mysqld   mysqld             [.] _mi_rec_check

….

Since this can be a special topic to dig into, we suggest you look at these really good external references: perf Basics for MySQL Profiling, Finding MySQL Scaling Problems Using perf, or learn how to debug using valgrind with massif.

Efficient Way To Check MySQL Memory Utilization

Using ClusterControl relieves you of hassle routines like going through your runbooks or even creating your own playbooks to deliver reports for you. In ClusterControl, you have Dashboards (using SCUMM) where you can get a quick overview of your MySQL node(s). For example, the MySQL General dashboard lets you determine how the MySQL node performs.

The dashboards reveal variables that impact MySQL memory utilization. You can check the metrics for sort caches, temporary tables, threads connected, the query cache, or the storage engines' InnoDB buffer pool and MyISAM key buffer.

Using ClusterControl also offers you a one-stop utility tool where you can check the queries that are running to determine those processes (queries) that can cause high memory utilization.

Viewing the status variables of MySQL is quite easy.

You can even go to Performance -> InnoDB Status to reveal the current InnoDB status of your database nodes. Also, when an incident is detected, ClusterControl will try to collect data about the incident and show the history as a report that provides you the InnoDB status, as shown in our previous blog about MySQL Freeze Frame.

Summary

Troubleshooting and diagnosing your MySQL database when you suspect high memory utilization isn't that difficult as long as you know the procedures and tools to use. Using the right tool offers you more flexibility and faster productivity to deliver fixes or solutions, with a greater chance of a good result.

Influences leading to Asynchronous Programming Model in NDB Cluster

A number of developments were especially important in influencing the development of NDB Cluster. I was working at Ericsson, so when I wasn't working on DBMS research I was deeply involved in prototyping the next generation of telecom switches. I was the lead architect in a project that we called AXE VM. AXE was the cash cow of Ericsson in those days. It used an in-house developed CPU called APZ. I was involved in some considerations on how to develop the next generation of the APZ in the early 1990s. However, I felt that the decided architecture didn't make use of modern ideas on CPU development. This opened up the possibility of using a commercial CPU to build a virtual machine for APZ. The next APZ project opted for a development based on the ideas from AXE VM at the end of the 1990s. By that time, however, I had turned my full attention to the development of NDB Cluster.

One interesting thing about the AXE is that it was the last single-CPU telecom switch on the market. The reason the AXE architecture was so successful was due to the concept of blocks and signals.

The idea with blocks came from inheriting ideas from HW development for SW
development. The idea is that each block is self-contained in that it contains all the
software and data for its operation. The only way to communicate between blocks is
through signals. More modern names for blocks and signals are modules and
messages. Thus AXE was entirely built on a message passing architecture.
However to make the blocks truly independent of each other it is important to only
communicate using asynchronous signals. As soon as synchronous signals are used
between blocks, these blocks are no longer independent of each other.

I became a very strong proponent of the AXE architecture, in my mind I saw that the
asynchronous model gave a 10x improvement of performance in a large distributed
system. The block and signal model constitutes a higher entrance fee to SW
development, but actually it provides large benefits when scaling the software for new
requirements.

One good example of this is when I worked on scaling MySQL towards higher
CPU core counts between 2008 and 2012. I worked on both improving scalability of
NDB Cluster and the MySQL Server. The block and signal model made it possible to
scale the NDB data nodes with an almost lock-free model. There are very few
bottlenecks in NDB data nodes for scaling to higher number of CPUs.
The main ones that still existed have been extensively improved in NDB Cluster 8.0.20.

Thus it is no big surprise that NDB Cluster was originally based on AXE VM. This
heritage gave us some very important parts that enabled quick bug fixing of
NDB Cluster. All the asynchronous messages go through a job buffer. This means
that in a crash we can print the last few thousand messages that have been executed in
each thread in the crashed data node. In addition we also use a concept called
Jump Address Memory (jam). This is implemented in our code as macros that write
the line number and file number into memory such that we can track exactly how we
came to the crash point in the software.

So NDB Cluster comes from marrying the requirements on a network database for
3G networks with the AXE model that was developed in Ericsson in the 1970s.
As can be seen this model is still going strong given that NDB Cluster is able to deliver
the best performance, highest availability of any DBMS for telecom applications,
financial applications, key-value stores and even distributed file systems.

Thus listing the most important requirements we have on the software
engineering model:

1) Fail-fast architecture (implemented through ndbrequire macro in NDB)
2) Asynchronous programming (provides much tracing information in crashes)
3) Highly modular SW architecture
4) JAM macros to track SW in crash events

Influences leading to NDB Cluster using a Shared Nothing Model

The requirements on Class 5 availability and immediate failover had two important
consequences for NDB Cluster. The first is that we wanted a fail-fast architecture.
Thus as soon as we have any kind of inconsistency in our internal data structures we
immediately fail and rely on the failover and recovery mechanisms to make the failure
almost unnoticeable. The second is that we opted for a shared nothing model where all
replicas are able to take over immediately.

The shared disk model requires replay of the REDO log before failover is completed
and this can be made fast, but not immediate. In addition, as one quickly understands,
the shared disk model relies on an underlying shared nothing storage
service. The shared disk implementation can never be more available than the
underlying shared nothing storage service.

Thus it is actually possible to build a shared disk DBMS on top of NDB Cluster.

The most important research paper influencing the shared nothing model used in NDB
is the paper presented at VLDB 1992 called "Dynamic Data Distribution in a
Shared-Nothing Multiprocessor Data Store".

Obviously it was required to fully understand the ARIES model that was presented
also in 1992 by a team at IBM. However NDB Cluster actually choose a very different
model since we wanted to achieve a logical REDO log coupled with a checkpoint
model that actually changed a few times in NDB Cluster.

Original NDB Cluster Requirements

NDB Cluster was originally developed for Network DataBases in the telecom
network. I worked in a EU project between 1991 and 1995 that focused on
developing a pre-standardisation effort on UMTS that later became standardised
under the term 3G. I worked in a part of the project where we focused on
simulating the network traffic in such a 3G network. I was focusing my attention
especially on the requirements that this created on a network database
in the telecom network.

In the same time period I also dove deeply into the research literature about DBMS
implementation.

The following requirements from the 3G studies emerged as the most important:

1) Class 5 Availability (less than 5 minutes of unavailability per year)
2) High Write Scalability as well as High Read Scalability
3) Predictable latency down to milliseconds
4) Efficient API
5) Failover in crash scenarios within seconds or even subseconds with a real-time OS

In another blog on the influences leading to the use of an asynchronous programming
model in NDB Cluster we derive the following requirements on the software
architecture.

1) Fail-fast architecture (implemented through ndbrequire macro in NDB)
2) Asynchronous programming (provides much tracing information in crashes)
3) Highly modular SW architecture
4) JAM macros to track SW in crash events

In another blog I present the influences leading to NDB Cluster using a shared
nothing model.

One important requirement that NDB Cluster is fairly unique in addressing is high
write scalability. Most DBMSs solve this by grouping together large amounts of
small transactions to make commits more efficient. This means that most DBMSs
have a very high cost of committing a transaction.

Modern replicated protocols actually have even made this worse. As an example in
most modern replicated protocols all transactions have to commit in a serial fashion.
This means that commit handling is a major bottleneck in many modern DBMSs.
Often this limits their transaction rates to tens of thousands commits per second.

NDB Cluster went another path and essentially commits every single row change
separate from any other row change. Thus the cost of executing 1000 transactions
with 1000 operations per transaction is exactly the same as the cost of executing
1 million single row transactions.

To achieve the grouping we used the fact that we are working in an asynchronous
environment. Thus we used several levels of piggybacking of messages. One of the
most important things here is that one socket is used to transport many thousands of
simultaneous database transactions. With NDB Cluster 8.0.20 we use multiple sockets
between data nodes and this scales another 10-20x to ensure that HW limitations are
the bottleneck and not the NDB software.

The asynchronous programming model ensures that we can handle thousands of
operations each millisecond and that changing from working on one transaction to
another is a matter of tens to hundreds of nanoseconds. In addition we can handle
these transactions independently in a number of different data nodes and even
within different threads within the same data node. Thus we can handle tens of millions
of transactions per second even within a single data node.

The protocol we used for this is a variant of the two-phase commit protocol with
some optimisations based on the linear two-phase commit protocol. However the
requirements on Class 5 Availability meant that we had to solve the blocking part
of the two-phase commit protocol. We solved this by recreating the state of the
failed transaction coordinators in a surviving node as part of failover handling.
This meant that we will never be blocked by a failure as long as there is still a
sufficient amount of nodes to keep the cluster operational.

Requirements on NDB Cluster 8.0

In this blog I am going to go through the most important requirements that
NDB Cluster 8.0 is based on. I am going to also list a number of consequences
these requirements have on the product and what it supports.

On slideshare.net I uploaded a presentation of the NDB Cluster 8.0
requirements. In this blog and several accompanying ones I am going to present the
reasoning that these requirements led to in terms of software architecture, data
structures and so forth.

The requirements on NDB Cluster 8.0 are the following:

1) Unavailability of less than 30 seconds per year (Class 6 Availability)
2) Predictable latency
3) Transparent Distribution and Replication
4) Write and Read Scalability
5) Highest availability even with 2 replicas
6) Support SQL, LDAP, File System interface, ...
7) Mixed OLTP and OLAP for real-time data analysis
8) Follow HW development for CPUs, networks, disks and memories
9) Follow HW development in Cloud Setups

The original requirement of NDB Cluster was to only support Class 5 Availability.
Telecom providers have continued to support ever higher numbers of subscribers per
telecom database, thus driving the requirements up to Class 6 Availability.
NDB Cluster has a more than 15-year proven track record of handling Class 6
Availability.

The requirement on predictable latency means that we need to be able to handle
transactions involving around twenty operations within 10 milliseconds even when
the cluster is working at a high load.

To make sure that application development is easy we opted for a model where
distribution and replication are transparent to the application code. This means that
NDB Cluster is one of very few DBMSs that support auto-sharding requirements.

High write scalability has been a major requirement in NDB from day one.
NDB Cluster can handle tens of millions of transactions per second, whereas most competing
DBMS products that are based on replication protocols can only handle
tens of thousands of transactions per second.

We used an arbitration model to avoid the requirement of 3 replicas. With
NDB Cluster 8.0 we fully support 3 and 4 replicas as well, but even with 2 replicas
we get the same availability that competing products based on replication protocols
require 3 replicas to achieve.

The original requirements on NDB didn't include a SQL interface. An efficient
API was much more important for telecom applications. However when meeting
customers of a DBMS it was obvious that an SQL interface was needed.
So this requirement was added in the early 2000s. However most early users of
NDB Cluster still opted for a more direct API, which means that NDB Cluster
today has LDAP interfaces through OpenLDAP, a file system interface through
HopsFS and a lot of products that use the NDB API (C++), ClusterJ (Java) and
an NDB NodeJS API.

The model of development for NDB makes it possible to also handle complex queries
in an efficient manner. Thus in the development of NDB Cluster 8.0 we added the
requirement to also better support OLAP use cases on the OLTP data that is stored in
NDB Cluster. We have already made very significant improvements in this area by
supporting parallelised filters and, to a great extent, parallelisation of join processing
in the NDB data nodes. This is an active development area for the coming
generations of NDB Cluster.

NDB Cluster started its development in the 1990s. Already in this development we
could foresee some of the HW development that was going to happen. The product
has been able to scale as HW has become more and more scalable. Today this means that
each node in NDB Cluster can scale to 64 cores, data nodes can scale to 16 TB of
memory and at least 100 TB of disk data, and can benefit greatly from higher and
higher bandwidth on the network.

Finally modern deployments often happen in cloud environments. Clouds are based
on an availability model with regions, availability domains and failure domains.
Thus NDB Cluster software needs to make it possible to make efficient use of
locality in the HW configurations.

Galera Replication flow Architecture


Galera is the best solution for High Availability and is used by many people worldwide. Galera performs synchronous replication (strictly speaking, it is certification-based replication) to keep the data updated on all group nodes. In this blog I explain how Galera replication works. For better understanding, I have made an architecture diagram to describe the replication flow, and I have also provided explanations for the key words used in the diagram.

Architecture flow Diagram :

What is a writeset?

A writeset contains all changes made to the database by the transaction, together with the append_key of the changed rows.

What is append_key?

Append_key registers the keys of the data changed by the transaction. The key of a row is represented in three parts: DATABASE NAME, TABLE NAME, PRIMARY KEY.

If the table doesn't have a PRIMARY KEY, the hash of the modified data becomes part of the writeset.

What is certification in Galera?

Certification in Galera is performed to detect conflicts and to ensure data consistency among the group. It is performed before the transaction commit.

What is the CVV (Central Certification Vector)?

The CVV is used to detect conflicts. The modified keys are added into the Central Certification Vector; if an added key is already part of the vector, conflict resolution checks are triggered.

Hope this blog helps someone who is working with Galera Cluster. I will come up with my next blog soon.

Thanks!!!

3 Step Migration of MySQL data to Clickhouse for faster analytics.


Recently one of our clients approached Mydbops with query slowness in a MySQL environment. They had deployed new code to generate huge reports for their year-end analytics data. After the deployment the queries were extremely slow and they struggled a lot, so they approached us for a solution. After our analysis, their OLAP database was, as expected, IO bound, with 100% of disk IOPS utilised during report generation. The queries were starving for disk IO, which slowed down the whole process.

Problem statement :

  • Reports are mainly focused on two large log tables (emp_Report_model, emp_details).
  • The report generator (a stored procedure) uses count(*) statements to compute the aggregated data on each call. This is required for their business purpose.
  • Count(*) is terribly slow in MySQL (they are on MySQL 5.7) as it needs to count all the rows in the table. (MySQL 8.0 has InnoDB parallel read threads that can make count(*) faster; see the sketch after this list.)
  • A MySQL index can't help, as we are aggregating almost the complete data set (90% of the data on each call), so the queries end up as Full Table Scans (FTS).
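For reference, here is a minimal sketch of the MySQL 8.0 behaviour mentioned above; it assumes MySQL 8.0.14 or later (where the innodb_parallel_read_threads session variable was introduced) and reuses the table name from this post:

-- Session-scoped: raise the number of threads used for parallel clustered index scans.
SET SESSION innodb_parallel_read_threads = 8;

-- COUNT(*) without a WHERE clause can then be served by a parallel scan.
SELECT COUNT(*) FROM emp_Report_model;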

On further analysis we found that there is only an INSERT workload on those tables; there are no UPDATEs or DELETEs on them.

We proposed a solution to overcome the problem with the help of Clickhouse, by migrating the data to Clickhouse.

What is Clickhouse ?

ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries.

Clickhouse Website

The major limitation of MySQL to Clickhouse replication is that we can only apply INSERT statements (appends) from MySQL. Clickhouse does not support UPDATEs and DELETEs in this flow, which makes sense for a columnar database.

Clickhouse Installation :

The installation is quite straightforward. The steps are available on the Clickhouse official web site:

yum install rpm-build epel-release
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
yum install -y mysql-community-devel python34-devel python34-pip gcc python-devel libevent-devel gcc-c++ kernel-devel libxslt-devel libffi-devel openssl-devel python36 python36-devel python36-libs python36-tools

Clickhouse Server

yum install -y clickhouse-server clickhouse-client

Clickhouse MySQL replication Library

pip3 install clickhouse-mysql

Clickhouse startup :

[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is stopped
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server start
Start clickhouse-server service: /etc/init.d/clickhouse-server: line 166: ulimit: open files: cannot modify limit: Operation not permitted
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is running
[root@mydbopslabs192 ~]# clickhouse-client
ClickHouse client version 19.17.4.11.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.17.4 revision 54428.

mydbopslabs192 🙂 show databases;

SHOW DATABASES
┌─name────┐
│ default │
│ system │
└─────────┘
2 rows in set. Elapsed: 0.003 sec.

Installation is all set. As the next step, I need to migrate the data from MySQL to Clickhouse and configure replication for those tables.

Data Migration from MySQL to Clickhouse :

Step 1 ( Dump the Clickhouse based schema structure from MySQL ) :

First I need to migrate the MySQL table structures to Clickhouse. MySQL and Clickhouse have different data types, so we cannot apply the same structure from MySQL to Clickhouse as-is. The document below provides a neat comparison between MySQL and Clickhouse data types:

https://shinguz.ch/blog/clickhouse-data-types-compared-to-mysql-data-types/

Let us convert table structure from MySQL to Clickhouse using the clickhouse-mysql tool.

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --create-table-sql-template --with-create-database --src-tables=data_Analytics.emp_Report_model,data_Analytics.emp_details > data_Reports_Jan21st.sql
2020-01-21 09:03:40,150/1579597420.150730:INFO:Starting
2020-01-21 09:03:40,150/1579597420.150977:DEBUG:{'app': {'binlog_position_file': None,
'config_file': '/etc/clickhouse-mysql/clickhouse-mysql.conf',
'create_table_json_template': False,
2020-01-21 09:03:40,223/1579597420.223511:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics
2020-01-21 09:03:40,264/1579597420.264610:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics

Dumping the table structure ,

[root@mydbopslabs192 ~]# less data_Reports_Jan12th.sql | grep CREATE
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_Report_model (
[root@mydbopslabs192 ~]# cat data_Reports_Jan12th.sql | head -n7
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
WatchID Nullable(String),
JavaEnable Nullable(Int32),
Title Nullable(String),
GoodEvent Nullable(Int32),
EventTime Nullable(DateTime),

Step 2 ( Import the schema structure into Clickhouse ) :

[root@mydbopslabs192 ~]# clickhouse-client -mn < data_Reports_Jan12th.sql
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# fg
clickhouse-client

mydbopslabs192 🙂 use data_Analytics
USE data_Analytics
Ok.
0 rows in set. Elapsed: 0.001 sec.
mydbopslabs192 🙂 show tables;
SHOW TABLES
┌─name─────────────┐
│ emp_Report_model │
│ emp_details │
└──────────────────┘
2 rows in set. Elapsed: 0.003 sec.

Step 3 ( Migrating the data and keep replication sync ) :

Before configuring the replication, the MySQL server should be configured with the below variables.

Mandatory MySQL settings :

server-id = <your id>
binlog_format = ROW
binlog_row_image = FULL
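
A quick way to double-check these settings on the source MySQL server before starting (a minimal sketch):

-- All three values must match the requirements above.
SELECT @@server_id, @@binlog_format, @@binlog_row_image;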

Now, we can configure the replication in two ways:

  • Migrate the existing data , then configure the replication
  • Migrate the existing data and configure the replication in one step

i) Migrate the existing data , then configure the replication :

Commands to migrate the existing data:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --migrate-table --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --log-file=emp_Report_model.log
[root@mydbopslabs192 ~]# less emp_Report_model.log | grep -i migra
'migrate_table': True,
'table_migrator': {'clickhouse': {'connection_settings': {'host': '127.0.0.1',
2020-01-21 11:04:57,744/1579604697.744533:INFO:List for migration:
2020-01-21 11:04:57,744/1579604697.744947:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:04:57,891/1579604697.891935:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model
mydbopslabs192 🙂 select count(*) from data_Analytics.emp_Report_model;
┌─count()─┐
│ 8873898 │
└─────────┘
1 rows in set. Elapsed: 0.005 sec.

Configuring the replication:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --log-file=emp_Report_model_Replication.log
2020-01-21 11:22:18,974/1579605738.974186:INFO:CSVWriter() self.path=/tmp/csvpool_1579605738.9738157_d643efe5-5ae0-47df-8504-40f61f2c139f.csv
2020-01-21 11:22:18,976/1579605738.976613:INFO:CHCSWriter() connection_settings={'port': 9000, 'host': '127.0.0.1', 'password': '', 'user': 'default'} dst_schema=data_Analytics dst_table=emp_Report_model
2020-01-21 11:22:18,976/1579605738.976936:INFO:starting clickhouse-client process
2020-01-21 11:22:19,160/1579605739.160906:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,166/1579605739.166096:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,170/1579605739.170744:INFO:['data_Analytics.emp_Report_model']
(END)

ii) Migrate the existing data and configure the replication in one step :

Here we need to add the flag --migrate-table to the replication command.

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --migrate-table --log-file=emp_Report_model_replication_mig.log
[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i mig
2020-01-21 11:27:53,263/1579606073.263505:INFO:List for migration:
2020-01-21 11:27:53,263/1579606073.263786:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:27:53,316/1579606073.316788:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model
[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i process
2020-01-21 11:28:01,071/1579606081.071054:INFO:starting clickhouse-client process

Validating the count after inserting some records in MySQL (source):

mydbopslabs192 🙂 select count(*) from data_Analytics.emp_Report_model;
┌─count()─┐
│ 8873900 │
└─────────┘

MySQL to Clickhouse replication is working as expected .

Performance comparison for OLAP workload ( MySQL vs Clickhouse ) :

Count(*) in MySQL :

mysql> select count(*) from emp_Report_model;

1 row in set (32.68 sec)

Count(*) in clickhouse :

mydbopslabs192 🙂 select count(*) from emp_Report_model;

1 rows in set. Elapsed: 0.007 sec.

Aggregated query in MySQL :

mysql> select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;

292893 rows in set (1 min 2.61 sec)

Aggregated query in Clickhouse :

mydbopslabs192 🙂 select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;

292893 rows in set. Elapsed: 1.710 sec. Processed 9.37 million rows, 906.15 MB (7.75 million rows/s., 749.15 MB/s.)

Yes, Clickhouse is performing very well with COUNT(*) and analytical queries .

Query Model   | MySQL      | Clickhouse
count(*)      | 33 seconds | 0.1 seconds
OLAP Query    | 63 seconds | 1.7 seconds

MySQL vs Clickhouse

The above graph is just a pictorial representation of the queries tested. Though Clickhouse excels at analytics workloads, it has its own limitations too. Now we have another happy customer at Mydbops whose analytics dashboard is faster.

Featured image credits Stephen Dawson on Unsplash

Fun with Bugs #94 - On MySQL Bug Reports I am Subscribed to, Part XXVIII

I may get a chance to speak about proper bugs processing for open source projects later this year, so I have to keep reviewing recent MySQL bugs to be ready for that. In my previous post in this series I listed some interesting MySQL bug reports created in December, 2019. Time to move on to January, 2020! Belated Happy New Year of cool MySQL Bugs!

As usual I mostly care about InnoDB, replication and optimizer bugs, and I explicitly mention the bug reporter by name and give a link to their other active reports (if any). I also pick examples of proper (or improper) attitudes from reporters and Oracle engineers. Here is the list:
  • Bug #98103 - "unexpected behavior while logging an aborted query in the slow query log".  A query that was killed while waiting for a table metadata lock not only gets logged, but the lock wait time is also saved as the query execution time. I'd like to highlight how the bug reporter, Pranay Motupalli, used gdb to study what really happens in the code in this case. Perfect bug report!
  • Bug #98113 - "Crash possible when load & unload a connection handler". The (quite obvious) bug was verified based on code review, but only after some effort was spent by the Oracle engineer on refusing to accept the problem and its importance. This bug was reported by Fangxin Flou.
  • Bug #98132 - "Analyze table leads to empty statistics during online rebuild DDL ". Nice addition to my collections! This bug with a nice and clear test case was reported by Albert Hu, who also suggested a fix.
  • Bug #98139 - "Committing a XA transaction causes a wrong sequence of events in binlog". This bug reported by Dehao Wang was verified as a "documentation" one, but I doubt documenting current behavior properly is an acceptable fix. Bug reporter suggested to commit in the binary log first, for example. Current implementation that allows users to commit/rollback a XA transaction by using another connection if the former connection is closed or killed, is risky. A lot of arguing happened in comments in the process, and my comment asking for a clear quote from the manual:
    Would you be so kind to share some text from this page you mentioned:

    https://dev.mysql.com/doc/refman/8.0/en/xa.html

    or any other fine MySQL 8 manual page stating that XA COMMIT is NOT supported when executed from session/connection/thread other than those prepared the XA transaction? I am doing something wrong probably, but I can not find such text anywhere.
    was hidden. Let's see what happens to this bug report next.
  • Bug #98211 - "Auto increment value didn't reset correctly.". Not sure what this bug reported by Zhao Jianwei has to do with "Data Types", IMHO it's more about DDL or data dictionary. Again, some sarcastic comments from Community users were needed to put work on this bug back on track...
  • Bug #98220 - "with log_slow_extra=on Errno: info not getting updated correctly for error". This bug was reported by lalit Choudhary from Percona.
  • Bug #98227 - "innodb_stats_method='nulls_ignored' and persistent stats get wrong cardinalities". I think the category is wrong for this bug. It's a bug in InnoDB's persistent statistics implementation, one of many. The bug was reported by Agustín G from Percona.
  • Bug #98231 - "show index from a partition table gets a wrong cardinality value". Yet another report by Albert Hu that ended up as a "documentation" bug for now, even though older MySQL versions provided better cardinality estimations than MySQL 8.0 in this case (so this is a regression of a kind). I hope the bug will be re-classified and properly processed later.
  • Bug #98238 - "I_S.KEY_COLUMN_USAGE is very slow". I am surprised to see such a bug in MySQL 8. According to the bug reporter, Manuel Mausz, this is also a kind of regression comparing to older MySQL version, where these queries used to run faster. Surely, no "regression" tag in this case was added.
  • Bug #98284 - "Low sysbench score in the case of a large number of connections". This notable performance regression of MySQL 8 vs 5.7 was reported by zanye zjy. perf profiling pointed out towards ppoll() where a lot of time is spent. There is a fix suggested by Fangxin Flou (to use poll() instead), but the bug is still "Open".
  • Bug #98287 - "Explanation of hash joins is inconsistent across EXPLAIN formats". This bug was reported by Saverio M and ended up marked as a duplicate of Bug #97299 fixed in upcoming 8.0.20. Use EXPLAIN FORMAT=TREE in the meantime to see proper information about hash joins usage in the plan.
  • Bug #98288 - "xa commit crash lead mysql replication error". This bug report from Phoenix Zhang (who also suggested a patch) was declared a duplicate of Bug #76233 - "XA prepare is logged ahead of engine prepare" (that I've already discussed among other XA transactions bugs here).
  • Bug #98324 - "Deadlocks more frequent since version 5.7.26". Nice regression bug report by Przemyslaw Malkowski from Percona, with additional test provided later by Stephen Wei . Interestingly enough, test results shared by Umesh Shastry show that MySQL 8.0.19 is affected in the same way as 5.7.26+, but 8.0.19 is NOT listed as one of versions affected. This is a mistake to fix, along with missing regression tag.
  • Bug #98427 - "InnoDB FullText AUX Tables are broken in 8.0". Yet another regression in MySQL 8 was found by Satya Bodapati. It seems the change in the default collation for the utf8mb4 character set caused this. InnoDB FULLTEXT search was far from perfect anyway...
There are clouds in the sky of MySQL bugs processing.
To summarize:
  1. Still, too much time and effort is sometimes spent on arguing with the bug reporter instead of accepting and processing bugs properly. This is unfortunate.
  2. Sometimes bugs are wrongly classified when verified (documentation vs code bug, wrong category, wrong severity, not all affected versions listed, ignoring regressions etc). This is also unfortunate.
  3. Percona engineers still help to make MySQL better.
  4. There are some fixes in upcoming MySQL 8.0.20 that I am waiting for :)
  5. XA transactions in MySQL are badly broken (they are not atomic in storage engine + binary log) and hardly safe to use in reality.

Choosing the best indexes for MySQL query optimization


Many of our users, developers and database administrators, keep asking our team about EverSQL's indexing recommendations algorithm. So, we decided to write about it.

This tutorial won't detail all the internals of the algorithm, but rather try to lay down the basic and important aspects of indexing, in simple terms.
Also, and most importantly, we'll present practical examples for properly indexing your tables and queries by relying on a set of rules, rather than on guessing.

Our focus in this tutorial is on MySQL, MariaDB and Percona Server databases. This information may be relevant for other database vendors as well, but in some cases may not.


Which indexes should I create for an SQL query?

As a general rule of thumb, MySQL can only use one index for each table in the query. Therefore, there is no point in creating more than one index for each query. Preferably, the same indexes should match as many of the queries as possible, as this will reduce the load on the database when inserting or updating data (which requires updating the indexes as well).

When creating an index, the most important parts are the equality conditions in the WHERE and JOIN conditions. In most cases, conditions such as name = 'John' will allow the database to filter most of the rows from the table and go through a small amount of rows to return the required results. Therefore, we should start indexing by adding these columns to the index.

Then, you should look into the range conditions, but you should only add one of them - the most selective one, as MySQL can't use more than one range condition per index. In some cases when there are no range conditions, it makes sense to add the GROUP BY / ORDER BY columns, assuming the ordering is done in only one direction (ASC / DESC).

In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC). This doesn't guarantee that the database's optimizer will pick this index rather than the other compound indexes, but it's worth a try.

Also, in some cases, it makes sense to also add the columns from the SELECT clause to the index, to have a complete covering index. This is only relevant if the index isn't already 'too large'. What's too large? Well, there's no official rule of thumb here, but let's say... 5-7 columns? Creating a covering index allows the database to not only filter using the index, but to also fetch the information required by the SELECT clause directly from the index, which saves precious I/O operations.

Let's look at an example to clarify:

SELECT 
    id, first_name, last_name, age
FROM
    employees
WHERE
    first_name = 'John'
        AND last_name = 'Brack'
        AND age > 25
ORDER BY age ASC;


For this query, we'll start with adding the columns first_name and last_name, which are compared with an equality operator. Then, we'll add the age column which is compared with a range condition. No need to have the ORDER BY clause indexed here, as the age column is already in the index. Last but not least, we'll add id from the SELECT clause to the index to have a covering index.

So to index this query properly, you should add the index:
employees (first_name, last_name, age, id).
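
As a concrete sketch, the recommendation above translates to something like the following (the index name is illustrative):

-- Equality columns first, then the range column, then the SELECT column for covering.
ALTER TABLE employees
  ADD INDEX idx_first_last_age_id (first_name, last_name, age, id);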

The above is a very simplified pseudo-algorithm that will allow you to build simple indexes for rather simple SQL queries.

If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Query Optimizer which does all the heavy lifting for you.

What not to do when indexing (or writing SQL queries)?

We gathered some of the most common mistakes we see programmers and database administrators make when writing queries and indexing their tables.

Indexing each and every column in the table separately

In most cases, MySQL won't be able to use more than one index for each table in the query.

Therefore, when creating a separate index for each column in the table, the database is bound to perform only one of the search operations using an index, and the rest of them will be significantly slower, as the database can't use an index to execute them.

We recommend using compound indexes (explained later in this article) rather than single-column indexes.

The OR operator in filtering conditions

Consider this query:

SELECT 
    a, b
FROM
    tbl
WHERE
    a = 3 OR b = 8;


In many cases, MySQL won't be able to use an index to apply an OR condition, and as a result, this query is not index-able.

Therefore, we recommend avoiding such OR conditions and considering splitting the query into two parts, combined with UNION DISTINCT (or even better, UNION ALL, in case you know there won't be any duplicate results).
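
A hedged sketch of the suggested rewrite, assuming separate indexes exist on tbl (a) and tbl (b):

-- Each branch can now use its own index; UNION DISTINCT removes duplicate rows.
SELECT a, b FROM tbl WHERE a = 3
UNION DISTINCT
SELECT a, b FROM tbl WHERE b = 8;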

The order of columns in an index is important

Let's say I hand you my contacts phone book which is ordered by the contact's first name and ask you to count how many people are there named "John" in the book. You'll grab the book in both hands and say "no problem". You will navigate to the page that holds all names starting with John, and start counting from there.

Now, let's say I change the assignment and hand you a phone book that is ordered by the contact's last name, but ask you to still count all contacts with the first name "John". How would you approach that? Well, the database scratches its head in this situation as well.

Now let's look at an SQL query to demonstrate the same behavior with the MySQL optimizer:

SELECT 
    first_name, last_name
FROM
    contacts
WHERE
    first_name = 'John';

Having the index contacts (first_name, last_name) is ideal here, because the index starts with our filtering condition and ends with another column in the SELECT clause.

But, having the reverse index contacts (last_name, first_name) is rather useless, as the database can't use the index for filtering, as the column we need is second in the index and not first.

The conclusion from this example is that the order of columns in an index is rather important.
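
To make the phone book analogy concrete, here is a sketch of the two alternatives (index names are illustrative):

-- Usable for WHERE first_name = 'John': the filtering column comes first.
ALTER TABLE contacts ADD INDEX idx_first_last (first_name, last_name);

-- Not usable for that filter: first_name is only the second column.
ALTER TABLE contacts ADD INDEX idx_last_first (last_name, first_name);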

Adding redundant indexes

Indexes are magnificent when trying to optimize your SQL queries and they can improve performance significantly.

But, they come with a downside as well. Each index you're creating should be kept updated and in sync when changes occur in your databases. So for each INSERT / UPDATE / DELETE in your databases, all relevant indexes should be updated. This update can take some time, especially with large tables / indexes.

Therefore, do not create indexes unless you know you'll need them.

Also, we highly recommend analyzing your database once in a while, searching for any redundant indexes that can be removed.
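
One hedged way to do that, assuming the sys schema is installed (it ships with MySQL 5.7 and later), is to query sys.schema_redundant_indexes:

-- Lists indexes that are fully covered by another ("dominant") index.
SELECT table_schema, table_name, redundant_index_name, dominant_index_name
FROM sys.schema_redundant_indexes;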

How to automate index creation and SQL query optimization?

If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Query Optimizer which does all the heavy lifting for you.



PreFOSDEM talk: Upgrading from MySQL 5.7 to MySQL 8.0


In this post I’ll expand on the subject of my MySQL pre-FOSDEM talk: what dbadmins need to know and do, when upgrading from MySQL 5.7 to 8.0.

I’ve already published two posts on two specific issues; in this article, I’ll give the complete picture.

As usual, I’ll use this post to introduce tooling concepts that may be useful in generic system administration.

The presentation code is hosted on a GitHub repository (including the source files and the output slides in PDF format), and on Slideshare.


Summary of issues, and scope

The following are the basic issues to handle when migrating:

  • the new charset/collation utf8mb4/utf8mb4_0900_ai_ci;
  • the trailing whitespace is handled differently;
  • GROUP BY is not sorted anymore by default;
  • the information schema is now cached (by default);
  • incompatibility with schema migration tools.

Of course, the larger the scale, the more aspects will need to be considered; for example, large-scale write-bound systems may need to handle:

  • changes in dirty page cleaning parameters and design;
  • (new) data dictionary contention;
  • and so on.

In this article, I’ll only deal with what can be reasonably considered the lowest common denominator of all the migrations.

Requirements

All the SQL examples are executed on MySQL 8.0.

The new default character set/collation: utf8mb4/utf8mb4_0900_ai_ci

Summary

References:

MySQL introduces a new collation - utf8mb4_0900_ai_ci. Why?

Basically, it’s an improved version of the general_ci version - it supports Unicode 9.0, it irons out a few issues, and it’s faster.

The collation utf8(mb4)_general_ci wasn’t entirely correct; a typical example is :

-- Å = U+212B
SELECT "sÅverio" = "saverio" COLLATE utf8mb4_general_ci;
-- +--------+
-- | result |
-- +--------+
-- |      0 |
-- +--------+

SELECT "sÅverio" = "saverio"; -- Default (COLLATE utf8mb4_0900_ai_ci);
-- +--------+
-- | result |
-- +--------+
-- |      1 |
-- +--------+

From this, you can also guess what ai_ci means: accent insensitive/case insensitive.

So, what’s the problem?

Legacy.

Technically, utf8mb4 has been available in MySQL for a long time. At least a part of the industry started the migration long before, and publicly documented the process.

However, by that time, only utf8mb4_general_ci was available. Therefore, a vast amount of documentation around suggests to move to such collation.

While this is not an issue per se, it is a big issue when considering that the two collations are incompatible.

Tooling: MySQL RLIKE

For people who like (and frequently use) them, regular expressions are a fundamental tool.

In particular when performing administration tasks (using them in an application for data matching is a different topic), they can streamline some queries, avoiding lengthy concatenations of conditions.

In particular, I find it practical as a sophisticated SHOW <object> supplement.

SHOW <object>, in MySQL, supports LIKE, however, it’s fairly limited in functionality, for example:

SHOW GLOBAL VARIABLES LIKE 'character_set%'
-- +--------------------------+-------------------------------------------------------------------------+
-- | Variable_name            | Value                                                                   |
-- +--------------------------+-------------------------------------------------------------------------+
-- | character_set_client     | utf8mb4                                                                 |
-- | character_set_connection | utf8mb4                                                                 |
-- | character_set_database   | utf8mb4                                                                 |
-- | character_set_filesystem | binary                                                                  |
-- | character_set_results    | utf8mb4                                                                 |
-- | character_set_server     | utf8mb4                                                                 |
-- | character_set_system     | utf8                                                                    |
-- | character_sets_dir       | /home/saverio/local/mysql-8.0.19-linux-glibc2.12-x86_64/share/charsets/ |
-- +--------------------------+-------------------------------------------------------------------------+

Let’s turbocharge it!

Let’s get all the meaningful charset-related variables, but not one more, in a single swoop:

SHOW GLOBAL VARIABLES WHERE Variable_name RLIKE '^(character_set|collation)_' AND Variable_name NOT RLIKE 'system|data';
-- +--------------------------+--------------------+
-- | Variable_name            | Value              |
-- +--------------------------+--------------------+
-- | character_set_client     | utf8mb4            |
-- | character_set_connection | utf8mb4            |
-- | character_set_results    | utf8mb4            |
-- | character_set_server     | utf8mb4            |
-- | collation_connection     | utf8mb4_general_ci |
-- | collation_server         | utf8mb4_general_ci |
-- +--------------------------+--------------------+

Nice. The first regex reads: “string starting with (^) either character_set or collation”, and followed by _. Note that if we don’t group character_set and collation (via ()), the ^ metacharacter applies only to the first.

How the charset parameters work

Character set and collation are a very big deal, because changing them in this case requires literally (in a literal sense 😉) rebuilding the entire database - all the records (and related indexes) including strings will need to be rebuilt.

In order to understand the concepts, let’s have a look at the MySQL server settings again; I’ll reorder and explain them.

Literals sent by the client are assumed to be in the following charset:

  • character_set_client (default: utf8mb4)

after, they’re converted and processed by the server, to:

  • character_set_connection (default: utf8mb4)
  • collation_connection (default: utf8mb4_0900_ai_ci)

The above settings are crucial, as literals are a foundation for exchanging data with the server. For example, when an ORM inserts data in a database, it creates an INSERT with a set of literals.

When the database system sends the results, it sends them in the following charset:

  • character_set_results (default: utf8mb4)

Literals are not the only foundation. Database objects are the other side of the coin. Base defaults for database objects (e.g. the databases) use:

  • character_set_server (default: utf8mb4)
  • collation_server (default: utf8mb4_0900_ai_ci)

String, and comparison, properties

Some developers would define a string as a stream of bytes; this is not entirely correct.

To be exact, a string is a stream of bytes associated to a character set.

Now, this concept applies to strings in isolation. How about operations on sets of strings, e.g. comparisons?

In a similar way, we need another concept: the “collation”.

A collation is a set of rules that defines how strings are sorted, which is required to perform comparisons.

In a database system, a collation is associated with objects and literals, both through system and specific defaults: a column, for example, will have its own collation, while a literal will use the default, if not specified.

But when comparing two strings with different collations, how is it decided which collation to use?

Enter the “Collation coercibility”.

Collation coercion, and issues general <> 0900_ai

Reference: Collation Coercibility in Expressions

Coercibility is a property of collations, which defines the priority of collations in the context of a comparison.

MySQL has seven coercibility values:

0: An explicit COLLATE clause (not coercible at all)
1: The concatenation of two strings with different collations
2: The collation of a column or a stored routine parameter or local variable
3: A "system constant" (the string returned by functions such as USER() or VERSION())
4: The collation of a literal
5: The collation of a numeric or temporal value
6: NULL or an expression that is derived from NULL

It's not necessary to know them by heart, since their ordering makes sense, but it's important to know how the main ones work in the context of a migration:

  • how columns will compare against literals;
  • how columns will compare against each other.

What we want to know is what happens in the workflow of a migration, in particular, if we:

  • start migrating the charset/collation defaults;
  • then, we slowly migrate the columns.

Comparisons utf8_general_ci column <> literals

Let’s create a table with all the related collations:

CREATE TABLE chartest (
  c3_gen CHAR(1) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci,
  c4_gen CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  c4_900 CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci
);

INSERT INTO chartest VALUES('ä', 'ä', 'ä');

Note how we insert characters in the Basic Multilingual Plane (BMP, essentially, the one supported by utf8mb3) - we're simulating a database where we only changed the defaults, not the data.

Let’s compare with BMP utf8mb4:

SELECT c3_gen = 'ä' `result` FROM chartest;
-- +--------+
-- | result |
-- +--------+
-- |      1 |
-- +--------+

Nice; it works. Coercion values:

  • column: 2 # => wins
  • literal implicit: 4

More critical: we compare against a character in the Supplementary Multilingual Plane (SMP, essentially, one added by utf8mb4), with explicit collation:

SELECT c3_gen = '🍕' COLLATE utf8mb4_0900_ai_ci `result` FROM chartest;
-- +--------+
-- | result |
-- +--------+
-- |      0 |
-- +--------+

Coercion values:

  • column: 2
  • literal explicit: 0 # => wins

MySQL converts the first value and uses the explicit collation.

Most critical: compare against a character in the SMP, without implicit collation:

SELECT c3_gen = '🍕' `result` FROM chartest;
ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='

WAT!!

Weird?

Well, this is because:

  • column: 2 # => wins
  • literal implicit: 4

MySQL tries to coerce the charset/collation to the column’s one, and fails!

This gives a clear indication to the migration: do not allow SMP characters in the system, until the entire dataset has been migrated.

Comparisons utf8_general_ci column <> columns

Now, let’s see what happens between columns!

SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c3_gen = b.c4_gen;
-- +----------+
-- | COUNT(*) |
-- +----------+
-- |        1 |
-- +----------+

SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c3_gen = b.c4_900;
-- +----------+
-- | COUNT(*) |
-- +----------+
-- |        1 |
-- +----------+

SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c4_gen = b.c4_900;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

Ouch. BIG OUCH!

Why?

This is what happens to people who migrated, referring to obsolete documentation, to utf8mb4_general_ci - they can’t easily migrate to the new collation.

Summary of the migration path

The migration path outlined:

  • update the defaults to the new charset/collation;
  • don’t allow SMP characters in the application;
  • gradually convert the tables/columns (see the sketch below);
  • now allow everything you want 😄.

is viable for production systems.
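
For the per-table conversion step, a minimal sketch (the table name is illustrative; on large production tables, run this through an online schema change tool):

-- Rebuilds the table and its indexes with the new charset/collation.
ALTER TABLE comments
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;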

The new collation doesn’t pad anymore

There’s another unexpected property of the new collation.

Let’s simulate MySQL 5.7:

-- Not exact, but close enough
--
SELECT '' = _utf8' ' COLLATE utf8_general_ci;
-- +---------------------------------------+
-- | '' = _utf8' ' COLLATE utf8_general_ci |
-- +---------------------------------------+
-- |                                     1 |
-- +---------------------------------------+

How does this work on MySQL 8.0?:

-- Current (8.0):
--
SELECT '' = ' ';
-- +----------+
-- | '' = ' ' |
-- +----------+
-- |        0 |
-- +----------+

Ouch!

Where does this behavior come from? Let’s get some more info from the collations (with a regular expression, of course 😉):

SHOW COLLATION WHERE Collation RLIKE 'utf8mb4_general_ci|utf8mb4_0900_ai_ci';
-- +--------------------+---------+-----+---------+----------+---------+---------------+
-- | Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
-- +--------------------+---------+-----+---------+----------+---------+---------------+
-- | utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
-- | utf8mb4_general_ci | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
-- +--------------------+---------+-----+---------+----------+---------+---------------+

Hmmmm 🤔. Let’s have a look at the formal rules from the SQL (2003) standard (section 8.2):

3) The comparison of two character strings is determined as follows:

a) Let CS be the collation […]

b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a space.

In other words: the new collation does not pad.

This is not a big deal. Just trim the data before migrating, and make 100% sure that new trailing spaces are not introduced by the application before the migration is completed.
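
A hedged sketch of the trimming step, assuming a table and column like the ones used elsewhere in this post (adapt to your schema):

-- Remove trailing spaces so PAD SPACE and NO PAD collations compare identically.
UPDATE comments
SET description = TRIM(TRAILING ' ' FROM description)
WHERE description LIKE '% ';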

Triggers

Triggers are fairly easy to handle, as they can be dropped/rebuilt with the new settings - just make sure to consider comparisons inside the trigger body.

Sample of a trigger (edited):

SHOW CREATE TRIGGER enqueue_comments_update_instance_event\G

-- SQL Original Statement:
CREATE TRIGGER `enqueue_comments_update_instance_event`
AFTER UPDATE ON `comments`
FOR EACH ROW
trigger_body: BEGIN
  SET @changed_fields := NULL;

  IF NOT (OLD.description <=> NEW.description COLLATE utf8_bin AND CHAR_LENGTH(OLD.description) <=> CHAR_LENGTH(NEW.description)) THEN
    SET @changed_fields := CONCAT_WS(',', @changed_fields, 'description');
  END IF;

  IF @changed_fields IS NOT NULL THEN
    SET @old_values := NULL;
    SET @new_values := NULL;

    INSERT INTO instance_events(created_at, instance_type, instance_id, operation, changed_fields, old_values, new_values)
    VALUES(NOW(), 'Comment', NEW.id, 'UPDATE', @changed_fields, @old_values, @new_values);
  END IF;
END
--   character_set_client: utf8mb4
--   collation_connection: utf8mb4_0900_ai_ci
--     Database Collation: utf8mb4_0900_ai_ci

As you see, a trigger has associated charset/collation settings. This is because, unlike a statement, it's not sent by a client, so it needs to keep its own settings.

In the trigger above, dropping/recreating in the context of a system with the new default works, however, it’s not enough - there’s a comparison in the body!

Conclusion: don’t forget to look inside the triggers. Or better, make sure you have a solid test suite 😉.

Sort-of-related suggestion

We’ve been long time users of MySQL triggers. They make a wonderful callback system.

When a system grows, it’s increasingly hard (tipping into the unmaintainable) to maintain application-level callbacks. Triggers will never miss any database update, and with a logic like the above, a queue processor can process the database changes.

Behavior with indexes

Now that we’ve examined the compatibility, let’s examine the performance aspect.

Indexes are still usable cross-charset, due to automatic conversion performed by MySQL. The point to be aware of is that the values are converted after being read from the index.

Let’s create test tables:

CREATE TABLE indextest3 (
  c3 CHAR(1) CHARACTER SET utf8,
  KEY (c3)
);

INSERT INTO indextest3 VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'), ('i'), ('j'), ('k'), ('l'), ('m');

CREATE TABLE indextest4 (
  c4 CHAR(1) CHARACTER SET utf8mb4,
  KEY (c4)
);

INSERT INTO indextest4 SELECT * FROM indextest3;

Querying against a constant yields interesting results:

EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM indextest4 WHERE c4 = _utf8'n'\G
-- -> Aggregate: count(0)
--     -> Filter: (indextest4.c4 = 'n')  (cost=0.35 rows=1)
--         -> Index lookup on indextest4 using c4 (c4='n')  (cost=0.35 rows=1)

MySQL recognizes that n is a valid utf8mb4 character, and matches it directly.

Against a column with index:

EXPLAIN SELECT COUNT(*) FROM indextest3 JOIN indextest4 ON c3 = c4;
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
-- | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
-- |  1 | SIMPLE      | indextest3 | NULL       | index | NULL          | c3   | 4       | NULL |   13 |   100.00 | Using index              |
-- |  1 | SIMPLE      | indextest4 | NULL       | ref   | c4            | c4   | 5       | func |    1 |   100.00 | Using where; Using index |
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+

EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM indextest3 JOIN indextest4 ON c3 = c4\G
--  -> Aggregate: count(0)
--     -> Nested loop inner join  (cost=6.10 rows=13)
--         -> Index scan on indextest3 using c3  (cost=1.55 rows=13)
--         -> Filter: (convert(indextest3.c3 using utf8mb4) = indextest4.c4)  (cost=0.26 rows=1)
--             -> Index lookup on indextest4 using c4 (c4=convert(indextest3.c3 using utf8mb4))  (cost=0.26 rows=1)

MySQL is using the index, so all good. However, what’s the func?

It simply tells us that the value used against the index is the result of a function. In this case, MySQL is converting the charset for us (convert(indextest3.c3 using utf8mb4)).

This is another crucial consideration for a migration - indexes will still be effective. Of course, (very) complex queries will need to be carefully examined, but there are the grounds for a smooth transition.

Consequences of the increase in (potential) size of char columns

Reference: The CHAR and VARCHAR Types

One concept to be aware of, although unlikely to hit real-world applications, is that utf8mb4 characters can take up to 33% more space.

In storage terms, databases need to know the maximum size of the data they handle. This means that even if a string takes the same space in both utf8mb3 and utf8mb4, MySQL needs to know the maximum space it can take.

The InnoDB index limit is 3072 bytes in MySQL 8.0; generally speaking, this is large enough not to care.

Remember!:

  • [VAR]CHAR(n) refers to the number of characters; therefore, the maximum requirement is 4 * n bytes, but
  • TEXT fields refer to the number of bytes.

Information schema statistics caching

Reference: The INFORMATION_SCHEMA STATISTICS Table

Up to MySQL 5.7, information_schema statistics are updated in real time. In MySQL 8.0, statistics are cached, and updated only every 24 hours (by default).

In web applications, this affects only very specific use cases, but it’s important to know if one’s application is subject to this new behavior (our application was).

Let’s see the effects of this:

CREATE TABLE ainc (id INT AUTO_INCREMENT PRIMARY KEY);

-- On the first query, the statistics are generated.
--
SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |           NULL |
-- +------------+----------------+

INSERT INTO ainc VALUES ();

SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |           NULL |
-- +------------+----------------+

Ouch! The cached values are returned.

How about SHOW CREATE TABLE?

SHOW CREATE TABLE ainc\G
-- CREATE TABLE `ainc` (
--   `id` int NOT NULL AUTO_INCREMENT,
--   PRIMARY KEY (`id`)
-- ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

This command is always up to date.

How to update the statistics? By using ANALYZE TABLE:

ANALYZE TABLE ainc;

SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |              2 |
-- +------------+----------------+

There you go. Let’s find out the related setting:

SHOW GLOBAL VARIABLES LIKE '%stat%exp%';
-- +---------------------------------+-------+
-- | Variable_name                   | Value |
-- +---------------------------------+-------+
-- | information_schema_stats_expiry | 86400 |
-- +---------------------------------+-------+

Developers who absolutely need to revert to the pre-8.0 behavior can set this value to 0.
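
A minimal sketch of doing so (it requires a privileged account, e.g. one with SYSTEM_VARIABLES_ADMIN):

-- 0 disables the caching, restoring the pre-8.0 real-time behavior; PERSIST survives restarts.
SET PERSIST information_schema_stats_expiry = 0;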

GROUP BY not sorted anymore by default (+tooling)

Up to MySQL 5.7, GROUP BY’s result was sorted.

This was unnecessary - optimization-seeking developers used ORDER BY NULL in order to spare the sort; however, accidentally or not, some relied on it.
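
Where the sorted output is actually needed, the fix is simply to make the ordering explicit; a sketch with illustrative names:

-- Relied on the implicit sort in 5.7; returns unordered results in 8.0:
SELECT col1, COUNT(*) FROM tbl GROUP BY col1;

-- Portable across versions:
SELECT col1, COUNT(*) FROM tbl GROUP BY col1 ORDER BY col1;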

Those who relied on it are unfortunately required to scan the codebase. There isn't a one-size-fits-all solution, and in this case, writing an automated solution may not be worth the time compared to manually inspecting the occurrences; however, this doesn't prevent the Unix tools from helping 😄

Let’s simulate a coding standard where ORDER BY is always on the line after GROUP BY, if present:

cat > /tmp/test_groupby_1 << SQL
  GROUP BY col1
  -- ends here

  GROUP BY col2
  ORDER BY col2

  GROUP BY col3
  -- ends here

  GROUP BY col4
SQL

cat > /tmp/test_groupby_2 << SQL

  GROUP BY col5
  ORDER BY col5
SQL

A basic version would be a simple grep scan printing 1 line after each GROUP BY match:

$ grep -A 1 'GROUP BY' /tmp/test_groupby_*
/tmp/test_groupby_1:  GROUP BY col1
/tmp/test_groupby_1-  -- ends here
--
/tmp/test_groupby_1:  GROUP BY col2
/tmp/test_groupby_1-  ORDER BY col2
--
/tmp/test_groupby_1:  GROUP BY col3
/tmp/test_groupby_1-  -- ends here
--
/tmp/test_groupby_1:  GROUP BY col4
--
/tmp/test_groupby_2:  GROUP BY col5
/tmp/test_groupby_2-  ORDER BY col5

However, with some basic scripting, we can display only the GROUP BYs matching the criteria:

# First, we make Perl speak english: `-MEnglish`, which enables `$ARG` (among the other things).
#
# The logic is simple: we print the current line if the previous line matched /GROUP BY/, and the
# current doesn't match /ORDER BY/; after, we store the current line as `$previous`.
#
perl -MEnglish -ne 'print "$ARGV: $previous $ARG" if $previous =~ /GROUP BY/ && !/ORDER BY/; $previous = $ARG' /tmp/test_groupby_*

# As next step, we automatically open all the files matching the criteria, in an editor:
#
# - `-l`: adds the newline automatically;
# - `$ARGV`: is the filename (which we print instead of the match);
# - `unique`: if a file has more matches, the filename will be printed more than once - with
#    `unique`, we remove duplicates; this is optional though, as editors open each file(name) only
#    once;
# - `xargs`: send the filenames as parameters to the command (in this case, `code`, from Visual Studio
#    Code).
#
perl -MEnglish -lne 'print $ARGV if $previous =~ /GROUP BY/ && !/ORDER BY/; $previous = $ARG' /tmp/test_groupby_* | uniq | xargs code

There is another approach: an inverted regular expression match:

# Match lines with `GROUP BY`, followed by a line _not_ matching `ORDER BY`.
# Reference: https://stackoverflow.com/a/406408.
#
grep -zP 'GROUP BY .+\n((?!ORDER BY ).)*\n' /tmp/test_groupby_*

This is, however, freaky and, like regular expressions in general, carries a high risk of hair-pulling (of course, this is up to the developer's judgement). It will be the subject of a future article, though, because I find it a very interesting case.

Schema migration tools incompatibility

This is an easily missed problem! Some tools may not support MySQL 8.0.

There’s a known showstopper bug on the latest Gh-ost release, which prevents operations from succeeding on MySQL 8.0.

As a workaround, one can use trigger-based tools, like pt-online-schema-change v3.1.1 or v3.0.x (but v3.1.0 is broken!) or Facebook’s OnlineSchemaChange.

Obsolete Mac Homebrew default collation

When MySQL is installed via Homebrew (as of January 2020), the default collation is utf8mb4_general_ci.

There are a couple of solutions to this problem.

Modify the formula, and recompile the binaries

A simple thing to do is to correct the Homebrew formula, and recompile the binaries.

For illustrative purposes, as part of this solution, I use the so-called “flip-flop” operator, which is something frowned upon… by people not using it 😉. As one can observe in fact, for the target use cases, it’s very convenient.

# Find out the formula location
#
$ mysql_formula_filename=$(brew formula mysql)

# Out of curiosity, let's print the relevant section.
#
# Flip-flop operator (`<condition> .. <condition>`): it matches *everything* between lines matching two conditions, in this case:
#
# - start: a line matching `/args = /`;
# - end: a line matching `/\]/` (a closing square bracket, which needs to be escaped, since it's a regex metacharacter).
#
$ perl -ne 'print if /args = / .. /\]/' "$mysql_formula_filename"
   args = %W[
     -DFORCE_INSOURCE_BUILD=1
     -DCOMPILATION_COMMENT=Homebrew
     -DDEFAULT_CHARSET=utf8mb4
     -DDEFAULT_COLLATION=utf8mb4_general_ci
     -DINSTALL_DOCDIR=share/doc/#{name}
     -DINSTALL_INCLUDEDIR=include/mysql
     -DINSTALL_INFODIR=share/info
     -DINSTALL_MANDIR=share/man
     -DINSTALL_MYSQLSHAREDIR=share/mysql
     -DINSTALL_PLUGINDIR=lib/plugin
     -DMYSQL_DATADIR=#{datadir}
     -DSYSCONFDIR=#{etc}
     -DWITH_BOOST=boost
     -DWITH_EDITLINE=system
     -DWITH_SSL=yes
     -DWITH_PROTOBUF=system
     -DWITH_UNIT_TESTS=OFF
     -DENABLED_LOCAL_INFILE=1
     -DWITH_INNODB_MEMCACHED=ON
   ]

# Fix it!
#
$ perl -i.bak -ne 'print unless /CHARSET|COLLATION/' "$mysql_formula_filename"

# Now recompile and install the formula
#
$ brew install --build-from-source mysql

Ignore the client encoding on handshake

An alternative solution is for the server to ignore the client encoding on handshake.

When configured this way, the server will impose the default character set/collation on the clients.

In order to apply this solution, add character-set-client-handshake = OFF to the server configuration.
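A quick sanity check after restarting the server with this option (a sketch; the exact values depend on the server defaults): every new session should report the server-side character set, regardless of what the client requested.

-- Run from any client after the restart
SELECT @@character_set_client, @@character_set_connection, @@collation_connection;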

Good practice for (major/minor) upgrades: comparing the system variables

A very good practice when performing (major/minor) upgrades is to compare the system variables, in order to spot differences that may have an impact.

The MySQL Parameters website gives a visual overview of the differences between versions.

For example, the URL https://mysql-params.tmtms.net/mysqld/?vers=5.7.29,8.0.19&diff=true shows the differences between the system variables of v5.7.29 and v8.0.19.
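For a local, scriptable alternative (just a sketch), the same comparison can be done by dumping the variables from each server and diffing the two outputs:

-- Run on both the old and the new server, redirect each output to a file, then diff the files
SELECT variable_name, variable_value
FROM performance_schema.global_variables
ORDER BY variable_name;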

Conclusion

The migration to MySQL 8.0 at Ticketsolve has been one of the smoothest, historically speaking.

This is a bit of a paradox: we had never before had to rewrite our entire database for an upgrade, yet, with sufficient knowledge of what to expect, we didn’t hit any significant bump (in particular, nothing unexpected in the optimizer department, which is usually the critical one).

Considering the main issues and their migration requirements:

  • the new charset/collation defaults are not mandatory, and the migration can be performed ahead of time and in stages;
  • the trailing whitespace just requires the data to be checked and cleaned;
  • the GROUP BY clauses can be inspected and updated ahead of time;
  • the information schema caching is regulated by a setting;
  • Gh-ost may be missed but, in the worst case, there are valid comparable tools.

the conclusion is that the preparation work can be done entirely before the upgrade, which can then be performed with a reasonable expectation of low risk.

Happy migration 😄

Rethinking Result Sets in Connector/Node.js


It used to be the case where, in order to actually process data retrieved from the database using Connector/Node.js, you would have to resort to an API that required the use of both JavaScript callbacks and Promises. This was meant to provide more flexibility and control to the application developer and at the same time decrease the chance of buffering data unnecessarily. However this wasn’t useful for 99% of the use-cases and made simple tasks a little bit cumbersome. Also, the fact that it required using two different asynchronous constructs made it a little bit harder to grasp.

To make matters worse, in order to consume operational metadata about the columns in the result set, you would have to provide an additional callback, making the whole thing spiral a bit out of control, particularly when there were multiple result sets involved. In that case, you needed to create a shared context between the two functions in order to map data and metadata for each column in each result set.

Keep in mind that .execute() doesn’t return a promise, but rather receives a callback function to do your data processing of each individual row. This could be a bit annoying for you.

Gabriella Ferrara

Additionally, given the increasing pervasiveness of the MySQL Shell and the sheer number of examples and tutorials in the wild, some users found themselves shoehorning its synchronous JavaScript API in Node.js code which didn’t really work due to the asynchronous nature of the latter, leading to some confusion and a lot of annoyances.


This has changed with the release of Connector/Node.js 8.0.18 (in September 2019), wherein the Result instance, resolved by the Promise returned by the execute() method, includes a whole new set of utilities that allow you to consume the result set data in a way similar to a pull-based cursor. That cursor is implemented by the fetchOne() and fetchAll() methods, which are pretty much self-explanatory, but in essence, allow you to consume, from memory, either a single item or all the items from the result set. Calling these methods will consequently free the memory space allocated by the client code. This constitutes a good middle ground between a non-buffered approach like the one based on callbacks, or a fully-buffered approach where the result set is kept in memory unless the application explicitly clears it via an additional API.

There are now three different kinds of Results with a contextual interface that varies depending on whether you are fetching documents from a collection (DocResult), rows from a table (RowResult) or raw data via SQL (SqlResult). This is exactly how it is specified by the X DevAPI Result Set standard. The only difference is that, in Connector/Node.js, the execute() method is asynchronous, so you access the Result instance by handling the resulting Promise instead of it being directly returned when calling the method (like in other existing implementations).

In the same way, as described by the standard, for the Table.select() and Session.sql() APIs, besides the fetchOne() and fetchAll() methods, these interfaces are now providing additional methods such as getColumns() to process column metadata and nextResult() to iterate over multiple result sets.


So, looking back at the use cases analyzed before, this is how you can, at the moment, start working even more effectively with result sets using Connector/Node.js.

Document Store

With the same myCollection collection under the mySchema schema which contains the following:

[{
  "_id": "1",
  "name": "foo"
}, {
  "_id": "2",
  "name": "bar"
}]

You can now retrieve all documents in the collection with:

const collection = session.getSchema('mySchema').getCollection('myCollection');

const result = await collection.find().execute();
console.log(result.fetchAll()); // [{ _id: '1', name: 'foo' }, { _id: '2', name: 'bar' }]

// alternatively, fetching one document at a time
const result = await collection.find().execute();
const docs = [];
let doc;

while ((doc = result.fetchOne())) {
  docs.push(doc);
}

console.log(docs); // [{ _id: '1', name: 'foo' }, { _id: '2', name: 'bar' }]

Regular Tables

Working with tables means that besides column values, you can also access specific details about the column, such as its name, type, size, encoding, etc. Processing column metadata becomes a lot less confusing using the getColumns() method. So, with a myTable table under the same schema, which contains the following:

+-----+-------+
| _id | name  |
+-----+-------+
| "1" | "foo" |
| "2" | "bar" |
+-----+-------+

You can, similarly to the document mode counterpart, retrieve all the rows from the table with:

const table = session.getSchema('mySchema').getTable('myTable');

const result = await table.select().execute();
console.log(result.fetchAll()); // [['1', 'foo'], ['2', 'bar']]

// alternatively, fetching one row at a time
const result = await table.select().execute();
const rows = [];
let row;

while ((row = result.fetchOne())) {
  rows.push(row);
}

console.log(rows); // [['1', 'foo'], ['2', 'bar']]

And you can retrieve details about each column in the table with:

const columns = result.getColumns();

const names = columns.map(c => c.getColumnName());
console.log(names); // ['_id', 'name']

const charsets = columns.map(c => c.getCharacterSetName());
console.log(charsets); // ['utf8mb4', 'utf8mb4']

const collations = columns.map(c => c.getCollationName());
console.log(collations); // ['utf8mb4_0900_ai_ci', 'utf8mb4_0900_ai_ci']

Creating an object mapping each column name to its value (similar to result set items in document mode) is now as easy as:

// the column "label" accounts for aliases
const mapping = res.fetchAll()
  .map(row => {
    return row.reduce((res, value, i) => {
      return Object.assign({}, res, { [columns[i].getColumnLabel()]: value })
    }, {});
  });

console.log(mapping); // [{ _id: '2', name: 'bar' }, { _id: '1', name: 'foo' }]

SQL

As already mentioned, one of the biggest advantages of this new API is that it also condenses the process for working with multiple result sets. So, with a table like the one used before and a PROCEDURE such as:

DELIMITER //
CREATE PROCEDURE proc()
BEGIN
  SELECT _id AS s1_c1, name AS s1_c2 FROM myTable;
  SELECT '3' as s2_c1, 'baz' AS s2_c2;
END//

You can easily iterate over all the result sets, without keeping any kind of shared state, like the following:

const rows = [];
const columns = [];

const res = await session.sql('CALL proc').execute();

do {
  columns.push(res.getColumns().map(c => c.getColumnLabel()));
  rows.push(res.fetchAll());
} while (res.nextResult() && res.hasData());

console.log(rows); // [[['1', 'foo'], ['2', 'bar']], [['3', 'baz']]]
console.log(columns); // [['s1_c1', 's1_c2'], ['s2_c1', 's2_c2']]

In Retrospect

This new result set API closes a huge gap with regards to X DevAPI platform compatibility and brings all the implementations closer to a point of becoming almost drop-in replacements for each other. In particular, it capitalizes on the success of the MySQL Shell, introduces a syntax much closer to the one used by its JavaScript implementation, and provides a better framework for developers switching between the two environments. We believe it also leads to more readable and maintainable code while making a good compromise in terms of resource requirements, in particular with regards to memory usage.

Make sure you give it a try and let us know what you think about it. Report any issues you have via our bug tracker using the Connector for Node.js category or go one step further and submit a pull request.

If you want to learn more about Connector/Node.js and the X DevAPI, please check the following:

Make sure you also join our community Slack and come hang around at the #connectors channel:

Use Cases for MySQL NDB Cluster 8.0

In this blog I will go through a number of popular applications that use
NDB Cluster 8.0 and also how these applications have developed over the
years.

There is a presentation at slideshare.net accompanying this blog.

The first major NDB development project was to build a prototype of a
number portability application together with a Swedish telecom provider.
The aim of this prototype was to show how one could build advanced
telecom applications and manage them through standardised interfaces.
The challenge was that telecom applications have stringent requirements,
both on uptime and on latency. If database access took too long, the
telecom applications would suffer from abandoned calls and other
problems. Obviously the uptime of the database part had to be at least
as high as the uptime of the telecom switches - actually even higher,
since a telecom database is typically used by many telecom switches.

In the prototype setup, NDB Cluster ran on 2 SPARC computers that were
interconnected using a low-latency SCI interconnect from Dolphin. The
SPARC computers were also connected over Ethernet to the AXE switch,
reaching the central computer in the AXE switch through a regional
processor. This prototype was developed in 1997 and 1998 and concluded
with a successful demo.

In 1999 a new development project started up within a startup arm
of Ericsson. At the time the financial market was very hot, and
instant access to stock quotes was seen as a major business benefit
(it still is).

We worked together with a Swedish financial company and developed with
them an application that had two interfaces towards NDB Cluster. One was
the feed from the stock exchange, through which stock orders were fed
into the database. This required low-latency writes into NDB Cluster and
also very high update rates.

The second interface provided real-time stock quotes to users and
other financial applications. This version was a single-node
database service.

We delivered a prototype of this service that worked like a charm in
2001. At this point, however, the stock markets plunged and the financial
markets were no longer the right place for a first application of NDB
Cluster.

Thus we refocused the development of NDB Cluster back towards the
telecom market. This meant that we focused heavily on completing the work
on handling node failures of all sorts. We developed test programs that
exercised thousands of node failures every day. We worked with a
number of prospective customers in 2002 and 2003 and developed a number
of new versions of NDB Cluster.

The first customer that adopted NDB Cluster in a production environment
was Bredbandsbolaget; we worked together with them in 2003 and 2004 and
assisted them in developing their applications. Bredbandsbolaget was and
is an internet service provider, so the applications they used NDB
Cluster for were things like a DNS service, a DHCP service and so forth.

We worked closely with them - we even had offices in the same building
and on the same floor, so we interacted on a daily basis. This meant that
the application and NDB Cluster were developed together and were a
perfect fit for each other. This application is still operational and has
been since 2004. I even had Bredbandsbolaget as my own internet service
provider for 10 years, so I was not only developing NDB Cluster, I was
also one of its first users.

In 2003 NDB Cluster development was acquired by MySQL and we changed the
name to MySQL Cluster. Nowadays there are other clustering products within
the MySQL area, so to distinguish NDB Cluster I sometimes use
MySQL NDB Cluster and sometimes simply NDB Cluster. However the product
name is still MySQL Cluster.

After Bredbandsbolaget we got a number of new large customers in the
telecom area. Many of those telecom customers have used LDAP as the
application protocol to access their data, since there was some
standardisation in the telecom sector around LDAP. To assist this there
is a storage engine to access NDB Cluster from OpenLDAP. One example of
such a telecom application is the Juniper SBR Carrier System, which
combines SQL, LDAP, HTTP and RADIUS access towards NDB Cluster.
NDB is used in this application as a session state database.

All sorts of telecom applications remain a very important use case for
NDB Cluster. One interesting area of development in the telecom space is
5G and IoT, which will expand the application space for telecom
substantially, reaching into self-driving cars, smart cities and many
more interesting applications that require ultra high availability
coupled with high write scalability and predictable low latency access.

Coming back to financial applications, this remains an important use
case for NDB Cluster. High write scalability, ultra high availability and
predictable, low latency access to data are again the keywords that drive
the choice of NDB Cluster in this application area.

The financial markets also add one more dimension to the NDB use cases.
Given that NDB can handle large amounts of payments, payment checks,
white lists, black lists and so forth, it is also possible to use
the data in NDB Cluster for real-time analysis.

Thus NDB Cluster 8.0 has also focused significantly on delivering more
capabilities in the area of complex queries. We have seen
many substantial improvements in this area.

More and more of our users work with standard SQL interfaces towards
NDB Cluster, and we worked very hard on ensuring that this provides
low latency access patterns. All the traditional interfaces towards
MySQL also work with NDB Cluster, so NDB can be accessed from
all programming languages that can be used to access MySQL.

However, many financial applications are written in Java. From Java
we have an NDB API called ClusterJ. This API uses a data object model
that makes it very easy to use. In many ways it can be easier to
work with ClusterJ than with SQL in object-oriented
applications.

The next application category that recognized NDB Cluster as a
very good fit was the computer gaming industry. There are a
number of applications within computer gaming where NDB Cluster is
a good fit. User profile management is one area where it is important
to always be up and running, such that users can join and leave the
games at any time. Game state is another area that requires very high
write scalability. Most of these applications use the SQL interface,
and many of them use fairly complex SQL queries and thus benefit
greatly from our improvements to parallel queries in NDB Cluster 8.0.
An interesting application that was developed at the SICS research
institute in Stockholm is HopsFS. This implements a file system in the
Hadoop world based on Hadoop HDFS. It scales to millions of
file operations per second.

This means that NDB Cluster 8.0 is already used in many important
AI applications as the platform for a distributed file system.

In NDB Cluster 8.0 we have improved write scalability further, even
when the writes are large in volume. NDB Cluster 8.0
scales to updates measured in GBytes per second even in a 2-node
cluster, and in a larger cluster one can reach close to hundreds of
GBytes per second.

Thus NDB Cluster 8.0 is a very efficient tool to implement modern
key-value stores, distributed file systems and other highly
scalable applications.

NDB Cluster 8.0 is also a perfect tool for building many of the
modern applications that form the base of cloud services. This is
one more active area of development for MySQL NDB Cluster.

Obviously, all sorts of web applications are also a good fit for NDB
Cluster. This is particularly true with the developments in NDB Cluster
7.6 and 8.0, where we improved the latency of simple queries and
implemented a shared memory transporter that makes it very
efficient to set up small clusters with low latency access to all data.

For web applications we also have a NodeJS API that can access
NDB Cluster directly without going through a MySQL Server.

In a keynote in 2015, GE showed some templates for how to set up
NDB Cluster in GE applications for the health-care industry. More on
architectures for NDB Cluster in a later blog.

MySQL ERROR 1034: Incorrect Key File on InnoDB Table


Sometimes, you may experience “ERROR 1034: Incorrect key file” while running the ALTER TABLE or CREATE INDEX command:

mysql> alter table ontime add key(FlightDate);
ERROR 1034 (HY000): Incorrect key file for table 'ontime'; try to repair it

As the error message mentions key file, it is reasonable to assume we’re dealing with the MyISAM storage engine (the legacy storage engine which used to have such a thing), but no, we can clearly see this table is InnoDB!

When the error message in MySQL is confusing or otherwise unhelpful, it is a good idea to check the MySQL error log:

2019-02-24T02:02:26.100600Z 9 [Warning] [MY-012637] [InnoDB] 1048576 bytes should have been written. Only 696320 bytes written. Retrying for the remaining bytes.
2019-02-24T02:02:26.100884Z 9 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2019-02-24T02:02:26.100894Z 9 [ERROR] [MY-012639] [InnoDB] Write to file (merge) failed at offset 625999872, 1048576 bytes should have been written, only 696320 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2019-02-24T02:02:26.100907Z 9 [ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'

The most important part of this message is “Error number 28 means ‘No space left on device’” – so, we’re simply running out of disk space. You may wonder, though, what file is being written to and where it is located? “Write to file (merge) failed” is your (albeit not fully helpful) indication; “merge” here corresponds to the temporary file which is used to perform a merge sort operation when building indexes through sort (AKA InnoDB Fast Index Creation).

This file is created in the directory pointed to by the innodb_tmpdir server variable; if that is not set, the tmpdir variable (or the OS default, such as /tmp on Linux) is used. In many cases, such a tmpdir may be located on a filesystem that has little space, making this error occur quite frequently.
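As a quick check (the target path below is just an example and must exist with enough free space), you can see where the merge file will go and, if needed, redirect it without restarting the server:

-- Where will the merge file be written?
SELECT @@innodb_tmpdir, @@tmpdir;

-- Point index builds at a filesystem with more free space
-- (example path; innodb_tmpdir is dynamic, so no restart is needed)
SET GLOBAL innodb_tmpdir = '/data/mysql_tmp';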

The amount of disk space required can be significant, sometimes exceeding the total size of the final table. When adding indexes on CHAR/VARCHAR columns, especially with multibyte character sets (utf8, utf8mb3, utf8mb4), the space allocated for each index entry is roughly the number of bytes per character in the character set multiplied by the maximum length of the string. So adding an index on a utf8 VARCHAR(100) column will require roughly 400 bytes for every row in the table.
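A rough back-of-the-envelope estimate for that case (table_rows is itself an approximation for InnoDB, so treat the result as an order of magnitude):

-- Estimated merge-sort space, in GB, for an index on a utf8 VARCHAR(100) column of `ontime`
SELECT table_rows * 400 / POW(1024, 3) AS estimated_temp_gb
FROM information_schema.tables
WHERE table_name = 'ontime';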

Summary:

Are you getting the “ERROR 1034: Incorrect key file” message for an InnoDB table? Check your error log and the tmpdir server variable!

MySQL Partition over the Virtual / Generated Column


We had an interesting requirement from one of our clients: to have two MySQL partitions (partition_full / partition_half) that store names based on the user input.

Requirement :

  • The table has two columns, first_name and last_name. When the user provides both, another column, full_name, needs to be computed automatically, and the status needs to be considered “FULL”.
  • If the column last_name has no input from the user, then the first_name data needs to be computed as the full_name, and the status needs to be considered “HALF”.
  • Separate partitions are needed for both statuses, HALF and FULL.

We can achieve this with the help of virtual / generated columns and LIST partitioning. In this blog, I am going to explain the complete steps I followed to achieve this.

What is Virtual Column ?

Virtual columns are generated columns: their data is computed from an expression defined over other columns of the table. Below are the three keywords that can be used when defining them.

  • STORED
  • VIRTUAL
  • GENERATED ALWAYS

Here is a detailed blog post from the Mydbops team, which covers virtual columns in more depth.
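As a minimal sketch (a hypothetical table, just to illustrate the syntax), the same expression can be defined either as a VIRTUAL column, computed on read, or as a STORED column, materialized on write:

create table person_demo (
  first_name varchar(16),
  last_name  varchar(16),
  -- computed when the row is read, takes no storage
  full_name_virtual varchar(33) as (concat(first_name, ' ', last_name)) virtual,
  -- computed and written to disk when the row is inserted/updated
  full_name_stored  varchar(33) as (concat(first_name, ' ', last_name)) stored
);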

MySQL partition with Virtual / Generated columns

Step 1  –  ( Creating the table with virtual columns )

cmd :

create table Virtual_partition_test (
  id int(11) not null auto_increment primary key,
  first_name varchar(16),
  last_name varchar(16) default 0,
  full_name varchar(33) as (case last_name when '0' then first_name else concat(first_name,' ',last_name) end) stored,
  name_stat varchar(7) as (case full_name when concat(first_name,' ',last_name) then 'full' else 'half' end) stored,
  email_id varchar(32)
);

  • full_name – computes the full name from the first_name and last_name columns.
  • name_stat – computes the name status (full/half) from the first_name and last_name columns.

Step 2 –  ( Testing the virtual/generated column behaviour ) 

cmd :

insert into Virtual_partition_test (first_name,last_name,email_id) values ('sri','ram','sriram@gmail.com'),('hercules','7sakthi','hercules7sakthi@gmail.com'),('asha','mary','ashamary@gmail.com');

insert into Virtual_partition_test (first_name,email_id) values ('vijaya','vijaya@gmail.com'),('durai','durai@gmail.com'),('jc','jc@gmail.com');

Yes, I have created 3 FULL and 3 HALF names .
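The computed values can be verified with a quick query (the output below is what the inserts above should produce):

select id, first_name, last_name, full_name, name_stat from Virtual_partition_test;
+----+------------+-----------+------------------+-----------+
| id | first_name | last_name | full_name        | name_stat |
+----+------------+-----------+------------------+-----------+
|  1 | sri        | ram       | sri ram          | full      |
|  2 | hercules   | 7sakthi   | hercules 7sakthi | full      |
|  3 | asha       | mary      | asha mary        | full      |
|  4 | vijaya     | 0         | vijaya           | half      |
|  5 | durai      | 0         | durai            | half      |
|  6 | jc         | 0         | jc               | half      |
+----+------------+-----------+------------------+-----------+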

The above result set illustrates that the virtual/generated columns are working perfectly, as expected.

Step 3 –  ( Adding the partition key  )

It is important to have the partition column as part of the PRIMARY KEY.

cmd : alter table Virtual_partition_test drop primary key, add primary key (id,name_stat);

Step 4 – ( Configuring the partition )

cmd :

alter table Virtual_partition_test partition by list
columns(name_stat)
(partition partition_full values in ('FULL') engine=InnoDB,
partition partition_half values in ('HALF') engine=InnoDB);

Partitions have been added as per the requirement.
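A quick way to confirm the row distribution and the pruning behaviour (TABLE_ROWS is a statistics-based estimate, so an ANALYZE TABLE may be needed for it to be current):

select partition_name, table_rows
from information_schema.partitions
where table_name = 'Virtual_partition_test';

-- queries filtering on name_stat should be pruned to a single partition
explain select * from Virtual_partition_test where name_stat = 'FULL';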


Hope this blog helps someone who is looking at partitions over virtual / generated columns.

Thanks !!!
