
Maximizing Database Query Efficiency for MySQL - Part Two


This is the second part of a two-part series blog for Maximizing Database Query Efficiency In MySQL. You can read part one here.

Using Single-Column, Composite, Prefix, and Covering Indexes

Tables that frequently receive high traffic must be properly indexed. It's not enough just to index your table: you also need to determine and analyze the types of queries and data retrieval you need for that specific table before you decide which indexes it requires. Let's go over these types of indexes and how you can use them to maximize your query performance.

Single-Column Index

An InnoDB table can contain a maximum of 64 secondary indexes. A single-column index (or full-column index) is an index assigned to a single column. A column that contains mostly distinct values is a good candidate, since a good index must have high cardinality and up-to-date statistics so the optimizer can choose the right query plan. To view the distribution of indexes, you can check with the SHOW INDEXES syntax, just like below:

root[test]#> SHOW INDEXES FROM users_account\G

*************************** 1. row ***************************

        Table: users_account

   Non_unique: 0

     Key_name: PRIMARY

 Seq_in_index: 1

  Column_name: id

    Collation: A

  Cardinality: 131232

     Sub_part: NULL

       Packed: NULL

         Null: 

   Index_type: BTREE

      Comment: 

Index_comment: 

*************************** 2. row ***************************

        Table: users_account

   Non_unique: 1

     Key_name: name

 Seq_in_index: 1

  Column_name: last_name

    Collation: A

  Cardinality: 8995

     Sub_part: NULL

       Packed: NULL

         Null: 

   Index_type: BTREE

      Comment: 

Index_comment: 

*************************** 3. row ***************************

        Table: users_account

   Non_unique: 1

     Key_name: name

 Seq_in_index: 2

  Column_name: first_name

    Collation: A

  Cardinality: 131232

     Sub_part: NULL

       Packed: NULL

         Null: 

   Index_type: BTREE

      Comment: 

Index_comment: 

3 rows in set (0.00 sec)

You can also inspect the index statistics with the information_schema.index_statistics or mysql.innodb_index_stats tables.
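
As a minimal sketch (the index name below is hypothetical and only illustrates the syntax), a single-column index can be created and its persisted statistics inspected like this:

CREATE INDEX idx_last_name ON users_account (last_name);

-- InnoDB persists per-index statistics that the optimizer relies on
SELECT index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'test'
  AND table_name = 'users_account'
  AND index_name = 'idx_last_name';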

Compound (Composite) or Multi-Part Indexes

A compound index (commonly called a composite index) is a multi-part index composed of multiple columns. MySQL allows up to 16 columns in a single composite index. Exceeding the limit returns an error like the one below:

ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed

A composite index provides a boost to your queries, but it requires a clear understanding of how you retrieve the data. For example, a table with a DDL of...

CREATE TABLE `users_account` (

  `id` int(11) NOT NULL AUTO_INCREMENT,

  `last_name` char(30) NOT NULL,

  `first_name` char(30) NOT NULL,

  `dob` date DEFAULT NULL,

  `zip` varchar(10) DEFAULT NULL,

  `city` varchar(100) DEFAULT NULL,

  `state` varchar(100) DEFAULT NULL,

  `country` varchar(50) NOT NULL,

  `tel` varchar(16) DEFAULT NULL,

  PRIMARY KEY (`id`),

  KEY `name` (`last_name`,`first_name`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

...which contains the composite index `name`. The composite index improves query performance once these keys are referenced as used key parts. For example, see the following:

root[test]#> explain format=json select * from users_account where last_name='Namuag' and first_name='Maximus'\G

*************************** 1. row ***************************

EXPLAIN: {

  "query_block": {

    "select_id": 1,

    "cost_info": {

      "query_cost": "1.20"

    },

    "table": {

      "table_name": "users_account",

      "access_type": "ref",

      "possible_keys": [

        "name"

      ],

      "key": "name",

      "used_key_parts": [

        "last_name",

        "first_name"

      ],

      "key_length": "60",

      "ref": [

        "const",

        "const"

      ],

      "rows_examined_per_scan": 1,

      "rows_produced_per_join": 1,

      "filtered": "100.00",

      "cost_info": {

        "read_cost": "1.00",

        "eval_cost": "0.20",

        "prefix_cost": "1.20",

        "data_read_per_join": "352"

      },

      "used_columns": [

        "id",

        "last_name",

        "first_name",

        "dob",

        "zip",

        "city",

        "state",

        "country",

        "tel"

      ]

    }

  }

}

1 row in set, 1 warning (0.00 sec)

The used_key_parts field shows that the query plan has perfectly selected our desired columns covered by our composite index.

Composite indexing has its limitations as well. Certain conditions in the query prevent the optimizer from using all columns of the key.

The documentation says, "The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction…". Basically, this means that even though you have a composite index on the two columns, the sample query below does not cover both fields:

root[test]#> explain format=json select * from users_account where last_name>='Zu' and first_name='Maximus'\G

*************************** 1. row ***************************

EXPLAIN: {

  "query_block": {

    "select_id": 1,

    "cost_info": {

      "query_cost": "34.61"

    },

    "table": {

      "table_name": "users_account",

      "access_type": "range",

      "possible_keys": [

        "name"

      ],

      "key": "name",

      "used_key_parts": [

        "last_name"

      ],

      "key_length": "60",

      "rows_examined_per_scan": 24,

      "rows_produced_per_join": 2,

      "filtered": "10.00",

      "index_condition": "((`test`.`users_account`.`first_name` = 'Maximus') and (`test`.`users_account`.`last_name` >= 'Zu'))",

      "cost_info": {

        "read_cost": "34.13",

        "eval_cost": "0.48",

        "prefix_cost": "34.61",

        "data_read_per_join": "844"

      },

      "used_columns": [

        "id",

        "last_name",

        "first_name",

        "dob",

        "zip",

        "city",

        "state",

        "country",

        "tel"

      ]

    }

  }

}

1 row in set, 1 warning (0.00 sec)

In cases like this (when your query relies on ranges rather than constant or reference lookups), avoid relying on a composite index: it wastes memory and buffer pool space and adds to the performance degradation of your queries.
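
If that range-plus-equality pattern is actually your dominant access path, one option (a sketch, not part of the original example) is to reorder the composite index so that the equality column leads; the optimizer can then use both key parts, the constant lookup on the leading column and the range on the second:

ALTER TABLE users_account DROP INDEX name;
ALTER TABLE users_account ADD INDEX name (first_name, last_name);

-- Now both key parts are usable: = on first_name, range on last_name
EXPLAIN SELECT * FROM users_account
WHERE first_name = 'Maximus' AND last_name >= 'Zu';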

Prefix Indexes

Prefix indexes are indexes that reference a column but take only a defined starting length of that column, and only that portion (the prefix data) is stored in the index. Prefix indexes can help lighten your buffer pool usage and save disk space because they do not need to store the full length of the column. What does this mean? Let's take an example and compare the impact of a full-length index versus a prefix index.

root[test]#> create index name on users_account(last_name, first_name);

Query OK, 0 rows affected (0.42 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> \! du -hs /var/lib/mysql/test/users_account.*

12K     /var/lib/mysql/test/users_account.frm

36M     /var/lib/mysql/test/users_account.ibd

We created a full-length composite index, which consumes a total of 36MiB of tablespace for the users_account table. Let's drop it and then add a prefix index.

root[test]#> drop index name on users_account;

Query OK, 0 rows affected (0.01 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> alter table users_account engine=innodb;

Query OK, 0 rows affected (0.63 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> \! du -hs /var/lib/mysql/test/users_account.*

12K     /var/lib/mysql/test/users_account.frm

24M     /var/lib/mysql/test/users_account.ibd






root[test]#> create index name on users_account(last_name(5), first_name(5));

Query OK, 0 rows affected (0.42 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> \! du -hs /var/lib/mysql/test/users_account.*

12K     /var/lib/mysql/test/users_account.frm

28M     /var/lib/mysql/test/users_account.ibd

With the prefix index, the tablespace takes up only 28MiB, which is 8MiB less than with the full-length index. That's great to hear, but it doesn't necessarily mean it is performant and serves what you need.

If you decide to add a prefix index, you must first identify the type of queries you need for data retrieval. Creating a prefix index helps you use the buffer pool more efficiently, so it does help with your query performance, but you also need to know its limitations. For example, let's compare the performance of a full-length index and a prefix index.

Let's create a full-length composite index:

root[test]#> create index name on users_account(last_name, first_name);

Query OK, 0 rows affected (0.45 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#>  EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

*************************** 1. row ***************************

EXPLAIN: {

  "query_block": {

    "select_id": 1,

    "cost_info": {

      "query_cost": "1.61"

    },

    "table": {

      "table_name": "users_account",

      "access_type": "ref",

      "possible_keys": [

        "name"

      ],

      "key": "name",

      "used_key_parts": [

        "last_name",

        "first_name"

      ],

      "key_length": "60",

      "ref": [

        "const",

        "const"

      ],

      "rows_examined_per_scan": 3,

      "rows_produced_per_join": 3,

      "filtered": "100.00",

      "using_index": true,

      "cost_info": {

        "read_cost": "1.02",

        "eval_cost": "0.60",

        "prefix_cost": "1.62",

        "data_read_per_join": "1K"

      },

      "used_columns": [

        "last_name",

        "first_name"

      ]

    }

  }

}

1 row in set, 1 warning (0.00 sec)



root[test]#> flush status;

Query OK, 0 rows affected (0.02 sec)



root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

PAGER set to 'cat -> /dev/null'

3 rows in set (0.00 sec)



root[test]#> nopager; show status like 'Handler_read%';

PAGER set to stdout

+-----------------------+-------+

| Variable_name         | Value |

+-----------------------+-------+

| Handler_read_first    | 0     |

| Handler_read_key      | 1     |

| Handler_read_last     | 0     |

| Handler_read_next     | 3     |

| Handler_read_prev     | 0     |

| Handler_read_rnd      | 0     |

| Handler_read_rnd_next | 0     |

+-----------------------+-------+

7 rows in set (0.00 sec)

The result reveals that it is, in fact, using a covering index (i.e. "using_index": true) and that indexes are used properly: Handler_read_key is incremented, and the index scan shows up as an incremented Handler_read_next.
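
For contrast, here is a quick sketch of what the same counters look like for a query on an unindexed column (zip, from the DDL above): Handler_read_rnd_next grows with every row scanned instead of Handler_read_key and Handler_read_next.

FLUSH STATUS;
-- zip has no index, so this forces a full table scan
SELECT * FROM users_account WHERE zip = '12345';
SHOW STATUS LIKE 'Handler_read%';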

Now, let's try the same approach using a prefix index:

root[test]#> create index name on users_account(last_name(5), first_name(5));

Query OK, 0 rows affected (0.22 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#>  EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

*************************** 1. row ***************************

EXPLAIN: {

  "query_block": {

    "select_id": 1,

    "cost_info": {

      "query_cost": "3.60"

    },

    "table": {

      "table_name": "users_account",

      "access_type": "ref",

      "possible_keys": [

        "name"

      ],

      "key": "name",

      "used_key_parts": [

        "last_name",

        "first_name"

      ],

      "key_length": "10",

      "ref": [

        "const",

        "const"

      ],

      "rows_examined_per_scan": 3,

      "rows_produced_per_join": 3,

      "filtered": "100.00",

      "cost_info": {

        "read_cost": "3.00",

        "eval_cost": "0.60",

        "prefix_cost": "3.60",

        "data_read_per_join": "1K"

      },

      "used_columns": [

        "last_name",

        "first_name"

      ],

      "attached_condition": "((`test`.`users_account`.`first_name` = 'Maximus Aleksandre') and (`test`.`users_account`.`last_name` = 'Namuag'))"

    }

  }

}

1 row in set, 1 warning (0.00 sec)



root[test]#> flush status;

Query OK, 0 rows affected (0.01 sec)



root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

PAGER set to 'cat -> /dev/null'

3 rows in set (0.00 sec)



root[test]#> nopager; show status like 'Handler_read%';

PAGER set to stdout

+-----------------------+-------+

| Variable_name         | Value |

+-----------------------+-------+

| Handler_read_first    | 0     |

| Handler_read_key      | 1     |

| Handler_read_last     | 0     |

| Handler_read_next     | 3     |

| Handler_read_prev     | 0     |

| Handler_read_rnd      | 0     |

| Handler_read_rnd_next | 0     |

+-----------------------+-------+

7 rows in set (0.00 sec)

MySQL reveals that it does use the index properly, but noticeably there's a cost overhead compared to the full-length index. That's obvious and explainable, since the prefix index does not cover the whole length of the field values. Using a prefix index is not a replacement for, nor an alternative to, full-length indexing. It can also produce poor results when used inappropriately. So you need to determine what type of queries and data you need to retrieve.
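
One way to sanity-check a prefix length before creating the index (a quick sketch, not part of the original benchmark) is to compare the selectivity of the candidate prefix against the full column; the closer the ratio is to 1, the less you lose by truncating:

SELECT
  COUNT(DISTINCT last_name)          AS full_cardinality,
  COUNT(DISTINCT LEFT(last_name, 5)) AS prefix5_cardinality,
  COUNT(DISTINCT LEFT(last_name, 5)) / COUNT(DISTINCT last_name) AS selectivity_ratio
FROM users_account;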

Covering Indexes

Covering indexes don't require any special syntax in MySQL. A covering index in InnoDB refers to the case where all fields selected in a query are covered by an index. The query does not need a sequential read over the disk to read the data in the table; it only uses the data in the index, significantly speeding up the query. For example, our query from earlier, i.e.

select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

As mentioned earlier, this is a covering index. When your tables are well planned for how you store your data and your indexes are created properly, design your queries to leverage covering indexes as much as possible so that you benefit from the result. This can help you maximize the efficiency of your queries and results in great performance.
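
As a small sketch against the same table, compare a query that the `name` index fully covers with one that it does not; EXPLAIN shows "Using index" only for the first:

-- Covered: both selected columns live in the composite index `name`
EXPLAIN SELECT last_name, first_name FROM users_account WHERE last_name = 'Namuag';

-- Not covered: dob is not part of the index, so InnoDB must also read the clustered index row
EXPLAIN SELECT last_name, dob FROM users_account WHERE last_name = 'Namuag';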

Leverage Tools That Offer Advisors or Query Performance Monitoring

Organizations often tend to go first to GitHub and find open-source software that can offer great benefits. For simple advisories that help you optimize your queries, you can leverage the Percona Toolkit. For a MySQL DBA, the Percona Toolkit is like a Swiss army knife.

For operations where you need to analyze how you are using your indexes, you can use pt-index-usage.

pt-query-digest is also available; it can analyze MySQL queries from logs, the processlist, and tcpdump. In fact, it is the most important tool for analyzing and inspecting bad queries. Use this tool to aggregate similar queries together and report on those that consume the most execution time.

For archiving old records, you can use pt-archiver. To inspect your database for duplicate indexes, leverage pt-duplicate-key-checker. You might also take advantage of pt-deadlock-logger; although deadlocks are caused by a poor implementation rather than being a cause of an underperforming query themselves, they still impact query efficiency. If you need table maintenance that requires adding indexes online without affecting the database traffic going to a particular table, you can use pt-online-schema-change. Alternatively, you can use gh-ost, which is also very useful for schema migrations.

If you are looking for enterprise features, bundled with query performance monitoring, alarms and alerts, dashboards and metrics that help you optimize your queries, and advisors, then ClusterControl may be the tool for you. ClusterControl offers many features that show you Top Queries, Running Queries, and Query Outliers. Check out the blog MySQL Query Performance Tuning, which guides you through monitoring your queries with ClusterControl.

Conclusion

You've arrived at the end of our two-part blog series. We covered the factors that cause query degradation and how to resolve them in order to maximize your database queries. We also shared some tools that can benefit you and help solve your problems.

 

SELECT … FOR UPDATE on non-existent rows


TL; DR

SELECT … FOR UPDATE has a (not so) surprising side effect on non-existent rows: it could cause a (serious) performance penalty and even prevent you from inserting new rows at all.

Locking rows for update

A development team of ours was working on an application that needed to ensure an update on a single row item isn't modified by another transaction. Naturally, they started making use of SELECT … FOR UPDATE to lock the row before updating it. This worked excellently to keep anyone else from updating this row. However, they started to get lock wait timeouts on new inserts of totally unrelated items during a load test, and they asked me to look into this.

SELECT … FOR UPDATE is described as following in the MySQL documentation:
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
So far so good: this behavior is expected to happen. It also doesn’t mention anything about locking anything but the rows it reads.

I asked the team whether they were attempting to insert the same data as they were locking in the other transaction and they said they were not.

In pseudo code they were doing this:
SELECT ... WHERE uuid='some-uuid' FOR UPDATE;
if row { UPDATE row }
else { INSERT row }

The uuid column here is the primary key of the table. This code executed fine and had no issues by itself as a single transaction.

You may wonder why not use the INSERT … ON DUPLICATE KEY UPDATE or REPLACE INTO?
First of all, we are inserting only occasionally, so the insert command would fail 99% of the time. Second of all, we may only be updating a single column within a single row, which implies we would need to know the entire row up front whenever we have to insert or replace it.

No row to update

Now what would happen if there is no row to update?

According to the description in the MySQL documentation it sets an exclusive lock on each row it reads, but what about when there is no row to read? This other excerpt on gap locks might hint what it actually does do:
For example, SELECT c1 FROM t WHERE c1 BETWEEN 10 and 20 FOR UPDATE; prevents other transactions from inserting a value of 15 into column t.c1, whether or not there was already any such value in the column, because the gaps between all existing values in the range are locked.
If there is no row, a lock will still be set on the associated index entries. This shouldn’t be bad, right? We can still insert other rows, right? Wrong: it isn’t a gap lock alone, but a next-key lock!

Since the lock is set on a non-existent index entry, a next-key lock is created. Depending on where you would insert in the index, you may find a whole range being locked, as your insert needs to land within this range. In our version of UUID this shouldn't happen very often since there is a random factor, but it still can happen. As cross-region latency is present on this system, the next-key locks stay open longer and the chance of a collision in a gap also increases a bit. So that explains the behavior during the load test. So all's well, ends well?

Nasty side effect of Next-Key locks

There is one nasty side effect with the next-key lock: if the index value would be greater than the largest value in the table it locks everything above the largest value until infinity.

So what would happen to a table where the primary key is sequential like an integer? For example this table with the following rows:
CREATE TABLE sometable (
id int(11) NOT NULL,
some_data varchar(255) NOT NULL default '',
PRIMARY KEY (id)
);
INSERT INTO sometable VALUES
(1, 'test'),
(2, 'test'),
(3, 'test'),
(4, 'test'),
(5, 'test'),
(10, 'test'),
(11, 'test'),
(12, 'test'),
(13, 'test'),
(14, 'test'),
(15, 'test');

This would create a gap between 5 and 10 and a gap from 15 till infinity.

When we select within the gap between 5 and 10, we create a next-key lock between 5 and 10 and we can't insert new rows inside this gap. We can still insert new rows at the end of the table, though. However, if we select a row with an id greater than 15, we put a next-key lock on 15 till infinity. This means nobody can append anything to this table anymore until we have committed our transaction! This could become a serious bottleneck if you insert more rows than you update.
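
A quick two-session sketch against the sometable example illustrates this (assuming the default REPEATABLE READ isolation level):

-- Session 1: lock a non-existent id beyond the current maximum (15)
START TRANSACTION;
SELECT * FROM sometable WHERE id = 20 FOR UPDATE;

-- Session 2: this insert blocks (and may hit a lock wait timeout),
-- because session 1 holds a next-key lock from 15 up to infinity
INSERT INTO sometable VALUES (16, 'test');

-- Session 1: committing releases the lock and lets the insert proceed
COMMIT;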

Conclusion

Even though SELECT … FOR UPDATE sounds like a great way to ensure your transaction is the only one that modifies a specific row, it's quite dangerous as it could lock out or delay other transactions. Taking our example above, the safe way to do it (in pseudo code) is:
SELECT ... WHERE uuid='some-uuid';
if row { SELECT ... WHERE uuid='some-uuid' FOR UPDATE; UPDATE row }
else { INSERT row }

This ensures the lock is only set when there actually is a row present, at the expense of an additional query.
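
In concrete SQL the pattern could look roughly like this (a sketch; the table and column names are placeholders):

START TRANSACTION;
SELECT id FROM items WHERE uuid = 'some-uuid';

-- If the row exists, lock just that row and update it:
SELECT id FROM items WHERE uuid = 'some-uuid' FOR UPDATE;
UPDATE items SET some_column = 'new value' WHERE uuid = 'some-uuid';

-- Otherwise, insert it:
INSERT INTO items (uuid, some_column) VALUES ('some-uuid', 'new value');

COMMIT;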

Readahead

Q: What is the best readahead size?
A: O_DIRECT

Perhaps I agree with Dr. Stonebraker. This is my answer which might not be the correct answer. My reasons for O_DIRECT are performance, quality of service (QoS) and manageability and performance might get too much attention. I don't dislike Linux but the VM, buffered IO, readahead and page cache are there for all Linux use cases. They must be general purpose. Complex system software like a DBMS isn't general purpose and can do its own thing when needed. Also, I appreciate that kernel developers have done a lot to make Linux better for a DBMS. One of the perks at FB was easy access to many kernel developers.

Most of my web-scale MySQL/InnoDB experience is with O_DIRECT. While InnoDB can use buffered IO we always chose O_DIRECT. Eventually, RocksDB arrived and it only did buffered IO for a few years. Then O_DIRECT support was added and perhaps one day the web-scale MyRocks team will explain what they use.

I deal with readahead when running benchmarks and a common experience is using the wrong (too large) value and then repeating tests which means I spend more time and more SSD endurance thanks to buffered IO. I have many blog posts with performance results for readahead including at least one for MongoDB. Usually my goal was to find which small value is good enough. I learned that 0 is too small. Readahead can help scan-heavy workloads, but my focus is on OLTP where we avoided most scans except for logical backup.

I understand why buffered IO is used by some DBMS. Early in the product lifecycle it can be a placeholder until more features are added to make O_DIRECT performant. The benefits of the OS page cache include:
  • Filesystem readahead can be used before the DBMS adds support for prefetching when doing scans. But filesystem readahead is a black box, might differ between filesystems, provides no metrics and will do the wrong thing for some workloads. InnoDB provides a prefetch feature which can help when O_DIRECT is used. I disabled it because OLTP. The Facebook MySQL team (thanks Nizam) added logical readahead to make logical backup faster and more efficient. Filesystem readahead is likely to struggle with index structure fragmentation, so it is best suited for heap-organized tables and will suffer with index scans.
  • Doing writes to the OS page cache followed by fsync can be used before the DBMS adds support for async IO or background write threads. But Postgres suffered for so long from this approach because calling fsync with an unknown amount of dirty pages in the OS page cache can starve all other pending IO requests for many seconds. The situation is less dire today thanks to work by Jens Axboe to make writeback less annoying. There was much discussion in 2014 at a summit that included Postgres and Linux kernel developers. In addition to Linux improvements, features have been added to Postgres to reduce the impact from writeback storms -- read this to learn about spread checkpoints.
  • For a DBMS that does compression it is easier to use the DBMS cache for uncompressed pages and the OS page cache for compressed pages. I am familiar with amazing work in InnoDB to manage both in the DBMS cache. We all agree the code is very complex. RocksDB also has an option to cache both in its block cache but I have little experience with the feature. It is hard to figure out the best way to divide the DBMS cache between compressed and uncompressed pages.

Performance advantages for O_DIRECT include:
  • Does one memory copy to move data from storage to the DBMS while buffered needs two
  • Avoids CPU and mutex contention overhead in the OS page cache
  • Avoids wasting memory from double buffering between the DBMS cache and OS page cache

QoS advantages for O_DIRECT include:
  • Filesystem readahead is frequently wrong and either wastes IO or favors the wrong user leading to worse IO response times for other users
  • OS page cache will get trashed by other services sharing the host
  • Writeback storms starve other IO requests. Writeback is usually a background task and can tolerate response time variance. Too much writeback makes user reads and log fsync slower and those operations don't want response time variance.
  • Reduces stalls - this is a placeholder because my local expert has yet to explain this in public. But you will have a better time with Linux when moving less data through the OS page cache, especially with modern storage devices that can sustain many GB/sec of throughput. And when you aren't having a good time then you can fix the DBMS. The DBMS is involved whether or not it relies on the OS page cache so you always have to make it work.

Manageability advantages for O_DIRECT include:
  • DBMS prefetch and writeback are documented, tunable and provide metrics. Such claims are less true for filesystem readahead and VM writeback. There is a lot of advice on the web and much disagreement especially on the topic of min_free_kbytes. Domas used to be my source on this but he doesn't blog enough about the Linux VM.

Contention in MySQL InnoDB: Useful Info From the Semaphores Section


In a high concurrency world, where more and more users->connections->threads are used, contention is a given. But how do we identify the contention point easily?

Different approaches have been discussed previously, like the one using Perf and Flame graphs to track down the function taking far more time than expected. That method is great, but how can we do it with what one normally has at hand, like the MySQL client? Enter: the SEMAPHORES section of the SHOW ENGINE INNODB STATUS command output.

SEMAPHORES

The SEMAPHORES section displays all the metrics related to InnoDB's wait mechanics. This section is your best friend if you have a high concurrency workload. In short, it contains two kinds of data: event counters and a list of current waits.

Current Waits

This is a section that should be empty unless your MySQL server has concurrency high enough to cause InnoDB to start using its waiting mechanism. If you don't see lines of the form "--Thread <num> has waited…", then you are good: no contention.

Now, what does it look like? It could be like:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1744351
--Thread 139964395677440 has waited at btr0cur.cc line 5889 for 0 seconds the semaphore:
S-lock on RW-latch at 0x7f4c3d73c150 created in file buf0buf.cc line 1433
a writer (thread id 139964175062784) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0sea.cc line 1121
Last time write locked in file /mnt/workspace/percona-server-5.7-redhat-binary-rocks-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.7.28-31/percona-server-5.7.28-31/storage/innobase/btr/btr0sea.cc line 1121
OS WAIT ARRAY INFO: signal count 1483499
RW-shared spins 0, rounds 314940, OS waits 77827
RW-excl spins 0, rounds 205078, OS waits 7540
RW-sx spins 4357, rounds 47820, OS waits 949
Spin rounds per wait: 314940.00 RW-shared, 205078.00 RW-excl, 10.98 RW-sx

or like

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1744302
--Thread 139964401002240 has waited at row0ins.cc line 2520 for 0  seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7f4c3d956890 created in file buf0buf.cc line 1433
a writer (thread id 139964401002240) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 3869
Last time write locked in file /mnt/workspace/percona-server-5.7-redhat-binary-rocks-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.7.28-31/percona-server-5.7.28-31/storage/innobase/row/row0upd.cc line 2881
OS WAIT ARRAY INFO: signal count 1483459
RW-shared spins 0, rounds 314905, OS waits 77813
RW-excl spins 0, rounds 204982, OS waits 7532
RW-sx spins 4357, rounds 47820, OS waits 949
Spin rounds per wait: 314905.00 RW-shared, 204982.00 RW-excl, 10.98 RW-sx

Or it could be incredibly long (not shown here).

The way that particular section was monitored is through the execution of an infinite loop:

while true; do mysql -N -e"show engine innodb status\G" | sed -n '/SEMAPHORES/,/TRANSACTIONS/p'; sleep 1; done

But what should you do with that info?

Looking for Info

From the current waits, what we need is the following:

  • The exact version of MySQL (and flavor: Percona Server, Oracle’s MySQL, MariaDB)
  • Filename
  • File line

Let’s use the first example, which is a server that experienced high concurrency for a while as seen in Percona Monitoring and Management:

One can say: “but there’s only a peak of 42 threads and the majority of the throughput distribution is on the low concurrency side!” This is a 2 core VM with small physical memory and thus a pretty small buffer pool. An average of 22 for threads running is high concurrency.

Now, from the SEMAPHORES output of the first example we have:

--Thread 139964395677440 has waited at btr0cur.cc line 5889 for 0 seconds the semaphore: S-lock on RW-latch at 0x7f4c3d73c150 created in file buf0buf.cc line 1433 a writer (thread id 139964175062784) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0sea.cc line 1121 Last time write locked in file /mnt/workspace/percona-server-5.7-redhat-binary-rocks-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.7.28-31/percona-server-5.7.28-31/storage/innobase/btr/btr0sea.cc line 1121

What’s the MySQL version? The last line tells us: Percona Server 5.7.28-31

What’s the file and line? btr0cur.cc line 5889 has waited for an S-Lock on an RW-latch created in buf0buf.cc line 1433 but another thread has reserver in mode exclusive in file btr0sea.cc line 1121.

Ok, we have directions. Now let’s see what’s inside.

Looking Inside the Code

Do I need to download the source code for inspection? No, you don’t! What you need is to navigate to the code repository, which in this case is a GitHub repo.

And here is where the exact version comes in handy. In order to guarantee that we are reading the exact line of code, we had better make sure we are reading the exact version.

Finding the Repository

GitHub is pretty easy to navigate, and in this case what we are looking for is a release: 5.7.28-31, to be precise. The Percona Server repo URL is: https://github.com/percona/percona-server/.

Once you are there, we need to find the release.

Finding the Release

The release can be found via the link shown in the graph below:

Inside one can see the releases:

Click the link and it will take you to the tag page:

And finally, click the link shown above and you will be at the repo of the release needed: https://github.com/percona/percona-server/tree/Percona-Server-5.7.28-31

What’s next? Reading the actual content of the files.

Navigating the Code Tree

The relevant part of the code tree is:

-root
---storage
------innobase

The InnoDB storage engine code is inside the "innobase" directory. Inside that directory there's a bunch of other directories, but how do you choose the correct one? Well, the filename has the answer. The files we need to look at are:

  • btr0cur.cc
  • btr0sea.cc
  • buf0buf.cc

All files have the same syntax: xxx0xxx.xx and the directory name is the part before the zero. In our case, we need to look inside two directories: btr and buf. Once inside the directories, finding the files is an easy task.

These are our files in the mentioned line numbers:

https://github.com/percona/percona-server/blob/Percona-Server-5.7.28-31/storage/innobase/btr/btr0sea.cc#L1121
https://github.com/percona/percona-server/blob/Percona-Server-5.7.28-31/storage/innobase/buf/buf0buf.cc#L1433
https://github.com/percona/percona-server/blob/Percona-Server-5.7.28-31/storage/innobase/btr/btr0cur.cc#L5889

The btr0sea.cc file, as described in its header, is "The index tree adaptive search", a.k.a. the Adaptive Hash Index (AHI).
The buf0buf.cc file, as described in its header, is "The database buffer buf_pool", a.k.a. the InnoDB buffer pool.
The btr0cur.cc file, as described in its header, is "The index tree cursor", a.k.a. the actual B-Tree, or where the data lives in InnoDB.

What was Going on Then?

At btr0cur.cc line 5889, InnoDB is inside a function called btr_estimate_n_rows_in_range_low, whose description is documented as "Estimates the number of rows in a given index range", and the actual line is:

btr_cur_search_to_nth_level(index, 0, tuple1, mode1,
					    BTR_SEARCH_LEAF | BTR_ESTIMATE,
					    &cursor, 0,
					    __FILE__, __LINE__, &mtr);

But what is btr_cur_search_to_nth_level? In the same file we can find its definition; it is described as "Searches an index tree and positions a tree cursor on a given level". So basically, it is looking for a row value. But that operation is stalled because it needs to acquire a shared resource, which in this case is the buffer pool:

At buf0buf.cc line 1433, the buffer pool is trying to create a lock over a block

rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);

That operation happens inside a function called “buf_block_init” (find it by scrolling up in the code) and is described as “Initializes a buffer control block when the buf_pool is created”. This is actually creating the buffer pool space after an innodb_buffer_pool_size modification and is delayed.

At btr0sea.cc line 1121 the operation used was:

if (!buf_page_get_known_nowait(
			latch_mode, block, BUF_MAKE_YOUNG,
			__FILE__, __LINE__, mtr)) {

That line is inside the function btr_search_guess_on_hash, which is described as "Tries to guess the right search position based on the hash search info of the index." So, it is using info from the AHI. And the buf_page_get_known_nowait definition is "This is used to get access to a known database page, when no waiting can be done", which is pretty much self-explanatory.

So what do we have here? Contention on the AHI! Is this a problem? Let’s go back to the original line:

--Thread 139964395677440 has waited at btr0cur.cc line 5889 for 0  seconds the semaphore:

It says that it has waited for 0 seconds, so the contention disappeared pretty fast in this case. Also, something very important to notice: this contention appeared only ONCE during the monitored time range. That means it is not a constant issue and falls into the category of "not a problem". It happened once and it happened fast.

But what if it were happening often? Then you should take action by increasing the buffer pool size, increasing the number of AHI partitions, or even disabling the AHI entirely. All three options require testing, of course 🙂
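
For reference, these are the relevant knobs (a sketch; innodb_adaptive_hash_index_parts is read-only and has to be set in my.cnf before startup, while the AHI itself can be toggled at runtime):

SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index%';

-- Disabling the AHI is dynamic, so it can be tested without a restart
SET GLOBAL innodb_adaptive_hash_index = OFF;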

In Conclusion

The InnoDB code is documented well enough that it helps in finding hot contention spots. Navigating the code is not an impossible task, and the GitHub repositories make it pretty fast. It is worth mentioning that the real problem is not evident in every situation, and sometimes a deep understanding of the code comes in handy. For those cases, Percona is here to help you identify the issue and provide a solution.

RealTime CDC From MySQL Using AWS MSK With Debezium


CDC is becoming more popular nowadays. Many organizations that want to build a realtime/near-realtime data pipeline and reports use CDC as the backbone powering their real-time reports. Debezium is an open source product from Red Hat, and it supports multiple databases (both SQL and NoSQL). Apache Kafka is the core of this.

Managing and scaling Kafka clusters is not easy for everyone. If you are using AWS for your infra, then let AWS manage the cluster: AWS MSK is a managed Kafka service. We are going to configure Debezium with AWS MSK.

Configuration File:

If you have already worked with AWS MSK, then you might be familiar with this configuration file. It is similar to an RDS parameter group, but here you need to upload a file with your parameter names and values. If you are using MSK for the first time, this may be a bit confusing, so no worries, I'll give you the steps to do it. You can change this configuration file even after you have launched the cluster, but it's good practice to create the configuration file before launching the cluster. Unfortunately, the AWS CLI is the only way to create the configuration file.

1. Create a conf file:

Create a file on your Desktop or somewhere with the following parameters. For Debezium, the auto-create-topics parameter is required, so I'll use only this one for now; you can add more parameters if you want. Copy and paste the content below into your conf file called kafka-custom-conf:

auto.create.topics.enable = true
zookeeper.connection.timeout.ms = 1000

2. Upload the conf file to AWS

Install AWS CLI on your workstation and run the following command.

NOTE: I'm going to use Kafka version 2.3.1; if you are going to use a different Kafka version, then change the --kafka-versions value.
aws kafka create-configuration --name "kafka-231-custom-conf" --description "Example configuration description." --kafka-versions "2.3.1" --server-properties file://C:\Users\rbhuv\Desktop\kafka-custom-conf

Once you run the command, it'll give you the following output:

{
"Arn": "arn:aws:kafka:us-east-1:0000111222333:configuration/kafka-231-custom-conf/6061ca2d-10b7-46c6-81c0-7fae1b208452-7",
"CreationTime": "2019-12-20T18:38:17.103000+00:00",
"LatestRevision": {
"CreationTime": "2019-12-20T18:38:17.103000+00:00",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "kafka-231-custom-conf"
}

3. (Optional) Update existing cluster with this conf file

If you are going to create a new cluster then ignore this step.

Note: if you forgot to take note of the configuration ARN (from step 2), you can get it from the CLI with aws kafka list-configurations.

You also need the current version of the cluster (it's not the Kafka software version; it's a version AWS assigns to your cluster). Run aws kafka list-clusters; this will give you the value of the current version, like this: “CurrentVersion”: “K2EUQ1WTGCTBG2”

Create a configuration info file called configuration-info.json which contains the ARN and revision of your new conf file:

{
"Arn": "arn:aws:kafka:us-east-1:0000111222333:configuration/kafka-231-custom-conf/6061ca2d-10b7-46c6-81c0-7fae1b208452-7",
"Revision": 1
}

Now run the command below to update your Kafka cluster with the new configuration file.

aws kafka update-cluster-configuration --cluster-arn  "arn:aws:kafka:us-east-1:0000111222333:cluster/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7" --configuration-info file://C:\Users\rbhuv\Desktop\configuration-info.json   --current-version "K2EUQ1WTGCTBG2"

This will give you the following output.

{
"ClusterArn": "arn:aws:kafka:us-east-1:0000111222333:cluster/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:0000111222333:cluster-operation/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7/519396ad-1df2-46aa-8858-ba2c49f06c3c"
}

Launching the AWS MSK Cluster:

The MSK launch console is very easy to use, and you can select the options as you need. I'm just highlighting a few options where you need to focus.

  • Under the configuration, choose Use a custom configuration supporting Apache Kafka 2.2.1.
  • Then you can see the conf file which you created.
  • Under encryption, if you are not going to use TLS, then select Both TLS encrypted and plaintext traffic allowed. In this blog, I'm not using any TLS connections from the connector to Kafka.
  • The rest of the options are straightforward; you can select them as per your requirements.
  • It'll take 20 to 25 minutes to create the cluster.
  • Click on the cluster name and it'll take you to the details page of the cluster. In the top right, select View client information. There you can see the Kafka bootstrap server endpoints and Zookeeper endpoints.

Setup Debezium MySQL Connector on EC2:

Install Java and Confluent Connector binaries:

apt-get update 
sudo apt-get install default-jre
wget -qO - https://packages.confluent.io/deb/5.3/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.3 stable main"
sudo apt-get update && sudo apt-get install confluent-hub-client confluent-common confluent-kafka-2.12

Configure the Distributed connector properties:

  • Bootstrap Servers — Copy the Plaintext value from the MSK client information.
  • replication.factor — it should be >1
vi  /etc/kafka/connect-distributed.properties
bootstrap.servers=b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092
group.id=debezium-cluster
plugin.path=/usr/share/java,/usr/share/confluent-hub-components
offset.storage.replication.factor=2
config.storage.replication.factor=2
status.storage.replication.factor=2

Install Debezium MySQL connector and S3 connector:

confluent-hub install debezium/debezium-connector-mysql:latest
confluent-hub install confluentinc/kafka-connect-s3:latest

Start the connector service:

You can run your confluent connector application via systemctl.

vi /lib/systemd/system/confluent-connect-distributed.service
[Unit]
Description=Apache Kafka - connect-distributed
Documentation=http://docs.confluent.io/
After=network.target
[Service]
Type=simple
User=cp-kafka
Group=confluent
ExecStart=/usr/bin/connect-distributed /etc/kafka/connect-distributed.properties
TimeoutStopSec=180
Restart=no
[Install]
WantedBy=multi-user.target
--Start the service
systemctl enable confluent-connect-distributed
systemctl start confluent-connect-distributed

Configure MySQL Connector:

Note: Before configuring the MySQL connector, make sure you have enabled binlog and that the MySQL port is accessible from the Debezium EC2 instance. You also need a MySQL user with the respective permissions (refer to the Debezium docs).
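
A sketch of that prerequisite setup on the MySQL side (the user name and password are placeholders; check the Debezium docs for the exact privilege list your snapshot mode requires):

-- Binlog must be enabled and row-based for Debezium
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';   -- should be ROW

-- A dedicated connector user with the typical replication privileges
CREATE USER 'debezium'@'%' IDENTIFIED BY 'your_strong_pass';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;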

Create a file mysql.json (this is my example conf file; you can refer to the Debezium docs for the meaning of these parameters).

Note: From line number 14 onwards I have added some filters to bring only the new data from MySQL to my consumer app (by default, Debezium adds some metadata info along with the MySQL data, but I don't want it). Also make sure the bootstrap servers and MySQL credentials are correct.
{
"name": "mysql-connector-db01",
"config": {
"name": "mysql-connector-db01",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.server.id": "1",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "mysql-db01",
"database.hostname": "172.31.84.129",
"database.port": "3306",
"database.user": "bhuvi",
"database.password": "your_strong_pass",
"database.whitelist": "bhuvi,new,test",
"internal.key.converter.schemas.enable": "false",
"transforms.unwrap.add.source.fields": "ts_ms",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
}
}
  • “database.history.kafka.bootstrap.servers” — Kafka Servers IP.
  • “database.whitelist” — List of databases to get the CDC.
  • key.converter, value.converter, and transforms parameters — By default, the Debezium output has much more detailed information, but I don't want all of it. I'm only interested in getting the new row and the timestamp when it's inserted.

If you don't want to customize anything, then just remove everything after the database.whitelist line.

Register the MySQL Connector:

curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @mysql.json

Check the status:

curl localhost:8083/connectors/mysql-connector-db01/status
{
"name": "mysql-connector-db01",
"connector": {
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
},
"tasks": [
{
"id": 0,
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
}
],
"type": "source"
}

Test the MySQL Consumer:

Now insert something into any table in proddb or test (because we have whitelisted only these databases to capture the CDC).

use test; 
create table rohi (
id int,
fn varchar(10),
ln varchar(10),
phone int
);
insert into rohi values (2, 'rohit', 'ayare','87611');

We can get these values from the Kafka brokers. Listen to the topic below:

mysql-db01.test.rohi
This is the combination of servername.databasename.tablename
(servername is what you set as database.server.name in the mysql.json file).
kafka-console-consumer --bootstrap-server b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092 --topic mysql-db01.test.rohi --from-beginning
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}

It’ll start copying the historical data and start capturing real-time CDC.

Setup S3 Sink connector in All Producer Nodes:

I want to send this data to an S3 bucket. Make sure the Debezium VM is attached to an IAM role that has write access to the target S3 bucket. Alternatively, you can install the awscli and configure an access and secret key (but that's not recommended).

Create s3.json file.

{
"name": "s3-sink-db01",
"config": {
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"s3.bucket.name": "searce-00000",
"name": "s3-sink-db01",
"tasks.max": "3",
"s3.region": "us-east-1",
"s3.part.size": "5242880",
"s3.compression.type": "gzip",
"timezone": "UTC",
"locale": "en",
"flush.size": "10000",
"rotate.interval.ms": "3600000",
"topics.regex": "mysql-db01.(.*)",
"internal.key.converter.schemas.enable": "false",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"internal.value.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"path.format": "YYYY/MM/dd/HH",
"partition.duration.ms": "3600000",
"rotate.schedule.interval.ms": "3600000"
}
}
  • "topics.regex": "mysql-db01" - It’ll send the data only from the topics which have mysql-db01 as a prefix. In our case, all the MySQL databases related topics will start with this prefix.
  • "flush.size" - The data will be uploaded to S3 only after these many number of records stored. Or after "rotate.schedule.interval.ms" this duration.

Register this S3 sink connector:

curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @s3.json

Check the Status:

curl localhost:8083/connectors/s3-sink-db01/status | jq

{
"name": "s3-sink-db01",
"connector": {
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
},
"tasks": [
{
"id": 0,
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
},
{
"id": 1,
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
},
{
"id": 2,
"state": "RUNNING",
"worker_id": "172.31.44.151:8083"
}
],
"type": "sink"
}

Test the S3 sync:

Insert 10,000 rows into the rohi table, then check the S3 bucket. It'll save the data in JSON format with GZIP compression, in hourly partitions.

Conclusion:

The MySQL and S3 config files here are just references to get you started. If you want more customization or need help understanding a parameter, please refer to the Debezium documentation. Also, in this example blog, I'm doing the S3 upload as a micro-batch (every 1 hour or every 10,000 rows added/modified). If you want real time, modify the config file accordingly.

If you want to do the same setup with Kafka in EC2 instead of AWS MSK please refer to the following link.

Build Production Grade Debezium Cluster With Confluent Kafka


RealTime CDC From MySQL Using AWS MSK With Debezium was originally published in Searce Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

MySQL 8.0 & PHP on RedHat, CentOS and Fedora


As you could read in this previous post, PHP 7.4 is now completely supporting MySQL 8.0 and the new default authentication plugin.

I wanted to make a summary table providing an overview of all PHP versions and how they support MySQL 8.0 and the different authentication plugins.

As an RPM-based distribution user, I have been using the famous remi repository for many years, and I used it as well to install PHP 7.4.0 and 7.4.1.

I created a new user to test the connection with PHP and then… I was surprised to see that I could not connect to MySQL using caching_sha2_password. Of course, I checked whether my credentials were correct using the MySQL client… and I could connect… Then I tried my PHP script again and, new surprise, I could connect!?!

I could connect because the password was cached. If I run FLUSH PRIVILEGES, then the PHP script could not connect anymore.

The error was:

Trying with caching_sha2_password....
PHP Warning:  mysqli::__construct(): (HY000/1045): Access denied for
user 'fred_secure'@'mysql-dc1-2' (using password: YES)
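
A quick way to check which authentication plugin a given account actually uses (a sketch, run from the MySQL client):

-- caching_sha2_password is the 8.0 default; older clients expect mysql_native_password
SELECT user, host, plugin
FROM mysql.user
WHERE user = 'fred_secure';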

I discussed this with my colleagues. They tried with the same PHP version and they could not reproduce my error… but they were using Ubuntu.

What’s wrong ?

So I decided to compile PHP 7.4 from scratch on my CentOS 8 box… and… it worked as expected!

After a lot of debugging, testing many OpenSSL versions, and compiling PHP more than 10 times… I was able to find the difference and compile an RPM based on Remi's spec file.

The problem was in mysqli.so.

I don’t explain yet why this is a problem, and I already reported this to my colleagues, but the difference between Ubuntu packages and my compiled from scratch version and the one installed from Remi’s repo, is the absence of value for mysqli.default_socket:

mysqli.default_socket => no value => no value

So, I’ve rebuild Remi’s package removing --with-mysql-sock=%{mysql_sock} \ and it worked !

I will now wait for feedback from the developers to understand the reason and see if this is a bug. However, if you already want to use PHP 7.4.1 and MySQL 8.0 on any RedHat-based distribution, you will need a new php74-php-mysqlnd package.

You can download this one for el8 (RedHat, Oracle Linux and CentOS):

The package is built in a way that you don't need to update all the PHP 7.4 packages, only the mysqlnd one, like this:

rpm -Uvh php74-php-mysqlnd-7.4.1-2.el8.remi.x86_64.rpm
 Verifying…                          ################# [100%]
 Preparing…                          ################# [100%]
 Updating / installing…
    1:php74-php-mysqlnd-7.4.1-2.el8.rem################# [ 50%]
 Cleaning up / removing…
    2:php74-php-mysqlnd-7.4.1-1.el8.rem################# [100%]

I hope this can help you if you faced some authentication issue with PHP 7.4 and MySQL 8.0.

Shared Responsibility Model in the Cloud – Part 2


In an earlier post, I discussed the Shared Responsibility Model in the cloud and how it relates to databases. With either IaaS or DBaaS deployments, much of the operational and security burden is shifted away from the DBA to the cloud provider. I also noted that regardless of the deployment method, there is always a need for a DBA. In both cloud deployment models, notice the top user responsibility: customer data.

Let’s review the major tasks of a DBA in the cloud and how that role differs between and IaaS and DBaaS deployment.

DBA Responsibility in the Cloud

Application/Database

With the burden of hardware, OS, and physical security shifted to the cloud provider, the focus moves to application data and performance. From the application perspective, here are the top areas of focus:

  • Schema design and review
    • Ensuring optimal data types, indexing, etc
  • Performance tuning
    • Queries, system variables
  • Data archiving
  • Proactive optimization

While these aspects should always be the responsibility of the DBA, they are often overshadowed by operational tasks.  Moving to the cloud allows DBAs to get back to what they should be focused on – the data in the database.

Access

Traditional DBAs were generally tasked with ensuring properly limited access to customer data.  In most cases, this is done with some combination of firewalls and user grants. When moving to a cloud deployment, this will remain a responsibility of the DBA.

The cloud provider gives you the tools to manage the firewall (i.e. security groups), and within MySQL you are still required to manage user grants. Note that this is NO DIFFERENT from a traditional on-premise deployment.
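
As a minimal sketch (the user, network range, and schema names are placeholders), limiting access within MySQL looks exactly the same in the cloud as it does on-premise:

CREATE USER 'app_rw'@'10.0.%' IDENTIFIED BY 'a_strong_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app_rw'@'10.0.%';

-- Review what an account can actually do
SHOW GRANTS FOR 'app_rw'@'10.0.%';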

Monitoring / Alerting

Finally, proper monitoring and alerting is the responsibility of the DBA.  As a best practice, it is advised to capture metrics on everything and alert on as few metrics as needed.  This is done to ensure proper trending can be reviewed (capture everything) while not overwhelming the pager with unactionable alerts (minimal alerts).

Some of this metric data is provided by the cloud provider via various monitoring portals (i.e. CloudWatch). However, it is up to the DBA to determine the proper thresholds and alerts. This can only be achieved after a thoughtful review of all the collected historical metrics. Once baselines are established, proper alerts can be set up.

I would also note that regardless of the deployment method (IaaS or DBaaS), a tool such as Percona Monitoring and Management (PMM) can be invaluable here.

Cost Control

One of the benefits of the cloud is the elasticity and ease of launching new resources.  This can also lead to quite a headache in the finance department. The DBA should always be reviewing the systems to ensure:

  • You aren’t paying for unused resources
  • You are properly leveraging Reserved Instances where possible
  • Systems are tied to the proper teams for billing (i.e. tags, etc)

Understanding your systems and keeping them right-sized is an important role of the DBA.  Along with right-sizing your instances, proper capacity planning is also critical in controlling cost.  

In a survey of our users, 41% said they had to upgrade 5 times (or more) in the last 2 years, with the cost of the excess moves resulting in a 10x cost increase.  Having the time to properly review your data growth patterns is critical when planning for future growth.

DBA Responsibility in IaaS

When considering an IaaS deployment, there are additional tasks that need to be managed by a DBA.  Along with managing the customer data and access, here are some additional tasks needed in an IaaS environment:

  • Managing backups (verification, restoration, retention, etc)
  • Managing high availability 
  • Patching the guest OS
  • Installing / updating MySQL
  • Verifying DR solution

These responsibilities aren’t unique to an IaaS deployment and generally mirror a traditional DBA (minus the hardware support).  Automation is key to a successful IaaS deployment and having a DBA that truly understands the data access patterns and performance is critical.

Conclusion

Overall, the need for a DBA doesn’t go away when moving to a cloud environment (even when looking at a DBaaS deployment as noted by AWS).  The benefit of the cloud does not lie in eliminating the position of a DBA, but rather allowing the DBA to focus on what is most important to your organization: the data.  By removing the operational headaches and burdens from the team, you free up time to ensure the system is running at the optimal level.

Contact Percona today to see how we can help your team if you are moving or considering a move to the cloud.  We have experts in both databases and cloud deployments that can advise and help in all phases of migration.  How can we help you?


 

Companies are increasingly embracing database automation and the advantages offered by the cloud.  Our new white paper discusses common database scenarios and the true cost of downtime to your business, including the potential losses that companies can incur without a well-configured database and infrastructure setup.

Download “The Hidden Costs of Not Properly Managing Your Databases”

ProxySQL, MySQL Group Replication, and Latency


While we’ve had MySQL Group Replication support in ProxySQL since version 1.3 (native as of v1.4), development has continued in subsequent versions. I’d like to describe a scenario of how latency can affect ProxySQL in a MySQL Group Replication environment, and outline a few new features that might help mitigate those issues. Before we dive into the specifics of the discussion, however, let’s take a quick overview of ProxySQL and Group Replication for those who may not be familiar.

MySQL Group Replication

Similar in functionality to Percona XtraDB Cluster or Galera, MySQL Group Replication is the only synchronous native HA solution for MySQL*. With built-in automatic distributed recovery, conflict detection, and group membership, MySQL GR provides a completely native HA solution for MySQL environments.

ProxySQL

A high performance, high availability, protocol aware proxy for MySQL. It allows the shaping of database traffic by delaying, caching or rewriting queries on the fly. ProxySQL can also be used to create an environment where failovers will not affect your application, automatically removing (and adding back) database nodes from a cluster based on definable thresholds.

* There is technically one other native HA solution from Oracle – MySQL NDB Cluster. However, it is outside the scope of this article and is not intended for most general use cases.

Test Case

I recently had an interesting case with a client who was having severe issues with latency due to network/storage stalls at the hypervisor level. The environment is fairly standard, with a single MySQL 8.x GR writer node, and two MySQL 8.x GR passive nodes. In front of the database cluster sits ProxySQL, routing traffic to the active writer and handling failover duties should one of the database nodes become unavailable. The latency always occurred in short spikes, ramping up and then falling off quickly (within seconds).

The latency and I/O stalls from the network/hypervisor were throwing ProxySQL a curveball in determining if a node was actually healthy or not, and the client was seeing frequent failovers of the active writer node – often multiple times per day. To dive a bit deeper into this, let’s examine how ProxySQL determines a node’s health at a high level.

  • PING
    • mysql-monitor_ping_timeout
      • Issued on open connection.
  • SELECT
    • mysql-monitor_groupreplication_healthcheck_timeout
      • Gets the number of transactions a node is behind and identifies which node is the writer.
  • CONNECT
    • mysql-monitor_connect_timeout
      • Will try to open new connections to the host and measure timing.

In a perfect environment, these checks work as intended, and if a node is not reachable, or has fallen too far behind, ProxySQL is able to determine that and remove the node from the cluster.  This is known as a hard_offline in ProxySQL, and means the node is removed from the routing table and all traffic to that node stops. If that node is the writer node, ProxySQL will then tee up one of the passive nodes as the active writer, and the failover is complete.
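To see how ProxySQL is currently classifying each backend, you can query its admin interface (port 6032, admin/admin by default unless changed); a quick check of the runtime server table looks like this:

mysql -h127.0.0.1 -P6032 -uadmin -padmin
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;
-- status will show ONLINE, SHUNNED, OFFLINE_SOFT, or OFFLINE_HARD per node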

Many of the ProxySQL health checks have multiple variables to control the timeout behavior. For instance, mysql-monitor_ping_timeout sets the maximum timeout for a MySQL node to be unresponsive to a ping, and mysql-monitor_ping_max_failures set up how many times a MySQL node would have to fail a ping check before ProxySQL decides to mark it hard_offline and pull the node out of the cluster.

This wasn’t the case for the Group Replication specific ping checks, however. Prior to version 2.0.7, the options were more limited for Group Replication checks. Note we did not have the same max_failures for Group Replication that we had for standalone MySQL, and we only had the timeout check:

  • mysql-monitor_groupreplication_healthcheck_timeout

Added in version 2.0.7 was a new variable, giving us the ability to retry multiple times before marking a GR node hard_offline:

  • mysql-monitor_groupreplication_healthcheck_max_timeout_count

By setting this variable it is possible to have the group replication health check fail a configurable number of times before pulling a node out of the cluster. While this is certainly more of a Band-Aid than an actual resolution, it would allow keeping a ProxySQL + GR environment up and running while work is being done to find the root cause of latency and prevent unnecessary flapping between active and passive nodes during short latency spikes and I/O stalls.
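As a rough sketch of how such a monitor variable is changed from the ProxySQL admin interface (the value 3 below is only an example, not a recommendation):

UPDATE global_variables SET variable_value='3'
WHERE variable_name='mysql-monitor_groupreplication_healthcheck_max_timeout_count';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;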

Another similar option is currently being implemented in ProxySQL 2.0.9 for the transactions_behind check. See below:

  • mysql-monitor_groupreplication_max_transactions_behind

Currently, if group replication max_transactions_behind exceeds the threshold once, the node is evicted from the cluster. The upcoming 2.0.9 release adds a companion variable that defines a count for this check, so that max_transactions_behind would have to exceed the threshold more than once (x number of times) before eviction.

  • mysql-monitor_groupreplication_max_transactions_behind_count

In Summary

To be clear, the above settings will not fix any latency issues present in your environment. However, since latency can often be a hardware or network issue, and in many cases can take time to track down, these options may stabilize the environment by allowing you to relax ProxySQL’s health checks while the root cause investigation for the latency is underway.


Monitor Debezium MySQL Connector With Prometheus And Grafana


Debezium provides an out-of-the-box CDC solution for various databases. In my last blog post, I published how to configure the Debezium MySQL connector; this is the next part of that post. Once we have deployed Debezium, we need some kind of monitoring to keep track of what’s happening in the Debezium connector. Luckily, Debezium has its own metrics that are already integrated with the connectors. We just need to capture them using the JMX exporter agent. Here I have written how to monitor the Debezium MySQL connector with Prometheus and Grafana. The dashboard has only basic metrics; you can build your own dashboard for more detailed monitoring.

Reference: List of Debezium monitoring metrics

Install JMX exporter in Kafka Distributed connector:

All the connectors are managed by Kafka Connect (distributed or standalone). In our previous blog we used the distributed Kafka Connect service, so we are going to modify the distributed service binary file.
Download the JMX exporter.

    mkdir /opt/jmx/
    cd /opt/jmx/
    wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
    mv jmx_prometheus_javaagent-0.12.0.jar jmx-exporter.jar
    

Create config file.

    vi /opt/jmx/config.yml
    
    startDelaySeconds: 0
    ssl: false
    lowercaseOutputName: false
    lowercaseOutputLabelNames: false
    rules:
    - pattern: "kafka.connect<type=connect-worker-metrics>([^:]+):"
      name: "kafka_connect_connect_worker_metrics_$1"
    - pattern: "kafka.connect<type=connect-metrics, client-id=([^:]+)><>([^:]+)"
      name: "kafka_connect_connect_metrics_$2"
      labels:
        client: "$1"
    - pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^,]+), key=([^>]+)><>RowsScanned"
      name: "debezium_metrics_RowsScanned"
      labels:
        plugin: "$1"
        name: "$3"
        context: "$2"
        table: "$4"
    - pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^>]+)>([^:]+)"
      name: "debezium_metrics_$4"
      labels:
        plugin: "$1"
        name: "$3"
        context: "$2"
    

Add the JMX export to the Kafka connect binary File.

    vi /usr/bin/connect-distributed
    
    -- Find this line below export CLASSPATH
    exec $(dirname $0)/kafka-run-class $EXTRA_ARGS org.apache.kafka.connect.cli.ConnectDistributed "$@"
    
    --Replace with
    exec $(dirname $0)/kafka-run-class $EXTRA_ARGS -javaagent:/opt/jmx/jmx-exporter.jar=7071:/opt/jmx/config.yml org.apache.kafka.connect.cli.ConnectDistributed "$@"
    

Restart the Distributed Connect Service.

    systemctl restart confluent-connect-distributed
    

Verify the JMX Agent installation.

    netstat -tulpn | grep 7071
    tcp6       0      0 :::7071                 :::*                    LISTEN      2885/java
    

Get the debezium metrics.

    curl localhost:7071 | grep debezium
    debezium_metrics_NumberOfDisconnects{context="binlog",name="mysql-db01",plugin="mysql",} 0.0
    

You can view these metrics in your browser as well.

    http://ip-of-the-connector-vm:7071/metrics
    

Install Prometheus

I’m using a separate server for Prometheus and Grafana.

Create a user for Prometheus:

    sudo useradd --no-create-home --shell /bin/false prometheus
    

Create Directories for Prometheus:

    sudo mkdir /etc/prometheus
    sudo mkdir /var/lib/prometheus
    sudo chown prometheus:prometheus /etc/prometheus
    sudo chown prometheus:prometheus /var/lib/prometheus
    

Download the Prometheus binary files:

    cd /tmp
    wget https://github.com/prometheus/prometheus/releases/download/v2.15.0/prometheus-2.15.0.linux-amd64.tar.gz
    tar -zxvf prometheus-2.15.0.linux-amd64.tar.gz
    

Copy the binary files to respective locations:

    cd prometheus-2.15.0.linux-amd64
    cp prometheus /usr/local/bin/
    cp promtool /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/prometheus
    sudo chown prometheus:prometheus /usr/local/bin/promtool
    cp -r consoles /etc/prometheus
    cp -r console_libraries /etc/prometheus
    sudo chown -R prometheus:prometheus /etc/prometheus/consoles
    sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
    

Create a Prometheus config file:

    vi  /etc/prometheus/prometheus.yml
    
    global:
      scrape_interval: 15s
    
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
    
    

Set permission for config file:

    sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
    

Create a Prometheus systemctl file:

    vi /etc/systemd/system/prometheus.service
    
    [Unit]
    Description=Prometheus
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
        --config.file /etc/prometheus/prometheus.yml \
        --storage.tsdb.path /var/lib/prometheus/ \
        --web.console.templates=/etc/prometheus/consoles \
        --web.console.libraries=/etc/prometheus/console_libraries
    
    [Install]
    WantedBy=multi-user.target
    

Start the Prometheus Service:

    sudo systemctl daemon-reload
    sudo systemctl start prometheus
    sudo systemctl enable prometheus
    

Add Debezium MySQL connector metrics to Prometheus:

    vi  /etc/prometheus/prometheus.yml
    
    
      - job_name: debezium
        scrape_interval: 5s
        static_configs:
          - targets:
              - debezium-node-ip:7071
    

Restart the Prometheus service:

    sudo systemctl restart prometheus
    

Check the status:

In your browser Open the below URL.

    http://IP_of-prometheus-ec2:9090/graph
    

Install Grafana:

    wget https://dl.grafana.com/oss/release/grafana_6.5.2_amd64.deb
    sudo dpkg -i grafana_6.5.2_amd64.deb
    sudo systemctl daemon-reload
    sudo systemctl start grafana-server
    

It’ll start listening on port 3000. The default username and password are admin/admin; you can change them once you have logged in.

    http://grafana-server-ip:3000
    

Add the Debezium MySQL Dashboard:

This dashboard is taken from the official Debezium examples repo. It was originally built for SQL Server, but with some changes and fixes we can use the same approach for MySQL and other databases, so I turned it into a template.
In Grafana, add the Prometheus data source.

    http://grafana-ip:3000/datasources
    

Click on Add Data source, select Prometheus.

  • Name: Prometheus
  • URL: localhost:9090 (I have installed Grafana and Prometheus on the same server; if Prometheus runs on a different server, use that IP instead of localhost).

Click on Save & Test.

You’ll get a pop-up message that it is connected.

Now go to the dashboards page and import the Template JSON.

    http://grafana-ip:3000/dashboards
    

Click on Import button.

Copy the Template JSON file from here. Paste it or download the JSON file and choose the upload button. Now the dashboard is ready. You can see a few basic metrics.

Contribution:

Debezium is a great platform for anyone who wants to do real-time analytics, but in terms of monitoring I feel it still needs more contributions. This template is just a kickstart; we can build a more detailed monitoring dashboard for the Debezium connectors. Please feel free to contribute to the repo. Pull requests are welcome. Let’s make Debezium more powerful.

ClickHouse and ProxySQL queries rewrite (Cross-post from ProxySQL)


MySQL query rewrite for ClickHouse using ProxySQL 


Introduction

In September 2017, ProxySQL announced support for ClickHouse as a backend. ProxySQL is a popular open source, high performance and protocol-aware proxy server for MySQL and its forks. ClickHouse is an open source column-oriented database management system capable of real-time generation of analytical data reports using SQL queries. To support ClickHouse as a backend, ProxySQL acts as a data bridge between the MySQL protocol and the ClickHouse protocol, allowing MySQL clients to execute queries in ClickHouse through it. ClickHouse’s SQL syntax is different from MySQL’s, so migrating an application from MySQL to ClickHouse isn’t just a matter of changing the connection endpoint; it also requires modifying some queries, which needs development time and is not always possible. One of ProxySQL’s most widely used features is indeed its ability to rewrite queries, so often it is just a matter of writing the right query rules. In this blog post we explain step by step how to rewrite MySQL queries for ClickHouse using ProxySQL:

How to implement ProxySQL query rewrite for ClickHouse ?

Below is the MySQL query we are considering for rewrite:

SELECT COUNT(`id`), FROM_UNIXTIME(`created`, '%Y-%m') AS `date` FROM `tablename` GROUP BY FROM_UNIXTIME(`created`, '%Y-%m')

ClickHouse handles dates with its own functions such as toDate(), toYear() and toMonth() rather than MySQL’s FROM_UNIXTIME(value, format). Therefore, it is possible to rewrite the query as:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

To perform the above rewrite we would need two rules, one for the first FROM_UNIXTIME and one for the second. Or we can just use one rewrite rule to replace FROM_UNIXTIME(created, ‘%Y-%m’) wherever it appears, whether in the retrieved fields or in the GROUP BY clause, generating the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created))));

Does it look great? No, not yet!
For the month of March, concat(toString(toYear(toDate(created))), ‘-‘, toString(toMonth(toDate(created)))) will return 2018-3: not what the application was expecting, as MySQL would return 2018-03. The same applies to the first 9 months of each year.
Finally, we rewrote the query as follows, and the application was happy:

SELECT COUNT(`id`), substring(toString(toDate(created)),1,7) AS `date`
FROM `tablename`
GROUP BY substring(toString(toDate(created)),1,7);

Note: because of the data type conversions that ClickHouse needs to perform in order to execute the above query, its execution time is about 50% slower than executing the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

Architecting using two ProxySQL

Great, we now know how to rewrite the query!
However, the ClickHouse module in ProxySQL doesn’t support query rewrite; it is only responsible for translating data between the MySQL and ClickHouse protocols, and vice versa.

Therefore the right way of achieving this is to configure two ProxySQL layers: one instance responsible for rewriting the query and sending the rewritten query to the second ProxySQL instance, which is responsible for executing the (already modified) query on ClickHouse.

Architecting using only one ProxySQL

Does the above architecture seem complex? Not really, it is reasonably straightforward.
Can it be improved?
As you can see from the previous chart, the ClickHouse module and the MySQL module listen on different ports. The first ProxySQL instance receives traffic on port 6033 and sends traffic to the second ProxySQL instance on port 6090.
Are two instances really required? The answer is no.
In fact, a single instance can receive MySQL traffic on port 6033, rewrite the query, and send the rewritten query to itself on port 6090, finally executing the rewritten query on ClickHouse.

This diagram describes the architecture:

Configuration

For reference, below are the steps to configure a single ProxySQL instance to send traffic to ClickHouse, using itself as a backend.

Create ClickHouse user:

INSERT INTO clickhouse_users (username,password) VALUES ('clicku','clickp');
LOAD CLICKHOUSE USERS TO RUNTIME;
SAVE CLICKHOUSE USERS TO DISK;

Create MySQL user (same as ClickHouse):

INSERT INTO mysql_users(username,password) SELECT username, password FROM clickhouse_users;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

Configure ProxySQL itself as a backend for MySQL traffic:

INSERT INTO mysql_servers(hostname,port) VALUES ('127.0.0.1',6090);
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;

Create a query rule for rewriting queries:

INSERT INTO mysql_query_rules (active,match_pattern,replace_pattern,re_modifiers) VALUES 
(1,"FROM_UNIXTIME\(`created`, '%Y-%m'\)", 'substring(toString(toDate(created)),1,7)',"CASELESS,GLOBAL");
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

This is a very simple example to demonstrate how to perform query rewrite from MySQL to ClickHouse using just one ProxySQL instance. In real-world scenarios you will need to create more rules based on your own queries.

Conclusion

ProxySQL not only allows sending queries to ClickHouse, it also allows rewriting queries to solve issues related to differences in SQL syntax and available functions.
To achieve this, ProxySQL uses its ability to use itself as a backend: the query is rewritten in the MySQL module and executed in the ClickHouse module.


Using Referential Constraints with Partitioned Tables in InnoDB


One of our support customers approached us with the following problem the other day:

mysql> CREATE TABLE child_table (
`id` int unsigned auto_increment,
`column1` varchar(64) NOT NULL,
parent_id int unsigned NOT NULL,
PRIMARY KEY (`id`),
CONSTRAINT FOREIGN KEY (parent_id) REFERENCES parent_table (id));
ERROR 1215 (HY000): Cannot add foreign key constraint

They could not create a table with an FK relation! So, of course, we asked to see the parent table definition, which was:

CREATE TABLE `parent_table` (
  `id` int unsigned auto_increment, 
  `column1` varchar(64) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  
PARTITION BY HASH (id) 
PARTITIONS 4;

The parent table is partitioned!  This immediately explained the problem; partitioned tables can not be part of an FK relationship, as described (in point 10) here – MySQL Error Code 1215: “Cannot add foreign key constraint”.

Quoting the official MySQL manual for completeness:

Partitioned tables using the InnoDB storage engine do not support foreign keys. More specifically, this means that the following two statements are true:

  • No definition of an InnoDB table employing user-defined partitioning may contain foreign key references; no InnoDB table whose definition contains foreign key references may be partitioned.
  • No InnoDB table definition may contain a foreign key reference to a user-partitioned table; no InnoDB table with user-defined partitioning may contain columns referenced by foreign keys.

So, after verifying it was impossible to guarantee referential integrity using CONSTRAINTs, we turned to an old alternative from the MyISAM era of MySQL: using a set of triggers that intercept the DML statements before they execute and verify that the parent row actually exists.

So for this, we would create child_table without the constraint:

CREATE TABLE child_table (
`id` int unsigned auto_increment,
`column1` varchar(64) NOT NULL,
parent_id int unsigned NOT NULL,
PRIMARY KEY (`id`));

And then we create 4 triggers: BEFORE INSERT and BEFORE UPDATE  on the child table, and BEFORE UPDATE and BEFORE DELETE on the parent table.

DELIMITER //
DROP TRIGGER IF EXISTS PARTITIONED_TABLE_CHECK_INS //
CREATE TRIGGER PARTITIONED_TABLE_CHECK_INS BEFORE INSERT ON child_table FOR EACH ROW
BEGIN
    DECLARE fk_check INT;
    DECLARE fk_error_msg VARCHAR(200);
    IF (@DISABLE_TRIGGERS IS NULL) THEN
      SELECT COUNT(*) FROM parent_table WHERE id=new.parent_id INTO fk_check;
      IF fk_check < 1 THEN
          SELECT CONCAT("Foreign key constraint fails for table `parent_table`, can't find row matching `id='", new.parent_id, "`") INTO fk_error_msg;
          SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = fk_error_msg;
      END IF;
    END IF;
END
//

DROP TRIGGER IF EXISTS PARTITIONED_TABLE_CHECK_UPD //
CREATE TRIGGER PARTITIONED_TABLE_CHECK_UPD BEFORE UPDATE ON child_table FOR EACH ROW
BEGIN
    DECLARE fk_check INT;
    DECLARE fk_error_msg VARCHAR(200);
    IF (@DISABLE_TRIGGERS IS NULL) THEN
      SELECT COUNT(*) FROM parent_table WHERE id=new.parent_id INTO fk_check;
      IF fk_check < 1 THEN
          SELECT CONCAT("Foreign key constraint fails for table `child_table`, can't find row matching `id='", new.parent_id, "`") INTO fk_error_msg;
          SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = fk_error_msg;
      END IF;
    END IF;
END
//

-- DELETE is checked the other way around and the trigger has to be attached to parent_table (I kept naming the same for consistency) 
DELIMITER //
DROP TRIGGER IF EXISTS PARTITIONED_TABLE_CHECK_DEL //
CREATE TRIGGER PARTITIONED_TABLE_CHECK_DEL BEFORE DELETE ON parent_table FOR EACH ROW
BEGIN
    DECLARE fk_check INT;
    DECLARE fk_error_msg VARCHAR(200);
    IF (@DISABLE_TRIGGERS IS NULL) THEN
      SELECT COUNT(*) FROM child_table WHERE parent_id=old.id INTO fk_check;
      IF fk_check > 0 THEN
          SELECT CONCAT("Foreign key constraint fails for table `parent_table`, child table has ", fk_check," row(s) matching condition `parent_id='", old.id, "`") INTO fk_error_msg;
          SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = fk_error_msg;
      END IF;
    END IF;
END
//

-- Check UPDATE on parent_id.id; If PK differs we must check it's not referenced
DELIMITER //
DROP TRIGGER IF EXISTS PARTITIONED_TABLE_CHECK_PARENT_PK_UPDATE //
CREATE TRIGGER PARTITIONED_TABLE_CHECK_PARENT_PK_UPDATE BEFORE UPDATE ON parent_table FOR EACH ROW
BEGIN
	DECLARE fk_check INT;
	DECLARE fk_error_msg VARCHAR(200);
	IF (@DISABLE_TRIGGERS IS NULL) THEN
  	IF old.id <> new.id THEN
    	SELECT COUNT(*) FROM child_table WHERE parent_id=old.id INTO fk_check;
    	IF fk_check > 0 THEN
        	SELECT CONCAT("Foreign key constraint fails for table `parent_table`, child table has ", fk_check," row(s) matching condition `parent_id='", old.id, "`") INTO fk_error_msg;
        	SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = fk_error_msg;
    	END IF; 	 
  	END IF;
	END IF;
END
//

DELIMITER ;

 

Testing the Triggers:

Populate parent_table:

mysql> INSERT INTO parent_table (id, column1) VALUES (1, "column1");
Query OK, 1 row affected (0.03 sec)

Test insert:

-- Insert is valid
mysql> INSERT INTO child_table (id, column1, parent_id) VALUES (null, "value1", 1);
Query OK, 1 row affected (0.01 sec)

-- Insert fails with FK check
mysql> INSERT INTO child_table (id, column1, parent_id) VALUES (null, "value2", 2);
ERROR 1644 (45000): Foreign key constraint fails for table `parent_table`, can't find row matching `id='2`

So far so good! For valid child ids, inserts are accepted, and for invalid child ids, trigger rejects the insert.

Test Update:

--Test invalid update on child:
mysql> UPDATE child_table SET parent_id='2' WHERE parent_id='1';
ERROR 1644 (45000): Foreign key constraint fails for table `child_table`, can't find row matching `id='2`

-- Test invalid update on parent
mysql> UPDATE parent_table SET id=5;
ERROR 1644 (45000): Foreign key constraint fails for table `parent_table`, child table has 1 row(s) matching condition `parent_id='1`

-- Test valid update on parent and child
mysql> INSERT INTO parent_table VALUES (10, "column1");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO child_table VALUES (10, "column1", 1);
Query OK, 1 row affected (0.01 sec)

mysql> UPDATE parent_table SET id = 9 WHERE id = 10;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE child_table SET parent_id = 9 WHERE id = 10;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Test Delete:

-- Invalid delete
mysql> DELETE FROM parent_table WHERE id=1 ;
ERROR 1644 (45000): Foreign key constraint fails for table `parent_table`, child table has 1 row(s) matching condition `parent_id='1`

For both delete and update, we also verified that the triggers are working as expected and checking FK integrity.

Insert a new row into parent_table which we should be able to delete without failing the “constraint” (as it will have no child rows):

-- Test valid delete
INSERT INTO parent_table (id, column1) VALUES (2, "column2");
Query OK, 1 row affected (0.03 sec)
mysql> DELETE FROM parent_table WHERE id=2 ;
Query OK, 1 row affected (0.03 sec)

Unfortunately, the non-standard REPLACE INTO is not compatible with the above method, as it actually consists of two operations – a DELETE and a subsequent INSERT INTO, and doing the DELETE on the parent table for a referenced row would trigger the FK error:

mysql> REPLACE INTO parent_table (id, column1) VALUES (1, "column2");
ERROR 1644 (45000): Foreign key constraint fails for table `parent_table`, child table has 1 row(s) matching condition `parent_id='1`

REPLACE INTO the child_table should work without issues.

On the other hand, INSERT…ON DUPLICATE KEY UPDATE will work as expected, as the trigger on the UPDATE will work correctly and prevent breaking referential integrity.
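For example, a quick illustrative upsert against the parent table (the new column value is arbitrary) passes through the UPDATE trigger because the primary key value does not change:

mysql> INSERT INTO parent_table (id, column1) VALUES (1, "column1-new")
       ON DUPLICATE KEY UPDATE column1=VALUES(column1);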

For convenience, the FK triggers can be disabled for the session; this is the equivalent of SET foreign_key_checks=0. You can disable them by setting the following variable:

mysql > SET @DISABLE_TRIGGERS=1;
Query OK, 0 rows affected (0.00 sec)
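To re-enable the checks for the session, set the variable back to NULL (the triggers only skip their logic while @DISABLE_TRIGGERS is non-NULL):

mysql > SET @DISABLE_TRIGGERS=NULL;
Query OK, 0 rows affected (0.00 sec)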

Disclaimer:

The above is a proof of concept and while it should work for the vast majority of uses, there are two cases that are not checked by the triggers and will break referential integrity: TRUNCATE TABLE parent_table and DROP TABLE parent_table,  as it will not execute the DELETE trigger and hence will allow all child rows to become invalid at once.  

And in general, DDL operations which can break referential integrity (for example ALTER TABLE modifying column type or name) are not handled as these operations don’t fire TRIGGERs of any kind, and also it relies on you writing the correct query to find the parent rows (for example if you have a parent table with a multi-column primary key, you must check all the columns in the WHERE condition of the triggers)

Also, keep in mind the added performance impact; triggers add overhead, so please make sure to measure the impact on the response time of DML against these two tables. Please test thoroughly before deploying to production!

MySQL InnoDB Cluster Tutorial 1 ( Group Replication + MySQL Shell )


MySQL InnoDB Cluster was introduced by the MySQL team for high availability (HA) purposes. It provides a complete high availability solution for MySQL.

Alright, I am planning to write a series of blog posts about InnoDB Cluster configuration, management with MySQL Shell, monitoring, and more.

In this blog I am going to show the InnoDB Cluster configuration with three nodes.

What is InnoDB Cluster ?

MySQL InnoDB Cluster is the combination of:

  • MySQL shell
  • Group Replication ( GR )
  • MySQL Router

Lab Environment :

I have prepared my lab with three servers:

  • OS : Centos 7.7
  • MySQL 8.0.18 ( latest version )

The server details are:

  • 192.168.33.11 ( hostname : sakthilabs11 )
  • 192.168.33.12 ( hostname : sakthilabs12 )
  • 192.168.33.13 ( hostname : sakthilabs13 )

Step 1 :

We need to allow complete communication between the cluster nodes based on hostname and IP. The below entries need to be made on all the cluster nodes individually.

[root@sakthilabs11 ~]# cat /etc/hosts | grep 192
192.168.33.11 sakthilabs11 sakthilabs11
192.168.33.12 sakthilabs12 sakthilabs12
192.168.33.13 sakthilabs13 sakthilabs13

Step 2 :

In this step, we need to prepare the MySQL server for InnoDB Cluster. The below command needs to be executed individually on all the cluster nodes.

cmd : dba.configureLocalInstance("username@userhost:3306"); 

When executing the above command, it prints information and prompts for the actions needed to configure the instance for InnoDB Cluster. I have highlighted them in the output section below.

output :

MySQL localhost:33060+ ssl JS > dba.configureLocalInstance("root@localhost:3306");
Please provide the password for 'root@localhost:3306': *
Save password for 'root@localhost:3306'? [Y]es/[N]o/Ne[v]er (default No): y
Configuring local MySQL instance listening at port 3306 for use in an InnoDB cluster…
1) Create remotely usable account for 'root' with same grants and password
2) Create a new admin account for InnoDB cluster with minimal required grants
3) Ignore and continue
4) Cancel
Please select an option [1]: 2
Please provide an account name (e.g: icroot@%) to have it created with the necessary
privileges or leave empty and press Enter to cancel.
Account Name: InnoDBCluster
Password for new account: ***
Confirm password: ***
NOTE: Some configuration options need to be fixed:
+--------------------------+---------------+----------------+--------------------------------------------------+
| Variable | Current Value | Required Value | Note |
+--------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum | CRC32 | NONE | Update the server variable |
| enforce_gtid_consistency | OFF | ON | Update read-only variable and restart the server |
| gtid_mode | OFF | ON | Update read-only variable and restart the server |
| server_id | 1 | | Update read-only variable and restart the server |
+--------------------------+---------------+----------------+--------------------------------------------------+
Some variables need to be changed, but cannot be done dynamically on the server.
Do you want to perform the required configuration changes? [y/n]: y
Do you want to restart the instance after configuring it? [y/n]: y
Cluster admin user 'InnoDBCluster'@'%' created.
Configuring instance…
The instance 'localhost:3306' was configured for InnoDB cluster usage.
Restarting MySQL…
NOTE: MySQL server at localhost:3306 was restarted.

Step 3 :

After preparing all the nodes, we need to log in to MySQL Shell on any one of them with the InnoDB Cluster account (which was created during the preparation phase).

cmd : shell.connect('InnoDBCluster@192.168.33.11:3306');

output :

MySQL localhost:33060+ ssl JS > shell.connect('InnoDBCluster@192.168.33.11:3306');
Creating a session to 'InnoDBCluster@192.168.33.11:3306'
Please provide the password for 'InnoDBCluster@192.168.33.11:3306': ***
Save password for 'InnoDBCluster@192.168.33.11:3306'? [Y]es/[N]o/Ne[v]er (default No): y
Fetching schema names for autocompletion… Press ^C to stop.
Closing old connection…
Your MySQL connection id is 9
Server version: 8.0.18 MySQL Community Server - GPL
No default schema selected; type \use to set one.

MySQL 192.168.33.11:3306 ssl JS >

Step 4 :

Create the first node of the InnoDB Cluster.

cmd : cluster = dba.createCluster('first_InnoDB_cluster');

output :

MySQL 192.168.33.11:3306 ssl JS > cluster = dba.createCluster('first_InnoDB_cluster');
A new InnoDB cluster will be created on instance '192.168.33.11:3306'.
Validating instance at 192.168.33.11:3306…
This instance reports its own address as sakthilabs11:3306
Instance configuration is suitable.
Creating InnoDB cluster 'first_InnoDB_cluster' on '192.168.33.11:3306'…
Adding Seed Instance…
Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.
MySQL 192.168.33.11:3306 ssl JS >

Step 5 :

Now that we have successfully created the single-node cluster, we have to add the other nodes as well. When adding the other nodes, it will ask for the recovery method, and we need to choose one. The clone plugin is the default.

cmd : cluster.addInstance('InnoDBCluster@192.168.33.12:3306');

output :

MySQL 192.168.33.11:3306 ssl JS > cluster.addInstance('InnoDBCluster@192.168.33.12:3306');
Please provide the password for 'InnoDBCluster@192.168.33.12:3306': ***
Save password for 'InnoDBCluster@192.168.33.12:3306'? [Y]es/[N]o/Ne[v]er (default No): y
Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone): clone
Validating instance at 192.168.33.12:3306…
This instance reports its own address as sakthilabs12:3306
Instance configuration is suitable.
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Adding instance to the cluster…
Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.
NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.
Waiting for clone to finish…
NOTE: 192.168.33.12:3306 is being cloned from sakthilabs11:3306
** Stage DROP DATA: Completed
** Clone Transfer
FILE COPY ############################################################ 100% Completed
PAGE COPY ############################################################ 100% Completed
REDO COPY ############################################################ 100% Completed
** Stage RECOVERY: \
NOTE: 192.168.33.12:3306 is shutting down…
Waiting for server restart… ready
sakthilabs12:3306 has restarted, waiting for clone to finish…
Clone process has finished: 59.55 MB transferred in about 1 second (~59.55 MB/s)
Incremental distributed state recovery is now in progress.
Waiting for distributed recovery to finish…
NOTE: '192.168.33.12:3306' is being recovered from 'sakthilabs11:3306'
Distributed recovery has finished
The instance '192.168.33.12:3306' was successfully added to the cluster.

Similarly, I have added the third node as well.

Finally,

We can check the cluster status with the below command.

cmd : cluster.status();

output :

MySQL 192.168.33.11:3306 ssl JS > cluster.status();
{
"clusterName": "first_InnoDB_cluster",
"defaultReplicaSet": {
"name": "default",
"primary": "sakthilabs11:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"sakthilabs11:3306": {
"address": "sakthilabs11:3306",
"mode": "R/W",
"readReplicas": {},
"replicationLag": null,
"role": "HA",
"status": "ONLINE",
"version": "8.0.18"
},
"sakthilabs12:3306": {
"address": "sakthilabs12:3306",
"mode": "R/O",
"readReplicas": {},
"replicationLag": null,
"role": "HA",
"status": "ONLINE",
"version": "8.0.18"
},
"sakthilabs13:3306": {
"address": "sakthilabs13:3306",
"mode": "R/O",
"readReplicas": {},
"replicationLag": null,
"role": "HA",
"status": "ONLINE",
"version": "8.0.18"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "sakthilabs11:3306"
}
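MySQL Router, the third component listed above, will be covered in a later post in this series. As a quick preview, a minimal sketch of bootstrapping it against this cluster from an application host (assuming MySQL Router was installed from the official RPM; the local OS account name is an assumption) would be:

mysqlrouter --bootstrap InnoDBCluster@192.168.33.11:3306 --user=mysqlrouter
systemctl start mysqlrouter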

I hope this blog helps someone who is trying to learn MySQL InnoDB Cluster. I will be back with a new blog soon.

Thanks !!

How to use ProxySQL to work on ClickHouse like MySQL ?


Use ClickHouse like MySQL with ProxySQL


Introduction

We have several customers on ClickHouse now, for both columnar database analytics and archiving MySQL data. You can access data from ClickHouse with clickhouse-client, but this involves some learning and also has technical limitations. Our customers are very comfortable using MySQL, so they always preferred a MySQL client for ClickHouse query analysis and reporting. Thankfully, ProxySQL works as an optimal bridge between ClickHouse and a MySQL client, which was great news for us and our customers worldwide. This blog post is about how we can use a MySQL client with ClickHouse.

Installation
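
The installation steps themselves are not covered here; as a minimal sketch, assuming the ProxySQL yum repository is already configured on a CentOS host, it is simply:

  yum install -y proxysql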

Start ProxySQL once the installation has completed successfully.

  # The default configuration file is this:
  /etc/proxysql.cnf
  # There is no such data directory by default:
  mkdir /var/lib/proxysql
  # start up
  proxysql --clickhouse-server
  # ProxySQL will default to daemon mode in the background

Creating ClickHouse user

Create a user for ClickHouse in ProxySQL with a password. The password is not configured in ClickHouse itself but is used for accessing ProxySQL:

  # ProxySQL port is 6032, the default username and password are written in the configuration file
  root@10.xxxx:/root # mysql -h 127.0.0.1 -P 6032 -uadmin -padmin
  Welcome to the MariaDB monitor. Commands end with ; or \g.
  Your MySQL connection id is 3
  Server version: 5.6.81 (ProxySQL Admin Module)
  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  MySQL [(none)]> INSERT INTO clickhouse_users VALUES ('chuser', 'chpasswd', 1, 100);
  Query OK, 1 row affected (0.00 sec)
  MySQL [(none)]> select * from clickhouse_users;
  +----------+----------+--------+-----------------+
  | username | password | active | max_connections |
  +----------+----------+--------+-----------------+
  | chuser   | chpasswd | 1      | 100             |
  +----------+----------+--------+-----------------+
  1 row in set (0.00 sec)
  MySQL [(none)]> LOAD CLICKHOUSE USERS TO RUNTIME;
  Query OK, 0 rows affected (0.00 sec)
  MySQL [(none)]> SAVE CLICKHOUSE USERS TO DISK;
  Query OK, 0 rows affected (0.00 sec)

Connecting to ClickHouse from MySQL Client 

By default ProxySQL opens the port 6090 to receive user access to ClickHouse:

  # Use username and password above
  # If it is a different machine, remember to change the IP
  root@10.xxxx:/root # mysql -h 127.0.0.1 -P 6090 -uchuser -pchpasswd --prompt "ProxySQL-To-ClickHouse>"
  Welcome to the MariaDB monitor. Commands end with ; or \g.
  Your MySQL connection id is 64
  Server version: 5.6.81 (ProxySQL ClickHouse Module)
  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  ProxySQL-To-ClickHouse>

Querying ClickHouse like MySQL

MySQL [(none)]> select version();
+-------------------+
| version()         |
+-------------------+
| 5.6.81-clickhouse |
+-------------------+
1 row in set (0.00 sec)
MySQL [(none)]> select now();
+---------------------+
| now()               |
+---------------------+
| 2019-12-25 20:17:14 |
+---------------------+
1 row in set (0.00 sec)
MySQL [(none)]> select today();
+------------+
| today()    |
+------------+
| 2019-12-25 |
+------------+
1 row in set (0.00 sec)
# Our table is over 55 billion
ProxySQL-To-ClickHouse> select count(*) from mysql_audit_log_data;
+--------------+
| count()      |
+--------------+
| 539124837571 |
+--------------+
1 row in set (8.31 sec)

Limitations

  • This ProxySQL solution works only when ProxySQL runs on the local ClickHouse host (note: the ClickHouse instance cannot have a password in this ecosystem / recommended solution)
  • ProxySQL query rewrite limitations – simple queries work seamlessly, but complex query rewrites are quite expensive and there may be some SQL semantics limitations

Conclusion – ProxySQL Version 2.0.8 new features and enhancements

  • Changed default max_allowed_packet from 4M to 64M
  • Added support for mysqldump 8.0 and Admin #2340
  • Added new variable mysql-aurora_max_lag_ms_only_read_from_replicas : if max_lag_ms is used and the writer is in the reader hostgroup, the writer will be excluded if at least N replicas are good candidates.
  • Added support for unknown character set , and for collation id greater than 255 #1273
  • Added new variable mysql-log_unhealthy_connections to suppress messages related to unhealthy client connections being closed
  • Reimplemented rules_fast_routing using khash
  • Added support for SET CHARACTERSET #1692
  • Added support for same node into multiple Galera clusters #2290
  • Added more verbose output for error 2019 (Can’t initialize character set) #2273
  • Added more possible values for mysql_replication_hostgroups.check_type #2186
    • read_only | innodb_read_only
    • read_only & innodb_read_only
  • Added support and packages for RHEL / CentOS 8


MySQL Docker Containers: Quick Async Replication Test Setup


This blog discusses a few concepts about Docker and how we can use it to run a MySQL async replication environment. Docker is a tool designed to make it easier for developers and sysadmins to create/develop, configure, and run applications with containers. The container allows us to package all the parts the application needs, such as libraries, dependencies, code, configurations, and the runtime engine. Docker runtime containers are platform-independent, so the package created can be shipped from one platform to another.

Docker Hub is the repository where you can find containerized docker images for applications like MySQL, Percona Server for MySQL, and MariaDB. Using the example below, I will show you how to set up a docker container from the Percona Server for MySQL docker image that I downloaded from the repository.

Custom Network Instead of the Default

First, I will create a network that my docker instances will use. I will be using a user-defined network instead of the default one. It is recommended to use user-defined bridge networks to control which containers can communicate with each other. The Docker daemon automatically takes care of DNS name resolution. By creating your own network, every single container using that network will have DNS resolution automagically.

MacBook-Pro: $ docker network create --driver bridge isolated_nw
f8cd8f09b4042b39b04a6a43fd9dc71507af22dfd6ee2817fa80360577b24d6f
MacBook-Pro: $

Storage for Persisting the Data

The second step is to provision the storage which my docker instances will be using. In docker, storage can be provisioned in two ways, by using a bind mount or by using a docker volume. Bind mounts are dependent on the directory structure of the host machine while docker volumes are completely managed by Docker. In my example, I am using bind mounts for the primary instance and docker volumes for the replica to illustrate how either of these options can be used.

MacBook-Pro: $ mkdir /Users/myuser/mysql_primary_data

Configuration File for the Primary Instance

I will proceed with the creation of the configuration file which my primary instance will be using.

MacBook-Pro: $ cat /Users/myuser/primaryconf/my-custom.cnf
[mysqld]
server-id = 100
log_bin

Provisioning the Primary Instance

In the fourth step, we will provision the primary instance. The docker run command will download the latest percona server image if the image does not already exist in the local repository. In this example, we already have the downloaded image so the docker run does not need to download it again.

MacBook-Pro: $ docker run --network=isolated_nw -v /Users/myuser/mysql_primary_data:/var/lib/mysql -v /Users/myuser/primaryconf:/etc/percona-server.conf.d/ -p 3308:3306 -p 33061:33060 --name percona_primary -e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD=MysqlTestInstance -d percona --default-authentication-plugin=mysql_native_password
deb40d6941db74d845cbc5ec550572d37d4763740e3a72c015ada0c7520a0fd7
MacBook-Pro: $

I intend to set up an async replication environment, so I will get the binary log details from the primary instance.

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3308 -e "show master status;"
Enter password:
+-------------------------+----------+--------------+------------------+-------------------+
| File                    | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------------+----------+--------------+------------------+-------------------+
| deb40d6941db-bin.000003 |      154 |              |                  |                   |
+-------------------------+----------+--------------+------------------+-------------------+
MacBook-Pro: $

To set up replication we need a replication user; the command below connects to the primary instance and grants the privilege.

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3308 -e "GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%' IDENTIFIED BY 'replpass';"
Enter password:
MacBook-Pro: $

Docker Volume for Replication Storage

Before I create my replica instance, I will provision the docker volume which my replica instance will use for storage.

MacBook-Pro: $ docker volume create mysqlreplicavol
mysqlreplicavol
MacBook-Pro: $

Configuration File for the Replica Instance

Create a custom MySQL configuration file for the replica instance.

MacBook-Pro: $ cat /Users/myuser/replicaconf/my-custom.cnf
[mysqld]
server-id = 101

Provisioning the Replica Instance

The next step is to set up a replica instance using the docker volume that I created above.

MacBook-Pro: $ docker run --network=isolated_nw -v mysqlreplicavol:/var/lib/mysql -v /Users/myuser/replicaconf:/etc/percona-server.conf.d/ -p 3309:3306 -p 33062:33060 --name percona_replica -e MYSQL_ROOT_HOST='%' -e MYSQL_ROOT_PASSWORD=MysqlTestInstance -d percona --default-authentication-plugin=mysql_native_password
98c109998a522a51c4ceca7d830a1f9af2abdd408c5e1d6d12cfd55af13d170d
MacBook-Pro: $

To set up the replication, apply the change master command in the replica instance and start the replica.

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3309 -e "CHANGE MASTER TO MASTER_HOST='percona_primary',MASTER_USER='repl_user', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='deb40d6941db-bin.000003', MASTER_LOG_POS=154;"
Enter password:
MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3309 -e "start slave";
Enter password:
MacBook-Pro: $

Verify the Replication Status

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3309 -e "show slave status \G" |egrep 'Running|Master' | egrep -v 'SSL|TLS'
Enter password:
                  Master_Host: percona_primary
                  Master_User: repl_user
                  Master_Port: 3306
              Master_Log_File: deb40d6941db-bin.000003
          Read_Master_Log_Pos: 434
        Relay_Master_Log_File: deb40d6941db-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 434
        Seconds_Behind_Master: 0
             Master_Server_Id: 100
                  Master_UUID: 49451bba-2627-11ea-be09-0242ac120002
             Master_Info_File: /var/lib/mysql/master.info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
MacBook-Pro: $

Create a test DB to confirm it is replicating to the replica instance.

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3308 -e "create database mytestdb;"
Enter password:
MacBook-Pro: $

MacBook-Pro: $ mysql -h127.0.0.1 -uroot -p -P3309 -e "show databases like 'mytestdb';"
Enter password:
+---------------------+
| Database (mytestdb) |
+---------------------+
| mytestdb            |
+---------------------+
MacBook-Pro: $

To check the logs we can use the docker logs command, for example :

MacBook-Pro: $ docker logs percona_primary
Initializing database
2019-12-24T08:27:54.395857Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
...
....
2019-12-24T08:28:19.792475Z 0 [Note] mysqld: ready for connections.
Version: '5.7.26-29-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 29, Revision 11ad961
2019-12-24T08:45:04.071744Z 7 [Note] Start binlog_dump to master_thread_id(7) slave_server(101), pos(deb40d6941db-bin.000003, 154)
MacBook-Pro: $

The setup that I have shown above can be used as a quick setup of an async replication configuration to run demos or test experiments. Such a configuration is not suitable for production and will not be supported for production use by Docker.

Note: Percona Monitoring and Management (PMM) is also distributed as an appliance in the form of a docker image.
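When you are done with this test environment, it can be torn down with the usual Docker commands (using the container, volume, and network names created above):

MacBook-Pro: $ docker stop percona_replica percona_primary
MacBook-Pro: $ docker rm percona_replica percona_primary
MacBook-Pro: $ docker volume rm mysqlreplicavol
MacBook-Pro: $ docker network rm isolated_nw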

Debezium MySQL Snapshot From Read Replica With GTID


When you install the Debezium MySQL connector, it starts reading your historical data and pushes all of it into Kafka topics. This behavior can be changed via the snapshot.mode parameter in the connector. When you start a new sync, Debezium loads the existing data first; this is called a snapshot. Unfortunately, if you have a busy transactional MySQL database, this may lead to some performance issues, and your DBA will never agree to read the data from the master node. [Disclaimer: I’m a DBA :) ]. So I was thinking of figuring out how to take the snapshot from a read replica and, once the snapshot is done, start reading the realtime data from the master. I found this useful information in a StackOverflow answer.

If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that’s done, switch to the master to read the binlog. But if you don’t use GTID, that’s a little more tricky. The tool would have to know the binlog position on the master corresponding to the snapshot on the replica.

Source: https://stackoverflow.com/a/58250791/6885516

Then I tried to implement it in a real scenario and verified that the statement is true. Yes, we made this work in our system. Here are the step-by-step details from our PoC.

Requirements:

  • Master and Slave should be enabled with GTID.
  • Debezium Connector Node can talk to both master and slave.
  • log-slave-updates must be enabled on the slave (it is required for GTID anyway).
  • A user account for Debezium with respective permissions.
  • Install Debezium connector.

Sample data:

Create a new database to test this sync and insert some values.

create database bhuvi;
use bhuvi;
create table rohi (
id int,
fn varchar(10),
ln varchar(10),
phone int);

insert into rohi values (1, 'rohit', 'last',87611);
insert into rohi values (2, 'rohit', 'last',87611);
insert into rohi values (3, 'rohit', 'last',87611);
insert into rohi values (4, 'rohit', 'last',87611);
insert into rohi values (5, 'rohit', 'last',87611);

Create the MySQL Connector Config:

File Name: mysql.json

{
    "name": "mysql-connector-db01",
    "config": {
        "name": "mysql-connector-db01",
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.server.id": "1",
        "tasks.max": "1",
        "database.history.kafka.bootstrap.servers": "YOUR-BOOTSTRAP-SERVER:9092",
        "database.history.kafka.topic": "schema-changes.mysql",
        "database.server.name": "mysql-db01",
        "database.hostname": "IP-OF-READER-NODE",
        "database.port": "3306",
        "database.user": "bhuvi",
        "database.password": "****",
        "database.whitelist": "bhuvi",
        "snapshot.mode": "initial",
        "snapshot.locking.mode": "none",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": "false",
        "value.converter.schemas.enable": "false",
        "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "internal.key.converter.schemas.enable": "false",
        "internal.value.converter.schemas.enable": "false",
        "transforms": "unwrap",
       	"transforms.unwrap.add.source.fields": "ts_ms",
  		"tombstones.on.delete": "false",
  		"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState
    }
}

Watch the status of the connector:

Open three terminal windows and start listening to the following topics.

NOTE: change the bootstrap-server as per your cluster’s IP.

  1. connect-configs
  2. connect-status
  3. connect-offsets

    --Terminal-1 kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-configs --from-beginning

    --Terminal-2 kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-status --from-beginning

    --Terminal-3 kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning

Install the Connector:

curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @mysql.json

Once it is installed, you will see the following output in your connect-configs topic.

{"properties":{"connector.class":"io.debezium.connector.mysql.MySqlConnector","snapshot.locking.mode":"none","database.user":"bhuvi","database.server.id":"1","tasks.max":"1","database.history.kafka.bootstrap.servers":"172.31.40.132:9092","database.history.kafka.topic":"schema-changes.mysql","database.server.name":"mysql-db01","internal.key.converter.schemas.enable":"false","database.port":"3306","key.converter.schemas.enable":"false","internal.key.converter":"org.apache.kafka.connect.json.JsonConverter","task.class":"io.debezium.connector.mysql.MySqlConnectorTask","database.hostname":"172.31.25.99","database.password":"*****","internal.value.converter.schemas.enable":"false","name":"mysql-connector-db01","value.converter.schemas.enable":"false","internal.value.converter":"org.apache.kafka.connect.json.JsonConverter","value.converter":"org.apache.kafka.connect.json.JsonConverter","database.whitelist":"bhuvi","key.converter":"org.apache.kafka.connect.json.JsonConverter","snapshot.mode":"initial"}}
{"tasks":1}

And then from your connect-status topic, you'll get the status of your MySQL connector.

{"state":"RUNNING","trace":null,"worker_id":"172.31.36.115:8083","generation":2}
{"state":"RUNNING","trace":null,"worker_id":"172.31.36.115:8083","generation":3}

Snapshot Status from the log file:

By default, the Kafka Connect logs go to syslog, but you can customize the log location. Wherever the log file lives, you can follow the snapshot progress there.
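
If you run Kafka Connect as a systemd service (the unit is called confluent-connect-distributed later in this series), a convenient way to follow it is:

journalctl -u confluent-connect-distributed -f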

[2019-12-28 11:06:04,246] INFO Step 7: scanning contents of 1 tables while still in transaction (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,252] INFO Step 7: - scanning table 'bhuvi.rohi' (1 of 1 tables) (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,252] INFO For table 'bhuvi.rohi' using select statement: 'SELECT * FROM `bhuvi`.`rohi`' (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,264] INFO Step 7: - Completed scanning a total of 31 rows from table 'bhuvi.rohi' after 00:00:00.012 (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,265] INFO Step 7: scanned 5 rows in 1 tables in 00:00:00.018 (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,265] INFO Step 8: committing transaction (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,267] INFO Completed snapshot in 00:00:01.896 (io.debezium.connector.mysql.SnapshotReader)
[2019-12-28 11:06:04,348] WARN [Producer clientId=connector-producer-mysql-connector-db01-0] Error while fetching metadata with correlation id 7 : {mysql-db01.bhuvi.rohi=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2019-12-28 11:06:04,460] INFO Transitioning from the snapshot reader to the binlog reader (io.debezium.connector.mysql.ChainedReader)
[2019-12-28 11:06:04,492] INFO GTID set purged on server: 88726004-2734-11ea-ae86-0e7687279b85:1-7 (io.debezium.connector.mysql.BinlogReader)
[2019-12-28 11:06:04,492] INFO Attempting to generate a filtered GTID set (io.debezium.connector.mysql.MySqlTaskContext)
[2019-12-28 11:06:04,492] INFO GTID set from previous recorded offset: 88726004-2734-11ea-ae86-0e7687279b85:1-11 (io.debezium.connector.mysql.MySqlTaskContext)
[2019-12-28 11:06:04,492] INFO GTID set available on server: 88726004-2734-11ea-ae86-0e7687279b85:1-11 (io.debezium.connector.mysql.MySqlTaskContext)
[2019-12-28 11:06:04,492] INFO Final merged GTID set to use when connecting to MySQL: 88726004-2734-11ea-ae86-0e7687279b85:1-11 (io.debezium.connector.mysql.MySqlTaskContext)
[2019-12-28 11:06:04,492] INFO Registering binlog reader with GTID set: 88726004-2734-11ea-ae86-0e7687279b85:1-11 (io.debezium.connector.mysql.BinlogReader)

Snapshot Complete:

Once your snapshot process is done, the connect-offsets topic will have the binlog information up to where it has been consumed.

{"file":"ip-172-31-25-99-bin.000001","pos":1234,"gtids":"88726004-2734-11ea-ae86-0e7687279b85:1-11"}

Then it’ll start applying the ongoing replication changes as well.

{"ts_sec":1577531225,"file":"ip-172-31-25-99-bin.000001","pos":1299,"gtids":"88726004-2734-11ea-ae86-0e7687279b85:1-11","row":1,"server_id":1,"event":2}

Now we have verified that the database's snapshot has been done. It's time to swap the nodes; we'll start consuming from the Master.

If you enable monitoring for the Debezium connector, you can see the lag from the JMX or Prometheus metrics.

Reference: Configuring monitoring for Debezium MySQL Connector.

curl localhost:7071 | grep debezium_metrics_SecondsBehindMaster
debezium_metrics_SecondsBehindMaster{context="binlog",name="mysql-db01",plugin="mysql",} 299.577536699E9

Sometimes the metrics take a few more minutes to update. Once you can see the last binlog information in connect-offsets and the lag is < 10, the snapshot is done.

Switch to Master:

The most important thing is to STOP the slave thread on your Read replica. This prevents the GTID in your connect-offsets topic from changing.

mysql-slave> STOP SLAVE;

To simulate the sync, add one new row to the table on the Master. Since the slave thread is stopped, this row will never replicate to your slave, but once you switch nodes the connector should start reading from this row.

mysql-master> insert into rohi values (6, 'rohit', 'last','87611');

We need to update the existing MySQL connector’s config and just change the "database.hostname" parameter.

Note: this JSON file format is different from the one we used to register the connector, so double-check the syntax.

{
	"connector.class": "io.debezium.connector.mysql.MySqlConnector",
	"snapshot.locking.mode": "none",
	"tasks.max": "3",
	"database.history.kafka.topic": "schema-changes.mysql",
	"transforms": "unwrap",
	"internal.key.converter.schemas.enable": "false",
	"transforms.unwrap.add.source.fields": "ts_ms",
	"tombstones.on.delete": "false",
	"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
	"value.converter": "org.apache.kafka.connect.json.JsonConverter",
	"database.whitelist": "bhuvi",
	"key.converter": "org.apache.kafka.connect.json.JsonConverter",
	"database.user": "bhuvi",
	"database.server.id": "1",
	"database.history.kafka.bootstrap.servers": "YOUR-KAFKA-BOOTSTRAP-SERVER:9092",
	"database.server.name": "mysql-db01",
	"database.port": "3306",
	"key.converter.schemas.enable": "false",
	"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
	"database.hostname": "MASTER-IP-ADDRESS",
	"database.password": "****",
	"internal.value.converter.schemas.enable": "false",
	"name": "mysql-connector-db01",
	"value.converter.schemas.enable": "false",
	"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
	"snapshot.mode": "initial"
}

Save the above config as mysql-update.json, then run the below command to update the connector's config.

curl -X PUT -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors/mysql-connector-db01/config -d @mysql-update.json

Once it's updated, you can see from the connect-offsets topic that Debezium starts reading the data from the next GTID.

{"ts_sec":1577531276,"file":"mysql-bin.000008","pos":1937,"gtids":"88726004-2734-11ea-ae86-0e7687279b85:1-13","row":1,"server_id":1,"event":2}

Also, from your table's topic, you can see that the last row has been pushed.

kafka-console-consumer --bootstrap-server localhost:9092 --topic mysql-db01.bhuvi.rohi --from-beginning

{"id":1,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":2,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":3,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":4,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":5,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":6,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577531276000}

This method helped us to sync the historical data from the Read replica to the Kafka topic without affecting the transactions on the Master node. Still, we are exploring this for more scenarios. I’ll keep posting new articles about this.


MySQL InnoDB Cluster Tutorial 2 ( Integrating with MySQL router )


In my last blog, I explained the details of configuring InnoDB Cluster (Group Replication + MySQL Shell). You can find the link below.

MySQL InnoDB Cluster Tutorial 1 ( Group Replication + MySQL Shell )

In this blog, I am going to explain how to integrate MySQL Router with the existing cluster setup.

As explained in Tutorial 1, I have already configured the cluster with MySQL Shell and Group Replication:

MySQL 192.168.33.11:3306 ssl JS > cluster.getName();
first_InnoDB_cluster

MySQL 192.168.33.11:3306 ssl JS > \sql
Switching to SQL mode… Commands end with ;

MySQL 192.168.33.11:3306 ssl SQL > select channel_name,member_host,member_state,member_role,member_version from performance_schema.replication_group_members\G
*************************** 1. row ***************************
channel_name: group_replication_applier
member_host: sakthilabs11
member_state: ONLINE
member_role: PRIMARY
member_version: 8.0.18
*************************** 2. row ***************************
channel_name: group_replication_applier
member_host: sakthilabs12
member_state: ONLINE
member_role: SECONDARY
member_version: 8.0.18
*************************** 3. row ***************************
channel_name: group_replication_applier
member_host: sakthilabs13
member_state: ONLINE
member_role: SECONDARY
member_version: 8.0.18
3 rows in set (0.0070 sec)

Let's jump into the topic. The first step is to install MySQL Router:

yum install mysql-router-community.x86_64

I have installed the MySQL Router community edition.

# yum list installed | grep -i router
mysql-router-community.x86_64 8.0.18-1.el7 @mysql-tools-community

The second step is to create a dedicated directory for the MySQL Router operation. After this, run MySQL Router with the bootstrap option.

# mkdir -p /root/mysqlrouter

# mysqlrouter --bootstrap InnoDBCluster@sakthilabs11:3306 --directory /root/mysqlrouter --user=root
Please enter MySQL password for InnoDBCluster:

Bootstrapping MySQL Router instance at ‘/root/mysqlrouter’…

…….

MySQL Classic protocol

Read/Write Connections: localhost:6446

Read/Only Connections: localhost:6447

MySQL X protocol

Read/Write Connections: localhost:64460

Read/Only Connections: localhost:64470

  • --bootstrap : the bootstrap option helps to automatically configure the router operation with the MySQL InnoDB Cluster

The below files will be created after bootstrapping the router.

pwd : /root/mysqlrouter
drwx------. 2 root root    6 Dec 30 15:07 run
-rw-------. 1 root root   88 Dec 30 15:07 mysqlrouter.key
drwx------. 2 root root   29 Dec 30 15:07 log
-rwx------. 1 root root  277 Dec 30 15:07 start.sh
-rw-------. 1 root root 1.4K Dec 30 15:07 mysqlrouter.conf
drwx------. 2 root root   39 Dec 30 15:07 data
-rwx------. 1 root root  161 Dec 30 15:07 stop.sh

mysqlrouter.conf contains the configuration options. By triggering the start.sh script, we can start the MySQL Router daemon.

# ./start.sh
PID 14791 written to ‘/root/mysqlrouter/mysqlrouter.pid’
logging facility initialized, switching logging to loggers specified in configuration

# ps -ef | grep -i mysqlrou
root 14791 1 21 15:22 pts/0 00:00:04 /bin/mysqlrouter -c /root/mysqlrouter/mysqlrouter.conf
root 14801 14636 0 15:23 pts/0 00:00:00 grep --color=auto -i mysqlrou

# netstat -tulnp | grep -i mysqlrouter
tcp 0 0 0.0.0.0:64460 0.0.0.0:* LISTEN 14791/mysqlrouter
tcp 0 0 0.0.0.0:6446 0.0.0.0:* LISTEN 14791/mysqlrouter
tcp 0 0 0.0.0.0:6447 0.0.0.0:* LISTEN 14791/mysqlrouter
tcp 0 0 0.0.0.0:64470 0.0.0.0:* LISTEN 14791/mysqlrouter

Alright, we have integrated MySQL Router with the cluster. Now we can test this with read and read/write connections.

For read/write connections (port 6446):

# mysql -P6446 -uInnotest -p'xxxxxxxxxx' -h127.0.0.1 -e "create database test_write"

# mysql -P6446 -uInnotest -p'xxxxxxxxxxx' -h127.0.0.1 -e "use sakthi ; select database()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------+
| database() |
+------------+
| sakthi     |
+------------+

I can perform both reads and writes with port 6446.

For read-only connections (port 6447):

# mysql -P6447 -uInnotest -p'xxxxxxxx' -h127.0.0.1 -e "use sakthi ; select database()"
+------------+
| database() |
+------------+
| sakthi     |
+------------+

# mysql -P6447 -uInnotest -p'xxxxxxxxxx' -h127.0.0.1 -e "create database test_write"
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1290 (HY000) at line 1: The MySQL server is running with the --super-read-only option so it cannot execute this statement

We can perform only reads with port 6447. This illustrates that port 6447 connects only to the reader nodes, not the master.
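
To double-check which member serves each port, a quick sketch (assuming the same Innotest user) is to ask the server for its hostname through both router ports:

# mysql -P6446 -uInnotest -p -h127.0.0.1 -e "select @@hostname"   # should return the PRIMARY member
# mysql -P6447 -uInnotest -p -h127.0.0.1 -e "select @@hostname"   # should return one of the SECONDARY members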

I hope this blog helps someone who has started learning about MySQL InnoDB Cluster.

Thanks !!

MySQL Commercial Yum Repo Extended

Many things in the MySQL 8.0 series have evolved. I've discussed many of those things in prior blogs, such as the MySQL Router with InnoDB Cluster, plus a series on Enterprise Backup with InnoDB Cluster backup & restore use cases. But now it's time to update everyone on the evolution of the Yum repo packaging… Read More »

Debezium MySQL Snapshot From Read Replica And Resume From Master


In my previous post, I showed you how to take the snapshot from a Read Replica with GTID for the Debezium MySQL connector. The GTID concept is awesome, but many of us still use replication without GTID. For these cases, we can take a snapshot from the Read replica and then manually push the Master's binlog information to the offsets topic. Injecting a manual entry into the offsets topic is already documented in Debezium; I'm just showing you the way to take a snapshot from the Read replica without GTID.

Requirements:

  • Setup master slave replication.
  • The slave must have log-slave-updates=ON, otherwise the connector will fail to read from the beginning onwards (see the quick check after this list).
  • Debezium connector should be able to access the Read replica with a user that is having necessary permissions.
  • Install Debezium connector.
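
A minimal way to verify that second requirement on the replica:

mysql-slave> SHOW VARIABLES LIKE 'log_slave_updates';

The Value column should show ON; otherwise add log-slave-updates=ON to the slave's my.cnf and restart it.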

Use a different name for Slave binlog:

Note: If you are already having a Master slave setup then ignore this step.

By default, MySQL uses mysql-bin as the prefix for all binlog files. We should not have the same binlog name on both the master and the slave. If you are setting up a new master-slave replication, make this change in the my.cnf file.

master#
log_bin = /var/log/mysql/mysql-bin.log
slave#
log_bin = /var/log/mysql/mysql-slave-bin.log
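
After restarting MySQL with the settings above, a quick sanity check (the exact file list will differ on your server) confirms the prefix change:

mysql-slave> SHOW BINARY LOGS;    -- file names should now start with mysql-slave-bin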

Sample data:

Create a new database to test this sync and insert some values.

create database bhuvi;
use bhuvi;
create table rohi (
id int,
fn varchar(10),
ln varchar(10),
phone int);

insert into rohi values (1, 'rohit', 'last',87611);
insert into rohi values (2, 'rohit', 'last',87611);
insert into rohi values (3, 'rohit', 'last',87611);
insert into rohi values (4, 'rohit', 'last',87611);
insert into rohi values (5, 'rohit', 'last',87611);

Create the MySQL Connector Config:

File Name: mysql.json

{
"name": "mysql-connector-db01",
"config": {
"name": "mysql-connector-db01",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.server.id": "1",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "YOUR-BOOTSTRAP-SERVER:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "mysql-db01",
"database.hostname": "IP-OF-READER-NODE",
"database.port": "3306",
"database.user": "bhuvi",
"database.password": "****",
"database.whitelist": "bhuvi",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.key.converter.schemas.enable": "false",
"internal.value.converter.schemas.enable": "false",
"transforms": "unwrap",
"transforms.unwrap.add.source.fields": "ts_ms",
"tombstones.on.delete": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState
}
}

Once the snapshot is done, the connector pushes the Slave's binlog position as of the snapshot, and then continues doing CDC for the upcoming data. You will see the first record in your connect-offsets topic like below.

{"file":"ip-172-31-25-99-bin.000002","pos":7240}

Then, for continuous replication, it keeps adding records to this topic along with some additional metadata like the server id and timestamp.

{"ts_sec":1577764293,"file":"ip-172-31-25-99-bin.000002","pos":7305,"row":1,"server_id":1,"event":2}

You can monitor the snapshot progress from JMX.

curl localhost:7071 | grep debezium_metrics_SecondsBehindMaster
debezium_metrics_SecondsBehindMaster{context="binlog",name="mysql-db01",plugin="mysql",} 299.577536699E9

Sometimes the metrics take a few more minutes to update. Once you can see the last binlog information in connect-offsets and the JMX lag is < 10, the snapshot is done.

Switch to Master:

Before switching to the master, we need to stop the slave thread so that we get consistent Master binlog information from the Read replica. Then stop the Debezium connector so that we can update the binlog information manually in the connect-offsets topic.

mysql-slave> stop slave;
Debezium-connector-node# systemctl stop confluent-connect-distributed

To simulate the real-time scenario, add one new row to the table on the Master. Since the slave is stopped, this row will never replicate to your slave, but once you switch nodes the connector should start reading from this row.

mysql-master> insert into rohi values (6, 'rohit', 'last','87611');

Also create a new table and insert one new row to this new table.

mysql-master> create table testtbl (id int);
mysql-master> insert into testtbl values (1);

Once the switchover is done, the connector should read the 6th row that we inserted, and a new topic should be created for the testtbl table.

Get the last binlog info from offsets:

Install kafkacat on your broker node (it's available from the Confluent repo).

apt-get install kafkacat

Run the below command to get the last-read binlog info.

kafkacat -b localhost:9092 -C -t connect-offsets -f 'Partition(%p) %k %s\n'
  • -b - Broker
  • -C consumer
  • -t Topic
  • -f Output format string, specifying both the format of the output and the fields to include.

You will get something like this.

Partition(0) ["mysql-connector-db01",{"server":"mysql-db01"}] {"file":"ip-172-31-25-99-bin.000002","pos":7240}
Partition(0) ["mysql-connector-db01",{"server":"mysql-db01"}] {"ts_sec":1577764293,"file":"ip-172-31-25-99-bin.000002","pos":7305,"row":1,"server_id":1,"event":2}
  • Partition(0) - The partition where the information is located.
  • mysql-connector-db01 - Connector name
  • "server":"mysql-db01" - Server name that the connector uses.
  • "ts-sec":1577764293,"file":"ip-172-31-25-99-bin.000002","pos":7305,"row":1,"server_id":1,"event":2 - Binlog information

Now we'll manually push a new record into this topic with the same information, just replacing the binlog file name and its position. We need to continue the CDC from where it stopped, so to get the exact starting binlog information we'll use the slave status from the Read replica.

mysql-slave> show slave status\G

                   Slave_IO_State:
                      Master_Host: 172.31.36.115
                      Master_User: bhuvi
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000003
              Read_Master_Log_Pos: 7759
                   Relay_Log_File: ip-172-31-25-99-relay-bin.000009
                    Relay_Log_Pos: 7646
              Exec_Master_Log_Pos: 7759

Make a note of Master_Log_File and Exec_Master_Log_Pos from the slave status. Now inject a new record into the offsets topic.

echo '["mysql-connector-db01",{"server":"mysql-db01"}]|{"file":"mysql-bin.000003","pos":7759}' |
kafkacat -P -b localhost:9092 -t connect-offsets -K '|' -p 0
  • -b Broker
  • -P Producer
  • -K Delimiter
  • -p Partition

If you read the data from this topic, you’ll see the manually injected record.

kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning

{"file":"ip-172-31-25-99-bin.000002","pos":7240}
{"ts_sec":1577764293,"file":"ip-172-31-25-99-bin.000002","pos":7305,"row":1,"server_id":1,"event":2}
{"file":"mysql-bin.000003","pos":7759}

Once you start the Debezium MySQL connector, it's still pointed at the slave, but it'll start looking for the binlog file mysql-bin.000003. If you use the same binlog file name for both the master and the slave, this is a problem. We can use any one of the following methods to solve it.

  1. Use a different naming convention for the Master and Slave binlog files.
  2. Delete all the binlog files from the Slave using the RESET MASTER command.
  3. If the slave has a binlog file named mysql-bin.000003, delete that file alone.
  4. If the slave has a binlog file named mysql-bin.000003, rename it to mysql-bin.000003.old (see the sketch after the disclaimer below).

Disclaimer: Please consult with your DBA before performing any of the above steps. I recommend using step 1 or 4.
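
A rough sketch of option 4 on the slave host (the binlog directory is an assumption based on the my.cnf settings above; the binlog index file may also need the matching edit, so again, check with your DBA first):

# rename the conflicting binlog file on the slave
mv /var/log/mysql/mysql-bin.000003 /var/log/mysql/mysql-bin.000003.old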

Start the debezium connector:

Debezium-connector-node# systemctl start confluent-connect-distributed

In your connector log file, you can see an error indicating that Debezium is not able to find the binlog file mysql-bin.000003.

[2019-12-31 03:55:17,128] INFO WorkerSourceTask{id=mysql-connector-db01-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2019-12-31 03:55:17,131] ERROR WorkerSourceTask{id=mysql-connector-db01-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at binlog file 'mysql-bin.000003', pos=7759, skipping 2 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
at io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:132)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:208)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-12-31 03:55:17,132] ERROR WorkerSourceTask{id=mysql-connector-db01-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
[2019-12-31 03:55:17,132] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask)

Now we need to update the existing MySQL connector’s config and just change the "database.hostname" parameter.

Note: this JSON file format is different from the one we used to register the connector, so double-check the syntax.

File Name: mysql-update.json

{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "none",
"tasks.max": "3",
"database.history.kafka.topic": "schema-changes.mysql",
"transforms": "unwrap",
"internal.key.converter.schemas.enable": "false",
"transforms.unwrap.add.source.fields": "ts_ms",
"tombstones.on.delete": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.whitelist": "bhuvi",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.user": "bhuvi",
"database.server.id": "1",
"database.history.kafka.bootstrap.servers": "YOUR-KAFKA-BOOTSTRAP-SERVER:9092",
"database.server.name": "mysql-db01",
"database.port": "3306",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "MASTER-IP-ADDRESS",
"database.password": "****",
"internal.value.converter.schemas.enable": "false",
"name": "mysql-connector-db01",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"snapshot.mode": "initial"
}

Run the below command to update the config file.

curl -X PUT -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors/mysql-connector-db01/config -d @mysql-update.json

Once the update is done, immediately it’ll start connecting to the master and start reading the binlog file mysql-bin.000003 from position 7759.

We already inserted a new record (id 6) into the rohi table; if you read this topic, you can see that the row has been read. Also insert a few more rows into this table with id 7 and 8, as shown below.
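
For example, following the same pattern as the earlier sample rows:

mysql-master> insert into rohi values (7, 'rohit', 'last', 87611);
mysql-master> insert into rohi values (8, 'rohit', 'last', 87611);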

kafka-console-consumer --bootstrap-server localhost:9092 --topic mysql-db01.bhuvi.rohi --from-beginning

{"id":6,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577788740000}
{"id":7,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577788764000}
{"id":8,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577788767000}

Also, it should have added the testtbl table as a Kafka topic.

kafka-topics --zookeeper localhost:2181 --list

connect-configs
connect-offsets
connect-status
default_ksql_processing_log
my_connect_offsets
mysql-db01
mysql-db01.bhuvi.rohi
mysql-db01.bhuvi.testtbl
schema-changes.mysql

Once your switchover is done, resume the replication on your slave.

From disk to flashcache to flash

The past decade in database storage was interesting whether you stayed with local attach storage, used block & object storage from cloud or on-prem vendors or moved to OSS scale-out storage like Ceph, GlusterFS and MinIO. I am writing about my experience and will focus on local attach.

Over the past decade the DBMS deployments I cared for went from disk to flashcache to flash on the HW side and then from MySQL+InnoDB to MySQL+MyRocks on the SW side. I assume that HW changes faster than DBMS software. DBMS algorithms that can adapt to such changes will do better in the next decade.

One comment I have heard a few too many times is that storage performance doesn't matter much because you can fit the database in RAM. More recently I hear the same except change RAM to Optane. I agree that this can be done for many workloads. I am less certain that it should be done for many workloads. That (all data in RAM/Optane) costs a lot in money, power and even space in the data center. Let's make web-scale DBMS green. Use enough RAM/Optane for cache to meet the performance SLA and then use SSD or disk arrays. At some point there is no return from cutting the DBMS query response time in half but cutting the DBMS HW cost in half is usually a big deal.

Priorities

With disk and flashcache I worried a lot about the IOPs demand because the supply was limited, usually less than 2000 operations/second. On moving to flash I stopped worrying about that and began worrying about write and space amplification (efficiency).

The context for this is small data (OLTP) workloads and deployments where reducing HW cost matters.  Overcoming the IOPs shortage was interesting at times and good for my career as there were always new problems that had to be fixed right now. Moving to flash made life easier for everyone. There was an opportunity cost from using disks -- time spent solving the latest IOPs demand crisis was time not spent on longer term projects. Moving to flash gave us time to build and deploy MyRocks.

MyRocks has better space and write efficiency than a b-tree. The cost of better space and write efficiency with an LSM is more CPU overhead for point and range queries. Sometimes that is a great trade. Better space and write efficiency means you buy less SSD and it lasts longer. Better write efficiency is a big deal with lower endurance (TLC and QLC) NAND flash. I wonder how this changes in the cloud. Cloud vendors might improve their profit margins with better space and write efficiency but they also have the ability to pass on some of the inefficiency costs to the user. A cloud user doesn't have to worry as much about write efficiency because they are renting the SSD.

Hardware

This is my history with storage for web-scale MySQL. The NUC servers I use today have similar RAM/CPU as the DBMS servers I started with in 2005 but the NUC servers have much more IO capacity.

First there were disk arrays with HW RAID and SW RAID. This was RAID 10 which was better for durability than availability. Data isn't lost on a single-disk failure but the server performance is unacceptable when a HW RAID cache battery fails (fsync is too slow) or a rebuild is in progress after a disk gets replaced.

Then there was flashcache and performance is wonderful when the read working set fits in the flash cache but there is an abrupt change in performance when it does not. Those were exciting years. Some of the performance critical parts of flashcache were in the Linux kernel. I lack kernel skills and it took us (really, Domas) a while to identify perf problems that were eventually fixed.

Then there was flash and the abundance of IOPs was wonderful. I look forward to the next decade.

Anecdotes

If you use disk arrays at scale then you will see corruption at scale. You are likely using multiple storage devices with multiple firmware revisions. It is interesting when 99% of corruption occurs on 1% of the deployment -- all on the same, new firmware revision. That result makes it easy to focus on the HW as the probable problem and stop blaming MySQL. I can't imagine doing web-scale DBMS without per-page checksums.

Performance variance with NAND flash is to be expected. I hope that more is done to explain and document it for OLTP workloads. The common problem is that NAND flash GC can stall reads on the device. I wish it were easier to characterize device performance for enthusiasts like myself. I am sure there is an interesting paper on this topic. How much concurrency does the device provide? How are writes done? How is GC done? What is the stall distribution? What can be done to reduce stalls (see multistream and LightNVM)?

Using TRIM (mount FS with discard) at scale is exciting. RocksDB and MyRocks do a lot of TRIM while InnoDB does not. How many GB/s and unlink/s of TRIM does the device support? TRIM performance varies greatly by vendor. I hope more is done to document these differences. Perhaps we need trimbench. People at web-scale companies have stories that never get shared because they don't want to throw their SSD vendors under the bus. I was spoiled by FusionIO. My memory is that FusionIO TRIM was a noop from a perf perspective.

Innosim is an InnoDB IO simulator that I wrote to help device vendors reproduce performance stalls we encountered with web-scale MySQL. It is easier to run than MySQL while able to generate similar IO patterns. I wrote it because InnoDB has a pattern of coordinated IO that fio wasn't able to reproduce. The pattern occurs during page write back -- first write the double write buffer (1MB or 2MB sequential write) and then do 64 or 128 random 16kb writes. Innosim also takes much less time to reach steady state -- just sequentially write out X GB of database pages versus load InnoDB and then run (for days) an update workload to fragment the indexes. Fragmentation takes time. I wish more DBMS benchmarks ran long enough to get sufficient fragmentation but that can be expensive.

Perhaps one day I will write WTsim, the WiredTiger IO simulator. I wrote ldbsim, the LevelDB IO simulator, but it was rarely used because the RocksDB benchmark client, db_bench, was easy to use even if fragmenting the LSM tree still took a long time. I am not sure that fio would be able to reproduce the special IO patterns created by RocksDB compaction. I love fio but I am not sure it should try to solve this problem for me.

Debezium MySQL Snapshot For AWS RDS Aurora From Backup Snapshot


I have published enough Debezium MySQL connector tutorials for taking snapshots from a Read Replica. To continue my research, I wanted to do something for AWS RDS Aurora as well. But Aurora does not use binlog-based replication, so we can't use the tutorials that I published already. In Aurora, we can get the binlog file name and its position from a snapshot of the source cluster. So I used a snapshot for loading the historical data, and once it's loaded, we can resume the CDC from the main cluster.

Requirements:

  1. Running aurora cluster.
  2. Aurora cluster must have binlogs enabled.
  3. Set the binlog retention period to a minimum of 3 days (it's a best practice); see the sketch after this list.
  4. Debezium connector should be able to access both the clusters.
  5. Make sure you have different security groups for the main RDS Aurora cluster and the Snapshot cluster.
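
On RDS/Aurora the retention is set with the rds_set_configuration stored procedure (a minimal sketch; 72 hours = 3 days):

mysql> CALL mysql.rds_set_configuration('binlog retention hours', 72);
mysql> CALL mysql.rds_show_configuration;    -- verify the new value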

Sample data in source aurora:

create database bhuvi;
use bhuvi;

create table rohi (
id int,
fn varchar(10),
ln varchar(10),
phone int);

insert into rohi values (1, 'rohit', 'last',87611);
insert into rohi values (2, 'rohit', 'last',87611);
insert into rohi values (3, 'rohit', 'last',87611);
insert into rohi values (4, 'rohit', 'last',87611);
insert into rohi values (5, 'rohit', 'last',87611);

Take Aurora snapshot:

Go to the RDS console and select your source Aurora master node, then take a snapshot of it. Once the snapshot is done, you can see it in the Snapshots tab.

New cluster from snapshot:

Then create a new cluster from the snapshot. Once it's launched, we can get the binlog info from the logs.

In the RDS Console, select the instance name and click on the Logs & Events tab. Below Recent events, you can see the binlog information of the source Aurora node at the time the snapshot was taken. This new cluster also needs binlog enabled.
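
If you prefer the CLI over the console, the same recent events can be pulled with aws rds describe-events (a sketch; the instance identifier is a placeholder for your restored instance):

aws rds describe-events --source-type db-instance --source-identifier snapshot-cluster-instance-1 --duration 1440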

Register the MySQL Connector:

Follow this link to configure Kafka cluster and connector. Create a file called mysql.json and add the Snapshot cluster’s information.

{
"name": "mysql-connector-db01",
"config": {
"name": "mysql-connector-db01",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.server.id": "1",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "YOUR-BOOTSTRAP-SERVER:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "mysql-db01",
"database.hostname": "SNAPSHOT-INSTANCE-ENDPOINT",
"database.port": "3306",
"database.user": "bhuvi",
"database.password": "****",
"database.whitelist": "bhuvi",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.key.converter.schemas.enable": "false",
"internal.value.converter.schemas.enable": "false",
"transforms": "unwrap",
"transforms.unwrap.add.source.fields": "ts_ms",
"tombstones.on.delete": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
}
}

Run the below command to register it on the connector node.

curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @mysql.json

Once the snapshot has been done, you can see the snapshot cluster’s current binlog file name and its position in the connect-offsets topic.

kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning

{"file":"mysql-bin-changelog.000006","pos":154}

Add more data on the source Cluster:

To simulate the real production setup, add a few more rows to the rohi table.

insert into rohi values (6, 'rohit', 'last',87611);
insert into rohi values (7, 'rohit', 'last',87611);

Also, create a new table.

use bhuvi;
create table testtbl (id int);
insert into testtbl values (1);

Once we switch to the source cluster, the connector should read this new data.

Update the Source Aurora binlog info:

Stop the connector service and manually inject the binlog information that we got from the Snapshot cluster’s Log & Events section.

connector-node# systemctl stop confluent-connect-distributed

Get the last-read binlog information and its partition from the connect-offsets topic.

kafkacat -b localhost:9092 -C -t connect-offsets  -f 'Partition(%p) %k %s\n'

Partition(0) ["mysql-connector-db01",{"server":"mysql-db01"}] {"file":"mysql-bin-changelog.000006","pos":154}
  • kafkacat - command-line utility from confluent.
  • -b localhost:9092 - broker details
  • -C - Consumer
  • -t connect-offsets - topic
  • Partition(0) - The partition name where we have the binlog info.
  • mysql-connector-db01 - connector name
  • "server":"mysql-db01 - server name we used in mysql.json file

Run the following command to inject the binlog info to the connect-offsets topic.

echo '["mysql-connector-db01",{"server":"mysql-db01"}]|{"file":"mysql-bin-changelog.000002","pos":2170}' | \  
kafkacat -P -b localhost:9092 -t connect-offsets -K | -p 0
  • mysql-connector-db01 - connector name
  • "server":"mysql-db01 - server name we used in mysql.json file
  • {"file":"mysql-bin-changelog.000002","pos":2170} - Binlog info from the snapshot cluster’s log.
  • kafkacat - command-line utility from confluent.
  • -P - Producer
  • -b localhost:9092 - broker details
  • -t connect-offsets - topic
  • -p 0 Partition where we have the binlog info.

Now if you read the data from the consumer, it’ll show the new binlog.

kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning

{"file":"mysql-bin-changelog.000006","pos":154}
{"file":"mysql-bin-changelog.000002","pos":2170}

Switch to Source Cluster:

Before doing the switchover, we need to make sure that the connector cannot access the snapshot cluster once the connector service is started. We can achieve this in two ways.

  1. We have already read all the data from the snapshot cluster, so simply delete that cluster.
  2. In the Snapshot cluster's security group, remove the connector node's IP (see the sketch after this list).
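
A sketch of the second option using the AWS CLI (the security group ID is a placeholder; use the connector node's IP):

aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 3306 --cidr CONNECTOR-NODE-IP/32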

I recommend using the second option. Now start the connector service; after a few seconds, you can see logs like below.

[2020-01-02 06:57:21,448] INFO Starting MySqlConnectorTask with configuration: (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,450] INFO    connector.class = io.debezium.connector.mysql.MySqlConnector (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,450] INFO    snapshot.locking.mode = none (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,451] INFO    tasks.max = 1 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,451] INFO    database.history.kafka.topic = replica-schema-changes.mysql (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,452] INFO    transforms = unwrap (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,452] INFO    internal.key.converter.schemas.enable = false (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,452] INFO    transforms.unwrap.add.source.fields = ts_ms (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    tombstones.on.delete = false (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    transforms.unwrap.type = io.debezium.transforms.ExtractNewRecordState (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    value.converter = org.apache.kafka.connect.json.JsonConverter (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.whitelist = bhuvi (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    key.converter = org.apache.kafka.connect.json.JsonConverter (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.user = admin (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.server.id = 1 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.history.kafka.bootstrap.servers = 172.31.40.132:9092 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.server.name = mysql-db01 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,453] INFO    database.port = 3306 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    key.converter.schemas.enable = false (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    internal.key.converter = org.apache.kafka.connect.json.JsonConverter (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    task.class = io.debezium.connector.mysql.MySqlConnectorTask (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    database.hostname = snapshot-cluster.cluster-chbcar19iy5o.us-east-1.rds.amazonaws.com (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    database.password = ******** (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    internal.value.converter.schemas.enable = false (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    name = mysql-connector-db01 (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    value.converter.schemas.enable = false (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    internal.value.converter = org.apache.kafka.connect.json.JsonConverter (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,454] INFO    snapshot.mode = initial (io.debezium.connector.common.BaseSourceTask)
[2020-01-02 06:57:21,512] INFO [Producer clientId=connector-producer-mysql-connector-db01-0] Cluster ID: H-jsdNk9SUuud35n3AIk8g (org.apache.kafka.clients.Metadata)

Update the Endpoint:

Create an updated config file which has the Source Aurora endpoint and snapshot.mode = schema_only_recovery.

The most important thing is to use a different topic for the schema changes history; otherwise, you'll end up with an error like below.

ERROR Failed due to error: Error processing binlog event (io.debezium.connector.mysql.BinlogReader)
org.apache.kafka.connect.errors.ConnectException: Encountered change event for table bhuvi.rohi whose schema isn't known to this connector

File: mysql-update.json

{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "none",
"tasks.max": "3",
"database.history.kafka.topic": "schema-changes.mysql",
"transforms": "unwrap",
"internal.key.converter.schemas.enable": "false",
"transforms.unwrap.add.source.fields": "ts_ms",
"tombstones.on.delete": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.whitelist": "bhuvi",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.user": "admin",
"database.server.id": "1",
"database.history.kafka.bootstrap.servers": "BROKER-NODE-IP:9092",
"database.server.name": "mysql-db01",
"database.port": "3306",
"key.converter.schemas.enable": "false",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "SOURCE-AURORA-ENDPOINT",
"database.password": "*****",
"internal.value.converter.schemas.enable": "false",
"name": "mysql-connector-db01",
"value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"snapshot.mode": "SCHEMA_ONLY_RECOVERY"
}

Run the below command to update the MySQL connector.

curl -X PUT -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors/mysql-connector-db01/config -d @mysql-update.json

Then it'll immediately start reading from the Source Aurora cluster, from binlog file mysql-bin-changelog.000002 at position 2170.

You can see these changes from the connect-offsets topic.

kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-offsets --from-beginning

{"file":"mysql-bin-changelog.000006","pos":154}
{"file":"mysql-bin-changelog.000002","pos":2170}
{"ts_sec":1577948351,"file":"mysql-bin-changelog.000003","pos":1207,"row":1,"server_id":2115919109,"event":2}

We added 2 more rows to the rohi table earlier; you can see those new values in the mysql-db01.bhuvi.rohi topic.

kafka-console-consumer --bootstrap-server localhost:9092 --topic mysql-db01.bhuvi.rohi --from-beginning
{"id":1,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":2,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":3,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":4,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}
{"id":5,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":0}

{"id":6,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577948298000}
{"id":7,"fn":"rohit","ln":"last","phone":87611,"__ts_ms":1577948304000}

Also, you can see the new table testtbl added as a topic.

kafka-topics --zookeeper localhost:2181 --list

connect-configs
connect-offsets
connect-status
default_ksql_processing_log
mysql-db01
mysql-db01.bhuvi.rohi
mysql-db01.bhuvi.testtbl
replica-schema-changes.mysql
schema-changes.mysql