Data security plays a major role in current age. Privacy matters a lot. Data masking is one of the Key Features when comes to security.
- In MySQL community versions if you want to mask your data, You can go with a Maxscale load balancer.
- They introduced a new masking filter on the Maxscale 2.1 version.
- My colleague Prasanth written about column-level data masking. For reference click here Column Level Data Masking.
- In MySQL 8.0 Enterprise they have introduced a new security feature called data masking.
- This feature was introduced in MySQL 8.0.13 Enterprise Edition and it provides data masking and de-identification capabilities.
What is Masking ?
- Transformation of existing data to mask it and remove identifying characteristics, for example, changing all digits of a credit card number but the last four to ‘X’ characters.
- Generation of random data, such as email addresses and payment card numbers.
In MySQL 8.0 they have introduced 14 functions,
1) gen_blacklist 2) gen_dictionary 3) gen_dictionary_drop 4) gen_dictionary_load 5) gen_range 6) gen_rnd_email 7) gen_rnd_pan 8) gen_rnd_ssn 9) gen_rnd_us_phone 10) mask_inner 11) mask_outer 12) mask_pan 13) mask_pan_relaxed 14) mask_ssn
Plugin Installation :
root@localhost:(none)>install plugin data_masking SONAME 'data_masking.so'; Query OK, 0 rows affected (0.10 sec)
- In MySQL 8.0 there were different types of masking functions are available.
- I have categorised those functions based on their usage.
Architecture Diagram :
General-Purpose Functions:
- mask_inner
- mask_outer
mask_inner :
- It will mask the interior part of a string, We can specify the number of strings that needs to be masked.
Example :
root@localhost:mydbops_test>select mask_inner(aadhar_number,1,2) as aadhar_number,mask_inner(status,2,0) as status from aadhar_details; +-----------------+--------+ | aadhar_number | status | +-----------------+--------+ | 1XXXXXXXXXXXX78 | ACXXXX | | 9XXXXXXXXXXX78 | ACXXXX | +-----------------+--------+ 2 rows in set (0.01 sec)
- In the above example, I have masked the (aadhar_number & status) column using mask_inner function. I have passed the 2 arguments.
- The first argument indicates margin1. It will mask the inner part of the data.
- The second argument indicates margin2. It will mask part of the data.
Note :
- For using this mask_inner function you need minimum “3“ characters.
mask_outer :
- It will mask the left and right ends of a string. We can specify the number of strings that needs to be masked.
Example :
root@localhost:mydbops_test>select mask_outer(aadhar_number,3,2) as aadhar_number,mask_outer(status,2,2) as status from aadhar_details; +-----------------+--------+ | aadhar_number | status | +-----------------+--------+ | XXX2334455667XX | XXTIXX | | XXX883445566XX | XXTIXX | +-----------------+--------+ 2 rows in set (0.00 sec)
- In the above example, I have masked the (aadhar_number & status) column using mask_outer function.
I have passed the 2 arguments.
- The first argument indicates margin1. It will mask the data from the beginning.
- The second argument indicates margin2. It will mask the data from the end of the row.
Special-Purpose :
- mask_pan
- mask_pan_relaxed
- mask_ssn
mask_pan :
- The mask span function it will mask all the values from passed argument except the last four-digit.
Example :
root@localhost:mydbops_test>select mask_pan(aadhar_number),mask_pan(user_identity) from aadhar_details; +-------------------------+-------------------------+ | mask_pan(aadhar_number) | mask_pan(user_identity) | +-------------------------+-------------------------+ | XXXXXXXXXXX6778 | XXXXXXwxrin | | XXXXXXXXXX66201 | XXXXXXXrstg | +-------------------------+-------------------------+ 2 rows in set (0.00 sec)
Note :
- If the argument is shorter than required, it is returned unchanged value.
- For using this (mask_pan) function, We need a minimum of 5 characters.
Example :
root@localhost:mydbops_test>select mask_pan('data'); +------------------+ | mask_pan('data') | +------------------+ | data | +------------------+ 1 row in set (0.00 sec)
mask_pan_relaxed :
- The mask_pan_relaxed function will mask other characters except for first six digits and last 4 digits.
Example :
root@localhost:mydbops_test>select mask_pan_relaxed(photo) from aadhar_details; +-------------------------------+ | mask_pan_relaxed(photo) | +-------------------------------+ | https:XXXXXXX.jpg | | https:XXXXXXXXXXXXXXXXXXX.jpg | +-------------------------------+ 2 rows in set (0.00 sec)
Note :
- The string must be a suitable length,(ie.. string must be 11 characters long).Otherwise, this mask_pan_relaxed function did not check the string.
mask_ssn :
- The mask_ssn is a Social Security number that returns the number except for last 4 digits. Other numbers are replaced by “X” characters.
Example :
root@localhost:mydbops_test>select mask_ssn('90963692211'); +-------------------------+ | mask_ssn('90963692211') | +-------------------------+ | XXX-XX-2211 | +-------------------------+ 1 row in set (0.00 sec)
Generating Random Data with Specific Characteristics :
- One of the nice functions of MySQL Data Masking is the ability to generate business-relevant datasets.
- Because it is not always possible to test/simulate your application on your real dataset (indeed playing with the customer credit card or security social numbers is a very bad practice) this feature is very convenient.
Several functions are available,
gen_range() : returns a random integer selected from a given range.
gen_rnd_email() : returns a random email address in the example.com domain.
gen_rnd_pan() : returns a random payment card Primary Account Number.
gen_rnd_us_phone() : returns a random U.S. phone number.
Masking Data for Customer Identification :
- This type of masking function is generating random data from the dictionary file.
- Here the main advantage is we can load the dictionary files manually.
- When a dictionary is loaded, it becomes part of the dictionary registry and is assigned a name to be used by other dictionary functions.
- Dictionaries are loaded from plain text files containing one term per line. Empty lines are ignored. To be valid, a dictionary file must contain at least one non-empty line.
gen_dictionary :
- It returns some random data from a dictionary.
Example :
mysql> select gen_dictionary('name') from aadhar_details; +------------------------+ | gen_dictionary('name') | +------------------------+ | La Trinité | | Fonds-Saint-Denis | +------------------------+ 2 rows in set (0.00 sec)
gen_dictionary_load :
- Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
- In order to use data from a dictionary, we must first load the data.
Sample data :
[root@mydbopslabs26 mysql-files]# cat random_dictionary.txt Basse-Pointe Bellefontaine Case-Pilote Ducos Fonds-Saint-Denis Fort-de-France Grand'Rivière Gros-Morne L'Ajoupa-Bouillon La Trinité
- We can load the dictionary files using (gen_dictionary_load) function.
Example :
mysql> SELECT gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name'); +-----------------------------------------------------------------------+ | gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name') | +-----------------------------------------------------------------------+ | Dictionary load success | +-----------------------------------------------------------------------+ 1 row in set (0.00 sec)
Note :
- To loading the dictionary file to MySQL you need to put the file under (secure_file_priv) directory.Otherwise, MySQL will not allow loading the dictionary file.
mysql> SELECT gen_dictionary_load('/home/mydbops/dictionary','user_identity'); +-----------------------------------------------------------------+ | gen_dictionary_load('/home/mydbops/dictionary','user_identity') | +-----------------------------------------------------------------+ | Dictionary load error: dictionary file not readable | +-----------------------------------------------------------------+ 1 row in set (0.00 sec)
How to use these functions in a production use case?
- We can achieve this by creating views. While creating the views we can specify what are the columns needs to mask.
Sample Table:
mysql> desc aadhar_details; +---------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | aadhar_number | bigint(20) | NO | | NULL | | | name | varchar(16) | NO | | NULL | | | mobile | int(11) | YES | | NULL | | | address | varchar(150) | NO | | NULL | | | email | varchar(64) | YES | | NULL | | | photo | blob | YES | | NULL | | | user_identity | blob | YES | | NULL | | | status | varchar(16) | YES | | NULL | | | create_time | timestamp | NO | | NULL | | +---------------+--------------+------+-----+---------+----------------+ 10 rows in set (0.03 sec)
For the above table, I am masking a few columns, ie…
- aadhar_number – The gen_range function will generate some random numbers.
- mobile – The gen_rnd_us_phone function will generate random numbers in U.S standard format.
- name & address – The gen_dictionary function will load random dictionary data.
- email – The gen_rnd_email function will generate random email id’s.
- photo – The mask_inner function will mask the inner part of the selected column.
Creating View :
mysql> create view mydbops_masking as select gen_range(900000000, 999999999) as aadhar_number, gen_rnd_us_phone() as mobile, gen_dictionary('name') as name, gen_dictionary('address') as address, gen_rnd_email() as email, mask_inner(photo,1,5) as photo from aadhar_details; Query OK, 0 rows affected (0.03 sec)
Now the data looks like below,
mysql> select aadhar_number,mobile,name,email,photo from mydbops_masking; +---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+ | aadhar_number | mobile | name | email | photo | +---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+ | 976896841 | 1-555379-4677 | ˙∆√∆ç©√˙∫∫∆\∫∆\©¨∆∫˙ | lQGeD5ZA@example.com | hXXXXXXXXXXXl.jpg | | 990742763 | 1-555330-5061 | ∆˙©ç†˙∆˚˙©©¨˙©ƒ∂†®ƒ©˙∆˜µ∆∫˙¨©ƒç | dSODymDW@example.com | hXXXXXXXXXXXXXXXXXXXXXXXe.jpg | +---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+ 2 rows in set (0.00 sec)
Conclusion :
- This is a really awesome feature in MySQL 8.0. But this feature is only available for Enterprise MySQL.
- In Percona MySQL 8.0 they have implemented this same feature in open source. And they are expanding the Data-Masking plugin to implementing new features.