Hope everyone is aware of LVM (Logical Volume Manager), an extremely useful tool for handling storage at various levels. LVM basically functions by layering abstractions on top of physical storage devices, as shown in the illustration below.
Below is a simple diagrammatic representation of LVM:
  sda1   sdb1       (PVs on partitions or whole disks)
     \   /
      \ /
    vgmysql         (VG)
     / | \
    /  |  \
 data log tmp       (LVs)
   |   |   |
  xfs ext4 xfs      (filesystems)
IOPS is an extremely important resource when it comes to storage: it defines the performance of the disk. Let’s not forget PIOPS (Provisioned IOPS), one of the major selling points of AWS and other cloud vendors for production machines such as databases. Since the disk is the slowest component in a server, we can compare the major components as below.
Consider the CPU to be in the speed range of a fighter jet, RAM in the speed range of an F1 car, and the hard disk in the speed range of a bullock cart. With modern hardware improvements, IOPS is also seeing significant gains with SSDs.
In this blog, we are going to look at merging and striping multiple HDD drives to reap the benefit of their combined IOPS.
Below are the disks attached to my server. Each is an ~11 TB disk with a maximum supported IOPS of 600.
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   10G  0 disk
sda1     8:1    0   10G  0 part
sdb      8:16   0 10.9T  0 disk
sdc      8:32   0 10.9T  0 disk
sdd      8:48   0 10.9T  0 disk
sde      8:64   0 10.9T  0 disk
sdf      8:80   0 10.9T  0 disk
sdg      8:96   0 10.9T  0 disk
sda holds the root partition; sd[b-g] are the attached HDD disks.
Merely merging these disks (a linear layout) only gives easier space management, since the disks are clubbed one after another. With striping, our aim is to get close to 600 * 6 = 3600 IOPS, or at least a value somewhere around 3.2k to 3.4k.
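Before striping, it is worth confirming the single-disk baseline. Below is a minimal, read-only fio sketch against one raw disk (the device name /dev/sdb and the run time are assumptions; adjust them for your environment):

# fio --name=single-disk-baseline --filename=/dev/sdb --rw=randread \
      --bs=16k --direct=1 --ioengine=libaio --iodepth=32 \
      --runtime=30 --time_based --group_reporting

The read IOPS reported here should land close to the 600 IOPS rating of a single disk.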
Now let’s proceed to create the PV (Physical volume)
# pvcreate /dev/sd[b-g]
  Physical volume "/dev/sdb" successfully created.
  Physical volume "/dev/sdc" successfully created.
  Physical volume "/dev/sdd" successfully created.
  Physical volume "/dev/sde" successfully created.
  Physical volume "/dev/sdf" successfully created.
  Physical volume "/dev/sdg" successfully created.
Validating the PV status:
# pvs
  PV         VG   Fmt  Attr PSize  PFree
  /dev/sdb        lvm2 ---  10.91t 10.91t
  /dev/sdc        lvm2 ---  10.91t 10.91t
  /dev/sdd        lvm2 ---  10.91t 10.91t
  /dev/sde        lvm2 ---  10.91t 10.91t
  /dev/sdf        lvm2 ---  10.91t 10.91t
  /dev/sdg        lvm2 ---  10.91t 10.91t
Let’s proceed to create a volume group (VG) named “vgmysql”, combining these PVs with a physical extent (PE) size of 1 MB (the PE is similar to the block size of a physical disk).
# vgcreate -s 1M vgmysql /dev/sd[b-g] -v
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
    Wiping signatures on new PV /dev/sdb.
    Wiping signatures on new PV /dev/sdc.
    Wiping signatures on new PV /dev/sdd.
    Wiping signatures on new PV /dev/sde.
    Wiping signatures on new PV /dev/sdf.
    Wiping signatures on new PV /dev/sdg.
    Adding physical volume '/dev/sdb' to volume group 'vgmysql'
    Adding physical volume '/dev/sdc' to volume group 'vgmysql'
    Adding physical volume '/dev/sdd' to volume group 'vgmysql'
    Adding physical volume '/dev/sde' to volume group 'vgmysql'
    Adding physical volume '/dev/sdf' to volume group 'vgmysql'
    Adding physical volume '/dev/sdg' to volume group 'vgmysql'
    Archiving volume group "vgmysql" metadata (seqno 0).
    Creating volume group backup "/etc/lvm/backup/vgmysql" (seqno 1).
  Volume group "vgmysql" successfully created
We can check the volume group status with vgdisplay:
# vgdisplay -v
  --- Volume group ---
  VG Name               vgmysql
  System ID
  Format                lvm2
  Metadata Areas        6
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                6
  Act PV                6
  VG Size               65.48 TiB
  PE Size               1.00 MiB
  Total PE              68665326
  Alloc PE / Size       0 / 0
  Free  PE / Size       68665326 / 65.48 TiB
  VG UUID               51KvHN-ZqgY-LyjH-znpq-Ufy2-AUVH-OqRNrN
Now our volume group is ready. Let’s proceed to create the logical volume (LV) with a stripe size of 16K, equivalent to the page size of MySQL (InnoDB), striped across the 6 attached disks.
# lvcreate -L 7T -I 16k -i 6 -n mysqldata vgmysql
  Rounding size 7.00 TiB (234881024 extents) up to stripe boundary size 7.00 TiB (234881028 extents).
  Logical volume "mysqldata" created.
- -L volume size
- -I stripe size
- -i number of stripes (equal to the number of disks)
- -n LV name
- vgmysql the volume group to use
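As a side note, if you would rather allocate all of the remaining space instead of a fixed 7T, lvcreate also accepts extent-based sizing; a minimal sketch (the LV name and stripe settings simply mirror the ones above):

# lvcreate -l 100%FREE -I 16k -i 6 -n mysqldata vgmysql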
lvdisplay provides a complete view of the logical volume:
# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/vgmysql/mysqldata
  LV Name                mysqldata
  VG Name                vgmysql
  LV UUID                Y6i7ql-ecfN-7lXz-GzzQ-eNsV-oax3-WVUKn6
  LV Write Access        read/write
  LV Creation host, time warehouse-db-archival-none, 2019-08-26 15:50:20 +0530
  LV Status              available
  # open                 0
  LV Size                7.00 TiB
  Current LE             7340034
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     384
  Block device           254:0

  --- Segments ---
  Logical extents 0 to 7340033:
    Type                striped
    Stripes             6
    Stripe size         16.00 KiB
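As a quicker sanity check of the striping layout, the segment fields of lvs can also be queried; a small sketch (the exact output columns vary with the LVM version):

# lvs -a -o +stripes,stripe_size,devices vgmysql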
Now we will proceed to format the volume with XFS and mount it.
# mkfs.xfs /dev/mapper/vgmysql-mysqldata
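mkfs.xfs normally detects the stripe geometry from LVM on its own; if you prefer to pass it explicitly, stripe unit/width options like the ones below should match the 16K x 6 layout (this explicit form is an assumption, not what was run above):

# mkfs.xfs -d su=16k,sw=6 /dev/mapper/vgmysql-mysqldata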
Below are the mount options used
/dev/mapper/vgmysql-mysqldata on /var/lib/mysql type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,sunit=32,swidth=192,noquota)
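To make the mount persistent across reboots, a matching /etc/fstab entry would look roughly like the line below (a sketch; note that the nobarrier option is deprecated/removed on recent kernels, so drop it there):

/dev/mapper/vgmysql-mysqldata  /var/lib/mysql  xfs  rw,noatime,nodiratime,nobarrier,inode64  0 0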
Now let’s proceed with an FIO test to benchmark the I/O.
Command:
# fio --randrepeat=1 --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=10 --size=512M --runtime=60 --time_based --iodepth=64 --group_reporting
Result:
read : io=1467.8MB, bw=24679KB/s, iops=1542, runt= 60903msec
    slat (usec): min=3, max=1362.7K, avg=148.74, stdev=8772.92
    clat (msec): min=2, max=6610, avg=233.47, stdev=356.86
     lat (msec): min=2, max=6610, avg=233.62, stdev=357.65
write: io=1465.1MB, bw=24634KB/s, iops=1539, runt= 60903msec
    slat (usec): min=4, max=1308.1K, avg=162.97, stdev=8196.09
    clat (usec): min=551, max=5518.4K, avg=180989.83, stdev=316690.67
     lat (usec): min=573, max=5526.4K, avg=181152.80, stdev=317708.30
We got the desired IOPS of ~3.1k (1542 read + 1539 write) from the merged and striped LVM, rather than the 600 IOPS of a single disk.
Key takeaways:
- Management of storage becomes very easy with LVM
- Distributed IOPS with striping helps in enhancing disk performance
- LVM snapshots become available for point-in-time backups
Downsides:
Every tool has its own downsides and we should embrace them, considering the use case it serves best (IOPS, in our case). The major downside of this setup is that if any one of the disks fails, there is potential data loss/data corruption across the whole volume.
Workarounds:
- To avoid this data loss/data corruption, we have set up HA by adding 3 slaves for this setup in production
- Have a regular backup of the striped LVM with XtraBackup, MEB, or via snapshots (see the snapshot sketch after this list)
- RAID 0 also serves the same purpose as striped LVM.
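For the snapshot-based backup mentioned above, a minimal LVM snapshot sketch looks like this (the snapshot name, the 100G copy-on-write size, and the mount point are assumptions; the VG must still have free extents, and XFS needs nouuid to mount a snapshot alongside the original):

# lvcreate -s -L 100G -n mysqldata_snap /dev/vgmysql/mysqldata
# mount -o ro,nouuid /dev/vgmysql/mysqldata_snap /mnt/mysql_snap

For a consistent MySQL backup you would still need to quiesce writes (e.g. FLUSH TABLES WITH READ LOCK) while the snapshot is taken, and remove the snapshot with lvremove once the backup copy is done.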