Poor MySQL performance

We are in the process of moving our MySQL database to Docker Swarm, and we have a performance issue:

We have a 3-node Portworx cluster with a 1 TB SSD drive on each node. The nodes are connected over a 10 Gbit network.
I created a volume for the MySQL database like this:
pxctl volume create cmtsdb_vl --sharedv4 -s 1024 -r 2 --io_priority high --io_profile=db_remote

The volume is mounted by Docker Swarm onto /var/lib/mysql.
The volume is replicated on two of the three available machines.
The container runs on one of the two machines that host the volume.

Sustained writes work at an acceptable ~200 MB/s:
root@db-cmts_nabp:/var/lib/mysql# time dd if=/dev/zero bs=1G count=10 of=delete_me oflag=sync
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 51.4847 s, 209 MB/s

real 0m51.582s
user 0m0.005s
sys 0m27.818s

But MySQL has very poor performance: a miserable 2 MB/s of writes.
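
A simple way to watch the per-device write rate is iostat from the sysstat package (this is just an illustration, the one-second interval is arbitrary):
iostat -xm 1
The write-MB/s column shows the throughput per block device.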

I suspect that's because sync takes an unusually long time to complete; while MySQL is running, sync takes 600-1000 milliseconds:
root@db-cmts_nabp:/# time sync

real 0m0.660s
user 0m0.003s
sys 0m0.000s
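
To put a number on the per-fsync latency, a small fio job run inside the volume can help; fio is an extra tool, not part of the setup above, and the file size and block size below are only illustrative:
fio --name=fsync-test --directory=/var/lib/mysql --rw=write --bs=16k --size=256m --fsync=1 --ioengine=sync
With --fsync=1, fio issues an fsync after every write and reports fsync latency percentiles in its output.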

I already tried to minimize the number of syncs issued by MySQL:

innodb_flush_log_at_trx_commit=2

I also remounted the ext4 filesystem of the volume with the nobarrier option, but that did not help (I know it is unacceptable for production).
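For reference, an ext4 remount with nobarrier looks like this (only a sketch; the mount point is the one from above):
mount -o remount,nobarrier /var/lib/mysql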

Do you have other suggestions for me to try?
Portworx Version: 2.6.x
Deployment Type: On Premise

Hi Arie,

Why are you creating a --sharedv4 volume? For a database you don't need sharedv4.

You can create a normal volume like this:
pxctl volume create cmtsdb_vl -s 1024 -r 2 --io_priority high --io_profile=db_remote

Also, you are provisioning a 1 TB volume, which is the same size as the backend disk. Can you verify whether it is exactly 1 TB or less than 1 TB? I would recommend creating a 900 or 950 GB volume and leaving some buffer space.
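If you go with 950 GB, the create command would then be (same flags as above, just a smaller size and no --sharedv4):
pxctl volume create cmtsdb_vl -s 950 -r 2 --io_priority high --io_profile=db_remote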

Can you share the output of pxctl status?

I rebuilt the volume without --sharedv4 and with 950 GB, but performance is still the same 1.5-2 MB/s of write I/O (and sync takes 0.2-0.5 seconds):
root@cmts03-nabp:~# pxctl v l
ID NAME SIZE HA SHARED ENCRYPTED PROXY-VOLUME IO_PRIORITY STATUS SNAP-ENABLED
418766255164877162 cmtsdb_vl 950 GiB 2 no no no HIGH up - attached on 10.3.8.193    no
877398645523791336 cmtsdb_vl_20210108 950 GiB 2 no no no HIGH up - detached no

pxctl status output:

root@cmts03-nabp:~# pxctl status
Status: PX is operational
License: PX-Developer
Node ID: 5def87aa-3c02-432a-b4c3-4bfdaa94dc8f
        IP: 10.3.8.193 
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       HIGH            raid0           1.0 TiB 285 GiB Online  default default
        Local Storage Devices: 1 device
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sdd        STORAGE_MEDIUM_SSD      1.0 TiB         06 Jan 21 10:13 UTC
        total                   -                       1.0 TiB
        Cache Devices:
         * No cache devices
        Kvdb Device:
        Device Path     Size
        /dev/sdb        100 GiB
         * Internal kvdb on this node is using this dedicated kvdb device to store its data.
        Metadata Device: 
        1       /dev/sdc        STORAGE_MEDIUM_SSD
Cluster Summary
        Cluster ID: pwx.cmts-db.nabp
        Cluster UUID: c407d355-b559-4059-81b5-adc6ef796799
        Scheduler: swarm
        Nodes: 3 node(s) with storage (3 online)
        IP              ID                                      SchedulerNodeName               StorageNode     Used    Capacity        Status  StorageStatus       Version         Kernel                  OS
        10.3.8.192      83280ada-8cc3-4c41-8c47-9568c6ca3c38    axblwfvg0f8fwb775sfupb5co       Yes             285 GiB 1.0 TiB         Online  Up 2.6.2.0-f0dd370  5.4.0-60-generic        Ubuntu 20.04.1 LTS
        10.3.8.193      5def87aa-3c02-432a-b4c3-4bfdaa94dc8f    ypm1g1hqv4r8y2jcl1r80ti0h       Yes             285 GiB 1.0 TiB         Online  Up (This node)      2.6.2.0-f0dd370 5.4.0-58-generic        Ubuntu 20.04.1 LTS
        10.3.8.191      00774d33-e530-4416-8f3e-eab63e5f1df8    nsf8a7e4dvdeez2vcpfsvmmgm       Yes             10 GiB  1.0 TiB         Online  Up 2.6.2.0-f0dd370  5.4.0-58-generic        Ubuntu 20.04.1 LTS
        Warnings: 
                 WARNING: Swap is enabled on this node.
Global Storage Pool
        Total Used      :  580 GiB
        Total Capacity  :  3.0 TiB
root@cmts03-nabp:~#

Hi Arie,

Can you describe your platform in more detail? Are these machines VMs, cloud, or bare metal?

You also mentioned you are moving your database to Docker Swarm, so where was it running previously?

How are you comparing the metrics/numbers? Do you have any comparison chart?

Yes, the three machines are Proxmox 6.2-15 VMs.
Each machine is allocated 64 GB RAM and 6 CPUs.
8 physical SSD drives are combined into a 10 TB LVM pool, which is then used to provide virtual disks to the VMs.
The virtual disks have the cache option set to “Default (No cache)”.

According to Performance Tweaks - Proxmox VE, it seems safe to enable writeback caching, since in the worst case of a single machine failure (which is what we want to protect against), the other two machines would continue to work normally. What do you think?
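If we go that route, I assume it would be set per disk with something like this (the VM ID and disk name here are just placeholders, not our actual ones):
qm set 101 --scsi0 local-lvm:vm-101-disk-0,cache=writeback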

I found the reason for the slowness. Posting it here; maybe it will help someone…

It turned out the default binlog setting is sync_binlog=1, meaning an fsync is issued after every binlog write. And in my case, the binlogs were written to a separate Portworx volume created with the --sharedv4 flag.

After changing sync_binlog to 0, we get a solid 20 MB/s of writes.
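For reference, the two settings end up in my.cnf like this (only the relevant [mysqld] lines, as a sketch, not our full config):
[mysqld]
# flush the InnoDB redo log to disk roughly once per second instead of at every commit
innodb_flush_log_at_trx_commit = 2
# never fsync the binary log from MySQL; leave flushing to the OS
sync_binlog = 0
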
Thank you for your help!

Hi Arie,

Thank you for the update; yes, it will surely help others if they run into the same issue.