I’m running a px-dev cluster of 3 nodes on Proxmox VMs. When I run dd if=/dev/zero of=test bs=2048 count=1048576 on a locally mounted pwx volume I get a throughput of ~6MB/s, while running the same command on a local disk (the disk configurations are all the same) gives ~250MB/s.
My cluster configuration is:
root@ydev1:/mnt/infra/starts/pwx# pxctl status
Status: PX is operational
License: PX-Developer
Node ID: e93a6044-d5e8-417e-b858-919b2028328b
IP: 10.3.0.110
Local Storage Pool: 1 pool
POOL IO_PRIORITY RAID_LEVEL USABLE USED STATUS ZONE REGION
0 HIGH raid0 200 GiB 10 GiB Online default default
Local Storage Devices: 1 device
Device Path Media Type Size Last-Scan
0:1 /dev/sdb STORAGE_MEDIUM_SSD 200 GiB 14 Oct 20 11:50 UTC
total - 200 GiB
Cache Devices:
* No cache devices
Journal Device:
1 /dev/sdd1 STORAGE_MEDIUM_SSD
Metadata Device:
1 /dev/sdc STORAGE_MEDIUM_SSD
* Internal kvdb on this node is using this dedicated metadata device to store its data.
Cluster Summary
Cluster ID: pwx.testrepl.ygdev
Cluster UUID: ae0c7c03-d569-4ca3-8145-7789bac04935
Scheduler: swarm
Nodes: 3 node(s) with storage (3 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
10.88.0.2 e93a6044-d5e8-417e-b858-919b2028328b 7635l43biffytgaff38t6j763 Yes 10 GiB 200 GiB Online Up (This node) 2.6.1.2-669fb0c 5.4.0-48-generic Ubuntu 20.04.1 LTS
10.88.0.1 78bf22f2-bb50-40e2-8455-8b964b4096e0 t3swt67sbrzw95wvczr6zbbvv Yes 10 GiB 200 GiB Online Up 2.6.1.2-669fb0c 5.4.0-48-generic Ubuntu 20.04.1 LTS
10.88.0.3 785f9144-93d9-477c-b3ef-813adc5365ed r1ezu9t1tni9ojkiwl2hhgfb3 Yes 10 GiB 200 GiB Online Up 2.6.1.2-669fb0c 5.4.0-48-generic Ubuntu 20.04.1 LTS
Warnings:
WARNING: Swap is enabled on this node.
Global Storage Pool
Total Used : 30 GiB
Total Capacity : 600 GiB
The network between the nodes is configured on a local bridge, so it is not limited by physical NIC speed.
The volume was created with the command: pxctl volume create testvl --shared --async_io --early_ack -s 30 -r 3 --io_priority high
We recommend using sharedv4 instead of shared (the latter is considered legacy and is being phased out), as sharedv4 has superior performance. Additionally, you can set --io_profile db_remote and see if that results in even better performance (this requires a replication level of 3).
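For example, the volume from this thread could be recreated along these lines (a sketch only, reusing the name, size, replication level, and priority from your original command; verify the exact flag spellings against pxctl volume create --help on your version):

pxctl volume create testvl --sharedv4 --io_profile db_remote --io_priority high -s 30 -r 3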
Thanks a lot for the advice!!! Using the --sharedv4 switch brings the performance back to near-native, but I’m losing the ability to attach the volume on multiple hosts, despite what is mentioned here: https://docs.portworx.com/concepts/shared-volumes/
Can you please advise on this as well?
My current volume creation command is:
Can you elaborate on what you mean by “losing the ability to attach the volume on multiple hosts”? It’s unclear from the information provided so far what your use case looks like in more detail.
You definitely should be able to use the volume from other nodes, but can you specify whether you are always using pxctl directly on the other hosts?
Typically for Kubernetes-based environments, this action is performed via the orchestrator control plane using a StorageClass object (where you can specify parameters such as sharedv4: true and repl: 3). The volume attachment then happens transparently when you have a PersistentVolumeClaim that references that StorageClass, and the generated PersistentVolume is attached/mounted into the node/pod when the workload needs it.
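As a rough sketch (the StorageClass name px-sharedv4-repl3 is made up for illustration; check the provisioner and parameter names against the Portworx documentation for your version), that looks something like this on Kubernetes:

cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-repl3      # hypothetical name
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "3"
  sharedv4: "true"
EOF

A PersistentVolumeClaim that sets storageClassName: px-sharedv4-repl3 then gets a sharedv4 volume provisioned and attached wherever its pod is scheduled.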
I’m using Docker swarm now.
Indeed, I’ve just checked that the volume is shared between multiple instances of a container on different nodes.
Also, I have use cases where I mount volumes at the host level for NFS sharing. I’m not sure the local mount will work in case of a failover, so I prefer to mount on all nodes and use keepalived to fail over based on mounted-volume availability.
But when I run it on node 1 I get OK and it works:
Let’s elaborate a bit more on the attach vs. mount process.
Attaching a volume can only happen on one node at a time, and this is what creates the /dev/pxd/pxdNNNNN device, a virtual block device that gives you access to the Portworx volume. This typically happens on a node where the volume has replicas, but it can also happen on nodes that don’t have replicas (the data is then simply accessed over the network from the nodes that do have replicas), for example when your workload runs on one node (possibly storageless) while the volume’s storage lives on another.
Mounting the volume, on the other hand, uses the /sbin/mount command, either on the /dev/pxd/pxdNNNNN device file (if this is a ReadWriteOnce, i.e. non-shared, access-mode volume) or on the first node where a shared volume is attached (which also exports the volume via NFS); any OTHER node then mounts the NFS-exported volume from that first node. In a Kubernetes-based environment (the only one where Portworx Essentials is supported as of today), the mount point on the filesystem will be within the kubelet’s pod working directory.
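In shell terms, the two steps look roughly like this on a single node for a non-shared volume (a sketch only: pxdNNNNN stands for the numeric volume ID and /mnt/testvl is just an example mount point; check pxctl host --help for the exact syntax on your version):

# Attach: creates the virtual block device for the volume on this node
pxctl host attach testvl
# Mount: a standard mount of that device (the ReadWriteOnce / non-shared case)
mount /dev/pxd/pxdNNNNN /mnt/testvl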
Now, whether this approach of yours is the best way, we can’t say, but if you give us a higher-level view of what you’re trying to achieve, we can comment on whether your approach is actually the best or whether there’s another one you may not have considered.
Sorry, I was probably confused by the “--shared” flag, which provides the ability to create block devices on different nodes from the same volume.
My goals are as follows:
I’m good with sharing the same volume inside a Docker container running as a swarm service instance.
For host-level usage, I’m using a simple scenario to achieve high availability for sharing a directory via NFS. My plan was to mount the volume and run an NFS server on multiple nodes. HA is provided by the keepalived service, which switches a virtual IP between nodes based on volume accessibility (see the check sketch below). That works just fine with the “--shared” flag at volume creation, except for the very poor performance that prompted my initial question.
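Roughly, the keepalived health check I have in mind is a tiny script like this (the mount path is just an example), wired into a vrrp_script block so the virtual IP only ever lands on a node where the volume is actually mounted:

#!/bin/sh
# Exit 0 (healthy) only if the Portworx volume is mounted at the expected path
mountpoint -q /mnt/testvl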
If Portworx uses NFS for internal sharing, can I just mount a volume directly from Portworx through NFS, instead of mounting the volume locally and creating another NFS export?
We support accessing the exported volume via NFS outside Portworx if you use one of the following options:
To allow it to be consumed by any IP, add allow_all_ips=true to the StorageClass (or as a label when using pxctl volume create/update); this requires version 2.3.2 or higher.
Alternatively, to specify exactly which IPs can access it, use allow_ips=ip1;ip2; this requires version 2.6.0 or higher.
The exact share name can be queried in /etc/exports (the standard NFS configuration file) on the node where the volume is attached, based on the (numeric) volume ID; see the sketch below.
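Concretely, that could look something like this (a sketch with placeholders in angle brackets; the export path must be taken from /etc/exports on the attached node, and the exact way to set the label should be checked against pxctl volume update --help):

# Allow any client IP to consume the exported volume (Portworx 2.3.2+)
pxctl volume update testvl --label allow_all_ips=true
# On the node where the volume is attached, find the export path for its numeric volume ID
grep <numeric-volume-id> /etc/exports
# On the external client, mount the exported share directly
mount -t nfs <attached-node-ip>:<export-path> /mnt/px-nfs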
However, I am not sure your keepalived service will work well: remember, a volume can only be attached on one node at a time (and that node is the one that does the NFS exporting in the case of sharedv4 volumes).
Thank you a lot for your suggestion. Your help is very valuable!
The last concern you mentioned has a workaround: running a dummy container that mounts the volume the usual way on each of the Portworx cluster nodes. In this case I have the volume available on the host machine in the /var/lib/osd/mounts directory as well.
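For reference, the dummy-container pinning I mean is roughly a global Swarm service like this (the service name and mount target are made up, and the volume-driver is assumed to be the Portworx Docker plugin, pxd, in my setup):

docker service create --name px-pin-testvl --mode global \
  --mount type=volume,source=testvl,target=/data,volume-driver=pxd \
  alpine tail -f /dev/null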
One last question: after installing the Portworx OCI bundle, creating the cluster, and starting the service, my /etc/exports file on the host machine gets overwritten from time to time with an empty one. Is there a way to prevent this without setting the immutable attribute on the file?