ProviderInternal Error: InvalidVolume.NotFound

I am trying to install Portworx on AWS EKS, and only 1 of 3 nodes is working. The other 2 nodes failed due to a rather strange error: InvalidVolume.NotFound. I find it strange because this is a brand new cluster that I just created to test Portworx. If the volume doesn't exist, shouldn't Portworx create it? Also, I have not deleted any volumes. How do I resolve this, please? Many thanks.

~ $ kubectl -n kube-system get storagenodes -l name=portworx
NAME                                           ID                                     STATUS         VERSION           AGE
ip-192-168-19-138.eu-west-2.compute.internal   355474d2-13f5-4b66-aed7-5cfa1300492f   Online         2.6.3.0-4419aa4   19m
ip-192-168-35-42.eu-west-2.compute.internal                                           Initializing                     19m
ip-192-168-74-209.eu-west-2.compute.internal                                          Initializing                     19m
k describe storagenodes  ip-192-168-35-42.eu-west-2.compute.internal  -n kube-system | pbcopy

Name:         ip-192-168-35-42.eu-west-2.compute.internal
Namespace:    kube-system
Labels:       name=portworx
Annotations:  <none>
API Version:  core.libopenstorage.org/v1
Kind:         StorageNode
Metadata:
  Creation Timestamp:  2021-02-06T14:52:57Z
  Generation:          1
  Managed Fields:
    API Version:  core.libopenstorage.org/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:name:
        f:ownerReferences:
          .:
          k:{"uid":"96b8ac9d-13b0-4ffa-9812-9c8c83e116f5"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:cloudStorage:
      f:status:
        .:
        f:geography:
        f:network:
        f:phase:
    Manager:      operator
    Operation:    Update
    Time:         2021-02-06T14:52:57Z
    API Version:  core.libopenstorage.org/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
    Manager:    px
    Operation:  Update
    Time:       2021-02-06T15:09:59Z
  Owner References:
    API Version:           core.libopenstorage.org/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  StorageCluster
    Name:                  pwx-cluster-03732d57-45f2-48a3-80a2-97a0e48e3ad9
    UID:                   96b8ac9d-13b0-4ffa-9812-9c8c83e116f5
  Resource Version:        30416
  Self Link:               /apis/core.libopenstorage.org/v1/namespaces/kube-system/storagenodes/ip-192-168-35-42.eu-west-2.compute.internal
  UID:                     42712736-b12a-442d-a09a-656e379bbd62
Spec:
  Cloud Storage:
Status:
  Conditions:
    Last Transition Time:  2021-02-06T14:53:47Z
    Status:                Initializing
    Type:                  NodeState
    Last Transition Time:  2021-02-06T14:53:47Z
    Message:               Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
                           status code: 400, request id: ff795219-1b65-4425-b720-0502a83b8390
    Reason:                NodeStartFailure
    Status:                Failed
    Type:                  NodeInit
  Geography:
  Network:
  Phase:  Initializing
Events:
  Type     Reason                             Age   From      Message
  ----     ------                             ----  ----      -------
  Normal   PortworxMonitorImagePullInPrgress  17m   portworx  Portworx image portworx/px-essentials:2.6.3 pull and extraction in progress
  Warning  NodeStartFailure                   16m   portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: 3407e951-6c79-406a-af9f-ec2dafb62f72
  Warning  NodeStartFailure  16m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: 039aff93-688a-41bc-8cc6-3fb4343bc037
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: c1f64984-023f-4b26-a10f-d87c454f4b30
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: a50e5f7b-a838-4b44-8899-56b3fd95dab7
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: 20a4eefd-2a87-4853-8236-d18a3eec6ae8
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: af34e48f-a7a0-4ad0-b50d-59feb254413c
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: a851025f-3dbd-4b8f-be85-5ced575be08d
  Warning  NodeStartFailure  15m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: f3336f6a-09b7-4999-b7ad-f603d7bf11ae
  Warning  NodeStartFailure  14m  portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: 0a93ddf5-cd24-4f1c-8dec-d468e41c2827
  Normal   PortworxMonitorImagePullInPrgress  108s                portworx  Portworx image portworx/px-essentials:2.6.3 pull and extraction in progress
  Warning  NodeStartFailure                   78s (x84 over 14m)  portworx  (combined from similar events): Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: InvalidVolume.NotFound: The volume 'vol-07b88a955ec1cd0cc' does not exist.
           status code: 400, request id: bdd0fdef-91cc-41b7-8c16-cad75655c5e7
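For anyone scripting around this failure, the offending EBS volume ID can be pulled out of the NodeStartFailure message before querying AWS. A minimal sketch (the `extract_volume_id` helper is hypothetical, not part of any Portworx tooling):

```python
import re
from typing import Optional

def extract_volume_id(message: str) -> Optional[str]:
    """Pull an EBS volume ID (vol-...) out of an error message, if present."""
    match = re.search(r"vol-[0-9a-f]+", message)
    return match.group(0) if match else None

# Event message taken from the describe output above.
event = ("Failed to start Portworx: error loading node identity: Cause: "
         "ProviderInternal Error: InvalidVolume.NotFound: The volume "
         "'vol-07b88a955ec1cd0cc' does not exist.")
print(extract_volume_id(event))  # vol-07b88a955ec1cd0cc
```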

Yes, the volume will be provisioned automatically, but only when you ask Portworx to do so, and you also need to grant the respective IAM roles and policies to Portworx. Check this docs page: Install Portworx on AWS EKS using the DaemonSet

Thanks for the response, @sensre. I am using the operator, not the DaemonSet - Install Portworx on AWS EKS using the Operator

And yes, I gave the IAM user all the necessary permissions and selected the Create [Volume] Using Spec option in PX-Central. The fact that 1 of 3 nodes in the cluster is up and running shows that I have configured it properly :wink:

The problem is that it is trying to access a volume which does not exist on the failing nodes: The volume 'vol-07b88a955ec1cd0cc' does not exist.
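One way to confirm this from the AWS side (assuming the AWS CLI is configured for the account that owns the cluster and for its region, eu-west-2) is to ask EC2 directly about the volume ID from the error message:

```shell
# If the volume has been deleted or never existed, this returns the same
# InvalidVolume.NotFound error that Portworx is reporting.
aws ec2 describe-volumes --volume-ids vol-07b88a955ec1cd0cc --region eu-west-2
```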

Please help, thanks again.

Did you try installing multiple times on this cluster? Can you please provide more details from your environment?

• Describe Portworx pods:

kubectl describe pods -l name=portworx -n kube-system

• Get Portworx cluster status:

PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

• List Portworx volumes:

PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume list

• Portworx logs: Recent Portworx logs can be gathered by using this kubectl command:

kubectl logs -n kube-system -l name=portworx --tail=99999

Yes, albeit partially, here’s what happened:

  1. I executed kubectl apply...
  2. Then I recalled I needed to save the IAM user permissions.
  3. I hit CTRL+C to cancel the kubectl apply.
  4. I saved the permissions.
  5. I executed kubectl apply again.

If the above steps resulted in the failure, it means the Portworx installation is not idempotent.
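If the interrupted first apply did manage to create drives before the IAM permissions were in place, there may be orphaned Portworx cloud drives left in the account. A hedged way to look for them, assuming Portworx names its EBS volumes with a PX-DO-NOT-DELETE prefix (verify the actual tag in your account before deleting anything):

```shell
# List EBS volumes in the cluster's region whose Name tag looks Portworx-created.
aws ec2 describe-volumes --region eu-west-2 \
  --filters "Name=tag:Name,Values=PX-DO-NOT-DELETE*" \
  --query 'Volumes[].{ID:VolumeId,State:State}' --output table
```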

I guess the only option is to Uninstall Portworx from a Kubernetes cluster using the Operator?

Yes, let's go ahead and clean the cluster completely. Please share your operator spec YAML file; I can review and verify it, and then you can reinstall.
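A sketch of the operator-based cleanup, assuming the StorageCluster name from the describe output above and that your operator version supports the `deleteStrategy` field (check the Portworx uninstall docs for your version before running this, as UninstallAndWipe destroys the cluster's data and KVDB):

```shell
# Tell the operator to uninstall Portworx and wipe its data when the
# StorageCluster object is deleted, then delete it.
kubectl -n kube-system patch storagecluster pwx-cluster-03732d57-45f2-48a3-80a2-97a0e48e3ad9 \
  --type merge -p '{"spec":{"deleteStrategy":{"type":"UninstallAndWipe"}}}'
kubectl -n kube-system delete storagecluster pwx-cluster-03732d57-45f2-48a3-80a2-97a0e48e3ad9
```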