Unable to provision Azure Disks when creating StorageCluster

Background

I am attempting to create a StorageCluster for Portworx Essentials (PE) 2.7 in
Microsoft Azure Red Hat OpenShift (ARO). The operator installed correctly using
the following Kube Manifest.

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: portworx-essentials
  namespace: kube-system
spec:
  channel: "stable"
  name: portworx-essentials
  source: community-operators
  sourceNamespace: openshift-marketplace
...

I then applied it with

oc apply -f ./src/kube_manifests/oscp_portworx-essentials.yaml

Then I went to the Portworx generator site, and generated the following Kube
Manifest.

---
kind: StorageCluster
apiVersion: core.libopenstorage.org/v1
metadata:
  name: px-cluster-<redacted>
  namespace: kube-system
  annotations:
    portworx.io/install-source: "https://install.portworx.com/?mc=false&\
                                 kbver=1.19.0%2Ba5a0987&\
                                 oem=esse\
                                 &user=<redacted>&\
                                 b=true&\
                                 mz=5&s=%22type%3DPremium_LRS%2Csize%3D1000%22&\
                                 j=auto\
                                 &kd=type%3DPremium_LRS%2Csize%3D150&\
                                 c=px-cluster-<redacted>&\
                                 osft=true&operator=true&stork=true&\
                                 csi=true&\
                                 lh=true&\
                                 st=k8s&e=OSCP_OWNER%3DPhillip%20Dudley"
    portworx.io/is-openshift: "true"
    portworx.io/misc-args: --oem esse
spec:
  image: portworx/oci-monitor:2.7.0
  imagePullPolicy: Always
  kvdb:
    internal: true
  cloudStorage:
    deviceSpecs:
      - type=Premium_LRS,size=1000
    journalDeviceSpec: auto
    kvdbDeviceSpec: type=Premium_LRS,size=150
    maxStorageNodesPerZone: 5
  secretsProvider: k8s
  stork:
    enabled: true
    args:
      webhook-controller: "false"
  userInterface:
    enabled: true
  autopilot:
    enabled: true
  featureGates:
    CSI: "true"
  env:
    - name: OSCP_OWNER
      value: Phillip Dudley
...
---
apiVersion: v1
kind: Secret
metadata:
  name: px-essential
  namespace: kube-system
data:
  px-essen-user-id: '<redacted>'
  px-osb-endpoint: '<redacted>'
...

I then applied this with

oc apply -f ./src/kube_manifests/oscp_px_storage-cluster.yaml

The Problem

After applying the StorageCluster, I receive the following errors.

2021-06-29T18:48:17.271393198Z @dudleyparopoc-5ck7j-worker-centralus1-wzfpz portworx[2552656]: time="2021-06-29T18:48:17Z" level=error msg="Authentication error: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/<redacted>/resourceGroups/aro-mby91d36/providers/Microsoft.Compute/virtualMachines/dudleyparopoc-5ck7j-worker-centralus1-wzfpz?%24expand=instanceView&api-version=2018-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}" func=InitAndBoot package=boot
2021-06-29T18:48:17.271419098Z @dudleyparopoc-5ck7j-worker-centralus1-wzfpz portworx[2552656]: time="2021-06-29T18:48:17Z" level=error msg="Could not init boot manager" error="Authentication error: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/<redacted>/resourceGroups/aro-mby91d36/providers/Microsoft.Compute/virtualMachines/dudleyparopoc-5ck7j-worker-centralus1-wzfpz?%24expand=instanceView&api-version=2018-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}"

The Service Principal I am using does work and has the Contributor and User Access Administrator roles at the Subscription level. I used the same Service
Principal with the openshift-install IPI method, so I know that it works.

To confirm that Portworx itself works, I also set up Portworx Essentials 2.7 in AKS. There it was able to authenticate to Azure, but I ran into a different issue: the portworx-pvc-controller pods were in a CrashLoopBackOff state with the following errors:

Flag --address has been deprecated, see --bind-address instead.
I0706 17:27:36.062022       1 serving.go:331] Generated self-signed cert in-memory
W0706 17:27:36.062087       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
failed to create listener: failed to listen on 0.0.0.0:10257: listen tcp 0.0.0.0:10257: bind: address already in use

To me, the obvious explanation is that something is already listening on that port in the Pod. Is there a supporting container in the Pod that could also be listening on that port and causing the conflict?

@Phillip.Dudley: Can you add the following two lines to your StorageCluster spec under annotations?

portworx.io/pvc-controller-port: "9030"
portworx.io/pvc-controller-secure-port: "9031"

The next Operator release will take care of this. For now, you can use the above annotations, and you should see your PVC Controller pods running.
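
For reference, a minimal sketch of where those two annotations would sit in the StorageCluster manifest from the original post. Only the metadata block is shown; the install-source annotation and the whole spec are omitted here and are assumed to stay unchanged. The explanatory comment is an interpretation of the bind error above, not something confirmed in this thread.

```yaml
kind: StorageCluster
apiVersion: core.libopenstorage.org/v1
metadata:
  name: px-cluster-<redacted>
  namespace: kube-system
  annotations:
    portworx.io/is-openshift: "true"
    portworx.io/misc-args: --oem esse
    # Suggested workaround: run the PVC controller on different ports,
    # intended to avoid the "address already in use" failure on 10257.
    portworx.io/pvc-controller-port: "9030"
    portworx.io/pvc-controller-secure-port: "9031"
# spec: unchanged from the original manifest
```

After editing the file, re-applying it with the same oc apply command used earlier should pick up the new annotations.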

@Phillip.Dudley If you are using Azure ARO, then you need some additional steps. I will share those steps by tomorrow.

We want to use ARO. We only tried AKS to prove that Portworx works. We got further there than we did on ARO but, as you can see, ran into other issues, which I imagine the annotations would help mitigate.

@Phillip.Dudley Yes, please add the annotations and verify that they mitigate the issue you are facing.

For ARO I will share the doc.

Once we get the cluster rebuilt, we’ll try that out. Azure is currently having some issues building our ARO cluster, and we’re working with Microsoft on this. I will post back once I get something figured out.

According to IBM, it seems that Portworx is not supported with ARO for IBM Cloud Pak for Data. So this may be on my back burner for a bit.

What are the separate steps required to deploy Portworx on an ARO cluster?

Thanks!