Portworx StorageCluster not working on ARO cluster

I am attempting to deploy Portworx Essentials on an ARO cluster, and it is failing due to permission issues. I’m trying to build automation that can provision ARO clusters for future enterprise workloads.

I followed the docs here to create the ARO cluster:

I’ve tried multiple permutations of deploying a StorageCluster instance from various docs/guides that I have found in both current and legacy docs, and in the configurator.

When I attempt to deploy the StorageCluster, it never leaves the Initializing state. The pod logs show messages similar to:

portworx[331376]: time="2022-03-09T03:29:20Z" level=error msg="Authentication error: Authentication error: compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'e21b0f9f-39a2-4331-8dc3-998a11c8e4e6' with object id 'e21b0f9f-39a2-4331-8dc3-998a11c8e4e6' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/bc1627c6-ec80-4da3-8d18-03e91330e2f1/resourceGroups/aro-y5sdwcze/providers/Microsoft.Compute/virtualMachines/toolkit-dev-aro-z4tmm-worker-eastus3-wvnv4' or the scope is invalid. 

However, I’ve created the service principal as described in ARO setup. I’m assuming that I am missing some other permission for the ARO cluster, but I am not sure exactly what is missing. Can anyone help point me in the right direction?

I saw another thread mentioning separate steps for running Portworx on ARO clusters, but I haven’t been able to actually find details.

A little more observation… I set the permissions on a resource group that I created. It looks like a second, “ghost” resource group is created by Azure/ARO when the cluster is created, and it appears that this resource group has different permissions than those I defined on the RG that I created specifically for this cluster. Any guidance would be greatly appreciated.

Another update…
I realized from the Portworx docs here that the service principal is created for the derived resource group, not the one I manually created.

I think that is the root problem. However, it looks like Azure doesn’t allow you to create a service principal that has access to the derived RG:

$ az ad sp create-for-rbac --role=portworx-toolkit-dev-aro --scopes="/subscriptions/<redacted>/resourcegroups/<redacted>"

Creating 'portworx-toolkit-dev-aro' role assignment under scope '/subscriptions/<redacted>/resourcegroups/<redacted>'
  Role assignment creation failed.

The client 'amtrice@us.ibm.com' with object id '<redacted>' has permission to perform action 'Microsoft.Authorization/roleAssignments/write' on scope '/subscriptions/<redacted>/resourcegroups/<redacted>/providers/Microsoft.Authorization/roleAssignments/<redacted>'; however, the access is denied because of the deny assignment with name '<redacted>' and Id '<redacted>' at scope '/subscriptions/<redacted>/resourcegroups/<redacted>'.

So I guess the real question is… how do you create a service principal that has appropriate permissions to the resource group that ARO creates when you create the ARO cluster?

When deploying Portworx on Azure Red Hat OpenShift (ARO), the virtual machines are created in a resource group with a deny assignment that prevents any service principal from accessing the virtual machines, except the service principal created for that resource group.
The workaround is to identify the service principal for the resource group that has access and configure the “px-azure” secret with credentials from that account.
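
If you prefer the CLI to the console, the lookup can be sketched roughly as below. This is a minimal sketch, not a verified procedure. It assumes the principal excluded from the deny assignment is the cluster's own service principal (the “aro-xxxxxx” one), so confirm that against the deny assignment itself. The function names are made up for illustration, and the api-version in the last call is an assumption that may need adjusting.

```shell
# Hypothetical helper names; $1 = ARO cluster name, $2 = the resource group
# you created the cluster in (not the derived one).

# Print the ID of the derived resource group that ARO created for the VMs.
aro_managed_rg_id() {
  az aro show --name "$1" --resource-group "$2" \
    --query clusterProfile.resourceGroupId --output tsv
}

# Print the application (client) ID of the service principal the cluster uses.
aro_sp_client_id() {
  az aro show --name "$1" --resource-group "$2" \
    --query servicePrincipalProfile.clientId --output tsv
}

# List deny assignments on a scope ($1 = resource group ID from above).
# Assumption: api-version 2022-04-01 is available for denyAssignments.
list_deny_assignments() {
  az rest --method get \
    --url "https://management.azure.com$1/providers/Microsoft.Authorization/denyAssignments?api-version=2022-04-01"
}
```

For example, pass the output of aro_managed_rg_id to list_deny_assignments and compare the excluded principals with the ID printed by aro_sp_client_id.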

Follow these steps from the Azure web console:

Step 1. In the Azure UI, navigate to “Virtual Machine” and click on the resource group
Step 2. In the resource group page select “Access control (IAM)” on the left panel, then “Deny assignments” at the top bar and click on the “Name” link
Step 4. In the “Deny Assignment” page you can see that all principals are denied access, except for “aro-xxxxxx”. Click on the principal name
Step 5. On the “aro-xxxxxx” page, copy the “Application ID”. Save both the application ID and the object ID; they will be used to authenticate. These are passed to Portworx through the cluster spec as “env” variables that reference the “px-azure” secret. Example:

  env:
  - name: AZURE_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: px-azure
        key: AZURE_CLIENT_SECRET
  - name: AZURE_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: px-azure
        key: AZURE_CLIENT_ID
  - name: AZURE_TENANT_ID
    valueFrom:
      secretKeyRef:
        name: px-azure
        key: AZURE_TENANT_ID

Step 6. Follow step 3, "Create a secret called px-azure", in the following: Prepare Your AKS Platform
Step 7. Go back to your home page and open the Azure Active Directory page, select “App registrations” on the left pane and then “All applications”. Paste the application ID copied in the previous steps and press Enter. Click on the service principal link to open the next page.
Step 8. Click on “Certificates & secrets” on the left pane and then click on “New client secret” to create a new secret. Copy and save the secret.
Step 9. Go to https://central.portworx.com/ to generate the spec.
Note: For this environment the generated YAML needs both “Openshift 4+” and “Azure Kubernetes Service (AKS)”. Since the spec generator lets you select only one of them, select “Azure Kubernetes Service (AKS)” and add the other to the spec manually. You need both “osft=true” and “aks=true”. Example from my spec:

…osft=true&stork=true&st=k8s&aks=true"
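
The client secret and the “px-azure” secret from the steps above can also be created from the command line. The following is a rough sketch under assumptions, not the documented procedure. The function names are hypothetical, the kube-system namespace is a guess that must match wherever your StorageCluster runs, and it assumes your account is allowed to add credentials to the “aro-xxxxxx” app registration. The key names match the “env” example above.

```shell
# Add a new client secret to the app registration ($1 = application ID) and
# print it. --append keeps existing credentials, so the credential the cluster
# already uses is not replaced.
new_client_secret() {
  az ad app credential reset --id "$1" --append --query password --output tsv
}

# Create the px-azure secret from AZURE_TENANT_ID, AZURE_CLIENT_ID and
# AZURE_CLIENT_SECRET, which must already be set in the environment.
# Assumption: namespace defaults to kube-system; override with PX_NAMESPACE.
create_px_azure_secret() {
  kubectl create secret generic px-azure \
    --namespace "${PX_NAMESPACE:-kube-system}" \
    --from-literal=AZURE_TENANT_ID="$AZURE_TENANT_ID" \
    --from-literal=AZURE_CLIENT_ID="$AZURE_CLIENT_ID" \
    --from-literal=AZURE_CLIENT_SECRET="$AZURE_CLIENT_SECRET"
}
```

Set AZURE_CLIENT_ID to the application ID from Step 5, AZURE_CLIENT_SECRET to the output of new_client_secret, and AZURE_TENANT_ID to your tenant ID before calling create_px_azure_secret.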

Hope this helps!