Hi. I’ve currently got PX 3.0.1 installed on my K8s cluster and noticed that when Prometheus is enabled, the operator starts up and immediately starts and stops the Prometheus pods on the nodes over and over again, endlessly. If I edit the operator’s deployment and set the replicas to 0, the behavior stops, and Prometheus successfully starts up on one of the nodes.
If I set the replicas back to 1 on the operator, it terminates the running Prometheus pod and starts cycling over and over again.
Has anyone seen this before and have an idea of where to look for errors? I don’t see anything in the operator logs that points me in any particular direction… there are no errors there.
Thanks.
I just had this exact issue. No errors, no useful logs, no comments on other forums, no help from copilot - nothing. Very difficult to troubleshoot.
It turns out I had two instances of the prometheus-operator deployment - each was a different version and deployed to a separate namespace. Each instance was trying to reconcile the kube-prometheus-stack deployment with its own configuration. This resulted in one instance setting the configuration, and then the other sending SIGTERM to the existing pod because its configuration was out of sync, resulting in an endless loop with no errors or useful information to point at what was going wrong.
The issue came about when I accidentally pushed a CRD update to the wrong cluster, which installed a second instance of the prometheus-operator in the default namespace and started the boot loop.
Deleting the second prometheus-operator deployment immediately fixed the issue.
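In case it helps anyone else check for the same situation, here is a rough sketch of how you might look for more than one prometheus-operator across namespaces. It assumes `kubectl` is on your PATH and pointed at the affected cluster, and that the operator's container image name starts with `prometheus-operator` (true for the upstream image as far as I know, but a mirrored or renamed image would need a different match). The function names are just mine for illustration.

```python
import json
import subprocess


def find_operator_deployments(deploy_list, needle="prometheus-operator"):
    """Return (namespace, name, image) for each Deployment container whose
    image name (the last path segment) starts with `needle`.

    `deploy_list` is the parsed JSON from
    `kubectl get deployments --all-namespaces -o json`.
    """
    hits = []
    for item in deploy_list.get("items", []):
        meta = item.get("metadata", {})
        pod_spec = item.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            image = container.get("image", "")
            # Compare only the last path segment so that registry or
            # organisation names containing the needle do not match.
            image_name = image.rsplit("/", 1)[-1]
            if image_name.startswith(needle):
                hits.append((meta.get("namespace", ""), meta.get("name", ""), image))
    return hits


def load_all_deployments():
    """Fetch every Deployment in the cluster via kubectl (assumed installed)."""
    result = subprocess.run(
        ["kubectl", "get", "deployments", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    hits = find_operator_deployments(load_all_deployments())
    for namespace, name, image in hits:
        print(f"{namespace}/{name}: {image}")
    if len({namespace for namespace, _, _ in hits}) > 1:
        print("More than one namespace runs a prometheus-operator; "
              "they may be fighting over the same Prometheus resources.")
```

If this lists operators in two namespaces (in my case one of them was `default`), that is worth ruling out before digging through logs any further.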