Replies: 4 comments 4 replies
-
Hi @larhauga, we are facing the same issue in our clusters. A simple reconcile does not work.
-
Hi @larhauga, apologies for hijacking this thread. I am trying to install the scale-set controller Helm chart in a similar way using Flux and a Kustomization, and I could not find related info in the user documentation. I would really appreciate it if you could share some insights on your ARC setup.
-
I have been struggling to get around this, and I think I've found a solution: in your HelmRelease for the runner scale set, make sure to add:
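Something like this (a minimal sketch; the release name, namespace, and chart source are placeholders for whatever you already have, and only the `driftDetection` stanza is the actual addition):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: arc-runner-set        # placeholder release name
  namespace: arc-runners      # placeholder namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: gha-runner-scale-set
      sourceRef:
        kind: HelmRepository
        name: actions-runner-controller   # placeholder chart source
  # The addition: have Flux detect differences between the Helm
  # storage and the cluster, and correct them on reconcile.
  driftDetection:
    mode: enabled
```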
Per the Flux HelmRelease documentation, drift detection looks for differences between what Helm thinks is deployed and what is actually running in the cluster.
I've been able to go back and forth through version upgrades successfully, and this seems to stop the resources from disappearing. I was noticing that after the controller upgrade occurred, the listeners would be killed; with drift detection enabled they come back. This is still less than ideal, as I'm not sure what other negatives could occur by using that setting.
-
Hey @larhauga, this is a great question, and your assumption about why we did it is exactly right. I'll try to explain the reasoning behind it.
This is not to say we won't revisit this decision. It is perfectly reasonable to expect that you can upgrade the autoscaling runner set without touching the controller. But with the current rate of change in the CRDs, this constraint greatly helps us debug and track down issues while keeping the controllers relatively simple. Another approach may be to allow version mismatches only on patch versions, since by definition they should only include bug fixes. Anyway, I hope this clarifies the constraint.
-
Hi,
we use Flux to manage the different objects related to the scale-set controller and the scale sets, and additionally use `helm template` to generate the objects from the Helm chart. This gives us full control over the resources being created while still following upstream changes.
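Roughly, the pattern looks like this (a minimal sketch, not our exact setup; the source name, path, and chart URL are placeholders/examples):

```yaml
# Flux applies pre-rendered manifests committed to the repo, e.g. the
# output of `helm template` against the upstream chart:
#   helm template arc oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: arc-scale-sets
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: fleet-repo              # placeholder Git source
  path: ./clusters/prod/arc       # placeholder: where the rendered manifests are committed
```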
However, after the upgrade from 0.6.1 to 0.7.0, we were surprised by the way the upgrade procedure went.
We did not expect the scale-set controller to delete the resources and the AutoscalingRunnerSets to end up deleted; this results in a long wait until a new reconcile.
There may be technical reasons for the way this is implemented, but it is unexpected and I think it can cause problems (and downtime) in the long run. Having to ensure that all the different scale sets have the correct version set as a label is problematic (see the sketch below).
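For illustration, this is the kind of coupling I mean (a sketch; I am assuming the controller compares the chart's standard app.kubernetes.io/version label, and all names are placeholders):

```yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: my-runner-set                            # placeholder
  namespace: arc-runners                         # placeholder
  labels:
    # Assumption: if this version does not match the version of the
    # running controller, the resource gets deleted during an upgrade.
    app.kubernetes.io/version: "0.7.0"
spec:
  githubConfigUrl: https://github.com/my-org     # placeholder
```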
If you use Helm from the command line, or indirectly, you have control over the timing of reconciles, but I think this can lead to different race conditions in the long run.
Is there a technical reason for the scale-set controller to delete the AutoscalingRunnerSets? I can understand it deleting the EphemeralRunners, but not that it would need the versions to match, since only one controller version should be running at a time.
If it is necessary to manage the version of the scale sets, I think it should be the controller's concern, not the Helm template's. If breaking changes are needed on the CRDs, the version should be increased or the contract expanded between versions.
I hope that we can work towards the controller not deleting the AutoscalingRunnerSets during an upgrade, so that Kubernetes can handle the upgrade more seamlessly.
Is there any knowledge about the reason for this being the case, or more about the intent of the version labels? (I could not find any related comments in the code.)
What would be the outcome of removing the code that deletes the AutoscalingRunnerSets? And if that is not viable, could allowing some version skew serve as an interim solution?
Hopefully this question can start a discussion on how the process can be improved in the future.