Some kube-state-metrics shards are serving up stale metrics #2372
Comments
qq: have your StatefulSet labels been changed?
For this particular case, we don't suspect they'd changed (though we drop the metric, so we can't confirm this 100%). But in the other cases where we run into this issue, the labels almost always do get changed, particularly the chart version when we upgrade.
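If it helps anyone checking the same thing, the label and chart-version drift can be spotted with something like the sketch below (namespace and release name are assumptions, not our exact setup):

```shell
# Inspect the labels currently set on the KSM StatefulSet
# (namespace and object name are placeholders for illustration):
kubectl -n kube-system get statefulset kube-state-metrics \
  -o jsonpath='{.metadata.labels}{"\n"}'

# If installed via the Helm chart, see whether the chart version changed across upgrades:
helm -n kube-system history kube-state-metrics
```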
/assign @CatherineF-dev
This is related to #2347
This is a new issue.
For the purposes of this issue, I think it's wholly related to #2347 (the one time we claimed the StatefulSet may not have changed labels, we had no proof of that). IMO, we can track this issue against that PR for closure, and if we do see another case of stale metrics, we can try to gather those exact circumstances in a separate issue if needed.
Looks like this will be resolved in v2.13.0
For tracking purposes, this problem still persists (even in the latest version). This may lend credence to the one time we ran into this issue and claimed there was no label change. So I believe #2431 is reporting the same issue. The labels/versions for reference:
@schahal could you reproduce this issue consistently? If so, could you help provide detailed steps to reproduce it? You can anonymize pod names.
Aside from what's in the description, I feel like this consistently happens anytime the kube-state-metrics chart is upgraded. Invariably, right after that we get shards with stale metrics, mitigated only by restarting the pods. #2431 and this Slack thread have other perspectives from different users on the same symptom, which may shed some other light.
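For anyone trying to reproduce after an upgrade, a rough spot check like the sketch below (shard count, namespace, and port are assumptions) shows which shards still export the suspicious series:

```shell
# Port-forward to each shard in turn and grep for the stale series
# (shard count, namespace, and port 8080 are assumptions):
for i in 0 1 2 3 4 5; do
  kubectl -n kube-system port-forward "pod/kube-state-metrics-$i" 8080:8080 >/dev/null 2>&1 &
  pf=$!
  sleep 2
  echo "=== shard $i ==="
  curl -s localhost:8080/metrics | grep 'kube_pod_container_status_waiting_reason' || true
  kill "$pf"
done
```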
What happened:
We found some kube-state-metrics shards are serving up stale metrics.
For example, this pod is running and healthy:
However, for the past hour we see that `kube_pod_container_status_waiting_reason` is reporting it in `ContainerCreating`:

And to prove this is being served by KSM, we looked at the incriminating shard's (`kube-state-metrics-5`) /metrics endpoint and saw this metric is definitely stale:

This is one such example; there seem to be several such situations.
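For reference, a shard's /metrics output can be pulled directly with something along these lines (namespace is an assumption; 8080 is KSM's default metrics port):

```shell
# Forward the shard's metrics port locally and grep for the stale series
# (namespace is an assumption; 8080 is KSM's default metrics port):
kubectl -n kube-system port-forward pod/kube-state-metrics-5 8080:8080 &
pf=$!
sleep 2
curl -s localhost:8080/metrics | grep 'kube_pod_container_status_waiting_reason'
kill "$pf"
```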
What you expected to happen:
The expectation is that the metric(s) match reality.
How to reproduce it (as minimally and precisely as possible):
Unfortunately, we're not quite sure when/why it gets into this state (anecdotally, it almost always happens when we upgrade KSM, though today there was no update besides some Prometheus agents)
We can mitigate the issue by restarting all the KSM shards... e.g.,
... if that's any clue to determine root cause.
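Concretely, the restart amounts to something like the following sketch (StatefulSet name and namespace are assumptions):

```shell
# Restart all KSM shards and wait for them to come back
# (StatefulSet name and namespace are assumptions):
kubectl -n kube-system rollout restart statefulset kube-state-metrics
kubectl -n kube-system rollout status statefulset kube-state-metrics
```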
Anything else we need to know?:
When I originally ran into the problem, I thought it had something to do with the Compatibility Matrix. But starting with KSM v2.11.0, I confirmed the client libraries are updated for my version of k8s (v1.28)
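For completeness, that comparison boils down to checking the server version against the KSM image actually running (StatefulSet name and namespace below are assumptions):

```shell
# Server version vs. the KSM image actually running
# (StatefulSet name and namespace are assumptions):
kubectl version
kubectl -n kube-system get statefulset kube-state-metrics \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```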
There's nothing out of the ordinary in the KSM logs:
(kube-state-metrics-5 logs omitted)
Environment:
- Kubernetes version (use `kubectl version`): v1.28.6