
kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted successfully #113563

Open
bobbypage opened this issue Nov 2, 2022 · 10 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@bobbypage
Member

bobbypage commented Nov 2, 2022

What happened?

Kubelet has an internal syncTerminatedPod function which is called after pods are terminated. It performs some final pod cleanup and is responsible for ensuring that volumes mounted to the pod are unmounted. The function calls volumeManager.WaitForUnmount:

// volumes are unmounted after the pod worker reports ShouldPodRuntimeBeRemoved (which is satisfied
// before syncTerminatedPod is invoked)
if err := kl.volumeManager.WaitForUnmount(pod); err != nil {
return err
}

While doing some testing for a different issue, I came across a problem with emptydir unmount handling -- it looks like volumeManager.WaitForUnmount will return successfully even if the emptydir was not unmounted successfully.

I chatted with @msau42 about this issue, and it seems this happens because, during WaitForUnmount, the check only looks for the mounted state:

if mountedPodName == podName && podObj.volumeMountStateForPod == operationexecutor.VolumeMounted {

However, if there is an error during unmounting, the volume is marked as "uncertain":

markMountUncertainErr := actualStateOfWorld.MarkVolumeMountAsUncertain(opts)

As a result, WaitForUnmount succeeds despite the error during unmounting. It's unclear whether this is expected behavior.
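
To illustrate, here is a simplified, hypothetical sketch of that check (the names below are illustrative, not the exact kubelet types): because the comparison only matches VolumeMounted, a volume that fell back to the "uncertain" state after a failed TearDown is treated as already unmounted.

package main

import "fmt"

type VolumeMountState string

const (
	VolumeMounted        VolumeMountState = "VolumeMounted"
	VolumeMountUncertain VolumeMountState = "VolumeMountUncertain"
)

type mountedPod struct {
	podName string
	state   VolumeMountState
}

// getMountedVolumesForPod mirrors the quoted check: only volumes in the
// VolumeMounted state are reported as still mounted for the pod.
func getMountedVolumesForPod(pods []mountedPod, podName string) []mountedPod {
	var mounted []mountedPod
	for _, p := range pods {
		if p.podName == podName && p.state == VolumeMounted {
			mounted = append(mounted, p)
		}
	}
	return mounted
}

func main() {
	// After the failed TearDown the emptydir is marked uncertain, so the
	// check reports zero mounted volumes and WaitForUnmount returns early.
	pods := []mountedPod{{podName: "test-pd", state: VolumeMountUncertain}}
	fmt.Println(len(getMountedVolumesForPod(pods, "test-pd"))) // prints 0
}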

What did you expect to happen?

I expected volumeManager.WaitForUnmount to block (or return an error) if the emptydir failed to unmount.
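
For contrast, here is a minimal sketch of the condition I expected (again hypothetical, not a proposed patch), reusing the illustrative VolumeMountState type from the sketch above: a volume in the uncertain state would still count as mounted, so WaitForUnmount would keep waiting or eventually surface an error.

// volumeStillMounted is a hypothetical variant of the check in which an
// "uncertain" volume still counts as mounted.
func volumeStillMounted(state VolumeMountState) bool {
	return state == VolumeMounted || state == VolumeMountUncertain
}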

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a kind cluster (I used the v1.25.2 node image):
kind_config="$(cat << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv4
nodes:
# the control plane node
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "4"
EOF
)"

kind create cluster --config <(printf '%s\n' "${kind_config}") --image kindest/node:v1.25.2@sha256:9be91e9e9cdf116809841fc77ebdb8845443c4c72fe5218f3ae9eb57fdb4bace

Create the following pod:

# test-pd.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "999999"
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
$ kubectl apply -f test-pd.yaml

Get the pod UID:

$ kubectl get pod test-pd -o json | jq .metadata.uid
"5a8ef092-2173-4175-9557-947b73f9a8f9"

Enter the kind-worker node and run chattr +i on the emptydir. This makes the emptydir volume immutable and prevents it from being unmounted (and deleted).

$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# cd /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir#
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir# chattr +i cache-volume/
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir# lsattr
----i---------e------- ./cache-volume

Now delete the pod:

kubectl delete pod test-pd --grace-period=5

Here are the logs:

Kubelet logs - https://gist.github.com/4f9b1fa0edda8d260c90a1f18c9dc6e5
Kubelet logs for test-pd pod: https://gist.github.com/d63b8713b71bef1e712ca138fcb5d602

The notable logs:

# pod started
Nov 01 01:53:33 kind-worker kubelet[275]: I1101 01:53:33.808887     275 kubelet.go:2096] "SyncLoop ADD" source="api" pods=[default/test-pd]
Nov 01 01:53:33 kind-worker kubelet[275]: I1101 01:53:33.809011     275 pod_workers.go:585] "Pod is being synced for the first time" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

Termination (syncTerminatingPod):

Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605701     275 kubelet.go:1765] "syncTerminatingPod enter" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605720     275 kubelet_pods.go:1447] "Generating pod status" pod="default/test-pd"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605789     275 kubelet_pods.go:1457] "Got phase for pod" pod="default/test-pd" oldPhase=Running phase=Running
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605918     275 kubelet.go:1795] "Pod terminating with grace period" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605970     275 kuberuntime_container.go:698] "Killing container with a grace period override" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d" gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605997     275 kuberuntime_container.go:702] "Killing container with a grace period" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d" gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.606148     275 event.go:294] "Event occurred" object="default/test-pd" fieldPath="spec.containers{test-container}" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container test-container"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617051     275 status_manager.go:688] "Patch status for pod" pod="default/test-pd" patch="{\"metadata\":{\"uid\":\"5a8ef092-2173-4175-9557-947b73f9a8f9\"}}"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617099     275 status_manager.go:695] "Status for pod is up-to-date" pod="default/test-pd" statusVersion=3
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617120     275 kubelet_pods.go:938] "Pod is terminated, but some containers are still running" pod="default/test-pd"
Nov 01 01:56:39 kind-worker kubelet[275]: I1101 01:56:39.815024     275 kuberuntime_container.go:711] "Container exited normally" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d"

...


Volume was not unmounted (`operation not permitted`) due to `chattr +i`:

Nov 01 01:56:40 kind-worker kubelet[275]: E1101 01:56:40.075488     275 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume podName:5a8ef092-2173-4175-9557-947b73f9a8f9 nodeName:}" failed. No retries permitted until 2022-11-01 01:56:40.575457522 +0000 UTC m=+246.937181939 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume") pod "5a8ef092-2173-4175-9557-947b73f9a8f9" (UID: "5a8ef092-2173-4175-9557-947b73f9a8f9") : unlinkat /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir/cache-volume: operation not permitted

But syncTerminatedPod succeeded despite the volume not actually unmounting, because WaitForUnmount incorrectly reported success:

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324119     275 kubelet.go:1863] "syncTerminatedPod enter" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324148     275 kubelet_pods.go:1447] "Generating pod status" pod="default/test-pd"
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324176     275 kubelet.go:2134] "SyncLoop (PLEG): event for pod" pod="default/test-pd" event=&{ID:5a8ef092-2173-4175-9557-947b73f9a8f9 Type:ContainerDied Data:d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d}
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324199     275 kubelet_pods.go:1457] "Got phase for pod" pod="default/test-pd" oldPhase=Running phase=Running
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324243     275 kubelet.go:2134] "SyncLoop (PLEG): event for pod" pod="default/test-pd" event=&{ID:5a8ef092-2173-4175-9557-947b73f9a8f9 Type:ContainerDied Data:a34e590797c764b7bccf00fe20688406ad15474b2d02287703625e18125d9c43}
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324282     275 volume_manager.go:448] "Waiting for volumes to unmount for pod" pod="default/test-pd"

--> This shows `WaitForUnmount` succeeded, but the volume didn't actually unmount successfully

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324432     275 volume_manager.go:475] "All volumes are unmounted for pod" pod="default/test-pd"
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324461     275 kubelet.go:1876] "Pod termination unmounted volumes" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

...

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333021     275 kubelet.go:1897] "Pod termination removed cgroups" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333135     275 kubelet.go:1904] "Pod is terminated and will need no more status updates" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333158     275 kubelet.go:1906] "syncTerminatedPod exit" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

The volume unmount continues to be retried later:

Nov 01 01:57:11 kind-worker kubelet[275]: E1101 01:57:11.691319     275 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume podName:5a8ef092-2173-4175-9557-947b73f9a8f9 nodeName:}" failed. No retries permitted until 2022-11-01 01:57:43.691274244 +0000 UTC m=+310.052998673 (durationBeforeRetry 32s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume") pod "5a8ef092-2173-4175-9557-947b73f9a8f9" (UID: "5a8ef092-2173-4175-9557-947b73f9a8f9") : unlinkat /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir/cache-volume: operation not permitted
...
Nov 01 01:57:43 kind-worker kubelet[275]: I1101 01:57:43.789411     275 kubelet_pods.go:949] "Pod is terminated, but some volumes have not been cleaned up" pod="default/test-pd"

Anything else we need to know?

No response

Kubernetes version

1.25.2

Cloud provider

n/a

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@bobbypage bobbypage added the kind/bug Categorizes issue or PR as related to a bug. label Nov 2, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 2, 2022
@bobbypage
Member Author

/sig storage
/sig node

/cc @msau42

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 2, 2022
@bobbypage
Member Author

/kind bug

@bobbypage bobbypage changed the title kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted sucessfuly kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted successfully Nov 2, 2022
@SergeyKanzhelev
Member

/triage accepted
/priority important-longterm

doesn't look like a regression

/cc @xmcqueen

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2023
@msau42
Member

msau42 commented Feb 8, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 7, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
@msau42
Member

msau42 commented Aug 3, 2024

/remove lifecycle-rotten

@BenTheElder BenTheElder added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 5, 2024
@BenTheElder BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 5, 2024
@BenTheElder
Member

Another one for kubernetes/test-infra#32957

This issue looks like it may be tricky to solve but remains a source of flakes and should stay tracked.

@haircommander haircommander moved this from Triaged to Triage in SIG Node Bugs Oct 23, 2024