
kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted successfully #113563

Open
bobbypage opened this issue Nov 2, 2022 · 10 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@bobbypage
Member

bobbypage commented Nov 2, 2022

What happened?

Kubelet has an internal syncTerminatedPod function which is called after pods are terminated. It performs some final pod cleanup and is responsible for ensuring that volumes mounted to the pod are unmounted. The function calls volumeManager.WaitForUnmount:

// volumes are unmounted after the pod worker reports ShouldPodRuntimeBeRemoved (which is satisfied
// before syncTerminatedPod is invoked)
if err := kl.volumeManager.WaitForUnmount(pod); err != nil {
return err
}

While doing some testing for a different issue, I came across a problem with emptydir unmount handling -- it looks like volumeManager.WaitForUnmount will return successfully even if the emptydir was not unmounted successfully.

I chatted with @msau42 about this issue, and it seems this happens because, during WaitForUnmount, the check only looks for the mounted state:

if mountedPodName == podName && podObj.volumeMountStateForPod == operationexecutor.VolumeMounted {

However, if there is an error during unmounting, the volume is marked as "uncertain":

markMountUncertainErr := actualStateOfWorld.MarkVolumeMountAsUncertain(opts)

As a result, WaitForUnmount succeeds despite the error during unmounting. It's unclear whether this is expected behavior.
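
To illustrate, here is a simplified, hypothetical sketch of that check (the names below are illustrative, not the exact kubelet types): because the comparison only matches VolumeMounted, a volume that fell back to the "uncertain" state after a failed TearDown is treated as already unmounted.

package main

import "fmt"

type VolumeMountState string

const (
	VolumeMounted        VolumeMountState = "VolumeMounted"
	VolumeMountUncertain VolumeMountState = "VolumeMountUncertain"
)

type mountedPod struct {
	podName string
	state   VolumeMountState
}

// getMountedVolumesForPod mirrors the quoted check: only volumes in the
// VolumeMounted state are reported as still mounted for the pod.
func getMountedVolumesForPod(pods []mountedPod, podName string) []mountedPod {
	var mounted []mountedPod
	for _, p := range pods {
		if p.podName == podName && p.state == VolumeMounted {
			mounted = append(mounted, p)
		}
	}
	return mounted
}

func main() {
	// After the failed TearDown the emptydir is marked uncertain, so the
	// check reports zero mounted volumes and WaitForUnmount returns early.
	pods := []mountedPod{{podName: "test-pd", state: VolumeMountUncertain}}
	fmt.Println(len(getMountedVolumesForPod(pods, "test-pd"))) // prints 0
}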

What did you expect to happen?

I expected volumeManager.WaitForUnmount to block (or return an error) if the emptydir failed to unmount.
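
For contrast, here is a minimal sketch of the condition I expected (again hypothetical, not a proposed patch), reusing the illustrative VolumeMountState type from the sketch above: a volume in the uncertain state would still count as mounted, so WaitForUnmount would keep waiting or eventually surface an error.

// volumeStillMounted is a hypothetical variant of the check in which an
// "uncertain" volume still counts as mounted.
func volumeStillMounted(state VolumeMountState) bool {
	return state == VolumeMounted || state == VolumeMountUncertain
}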

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a kind cluster (I used the v1.25.2 node image):
kind_config="$(cat << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv4
nodes:
# the control plane node
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "4"
EOF
)"

kind create cluster --config <(printf '%s\n' "${kind_config}") --image kindest/node:v1.25.2@sha256:9be91e9e9cdf116809841fc77ebdb8845443c4c72fe5218f3ae9eb57fdb4bace

Create the following pod:

# test-pd.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "999999"
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
$ kubectl apply -f test-pd.yaml

Get the pod UID:

$ kubectl get pod test-pd -o json | jq .metadata.uid
"5a8ef092-2173-4175-9557-947b73f9a8f9"

Enter the kind-worker node and run chattr +i on the emptydir. This makes the emptydir volume immutable and prevents it from being unmounted (and deleted).

$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# cd /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir#
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir# chattr +i cache-volume/
root@kind-worker:/var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir# lsattr
----i---------e------- ./cache-volume

Now delete the pod:

kubectl delete pod test-pd --grace-period=5

Here are the logs:

Kubelet logs - https://gist.github.com/4f9b1fa0edda8d260c90a1f18c9dc6e5
Kubelet logs for test-pd pod: https://gist.github.com/d63b8713b71bef1e712ca138fcb5d602

The notable logs:

# pod started
Nov 01 01:53:33 kind-worker kubelet[275]: I1101 01:53:33.808887     275 kubelet.go:2096] "SyncLoop ADD" source="api" pods=[default/test-pd]
Nov 01 01:53:33 kind-worker kubelet[275]: I1101 01:53:33.809011     275 pod_workers.go:585] "Pod is being synced for the first time" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

Termination (syncTerminatingPod):

Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605701     275 kubelet.go:1765] "syncTerminatingPod enter" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605720     275 kubelet_pods.go:1447] "Generating pod status" pod="default/test-pd"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605789     275 kubelet_pods.go:1457] "Got phase for pod" pod="default/test-pd" oldPhase=Running phase=Running
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605918     275 kubelet.go:1795] "Pod terminating with grace period" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605970     275 kuberuntime_container.go:698] "Killing container with a grace period override" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d" gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.605997     275 kuberuntime_container.go:702] "Killing container with a grace period" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d" gracePeriod=5
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.606148     275 event.go:294] "Event occurred" object="default/test-pd" fieldPath="spec.containers{test-container}" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container test-container"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617051     275 status_manager.go:688] "Patch status for pod" pod="default/test-pd" patch="{\"metadata\":{\"uid\":\"5a8ef092-2173-4175-9557-947b73f9a8f9\"}}"
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617099     275 status_manager.go:695] "Status for pod is up-to-date" pod="default/test-pd" statusVersion=3
Nov 01 01:56:34 kind-worker kubelet[275]: I1101 01:56:34.617120     275 kubelet_pods.go:938] "Pod is terminated, but some containers are still running" pod="default/test-pd"
Nov 01 01:56:39 kind-worker kubelet[275]: I1101 01:56:39.815024     275 kuberuntime_container.go:711] "Container exited normally" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9 containerName="test-container" containerID="containerd://d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d"

...


Volume was not unmounted (`operation not permitted`) due to `chattr +i`:

Nov 01 01:56:40 kind-worker kubelet[275]: E1101 01:56:40.075488     275 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume podName:5a8ef092-2173-4175-9557-947b73f9a8f9 nodeName:}" failed. No retries permitted until 2022-11-01 01:56:40.575457522 +0000 UTC m=+246.937181939 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume") pod "5a8ef092-2173-4175-9557-947b73f9a8f9" (UID: "5a8ef092-2173-4175-9557-947b73f9a8f9") : unlinkat /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir/cache-volume: operation not permitted

But syncTerminatedPod succeeded despite the volume not actually unmounting, because WaitForUnmount incorrectly reported success:

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324119     275 kubelet.go:1863] "syncTerminatedPod enter" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324148     275 kubelet_pods.go:1447] "Generating pod status" pod="default/test-pd"
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324176     275 kubelet.go:2134] "SyncLoop (PLEG): event for pod" pod="default/test-pd" event=&{ID:5a8ef092-2173-4175-9557-947b73f9a8f9 Type:ContainerDied Data:d3e6225e41896d5f4fc90a42840ef3d24bb35482676a2be7d4a61af6c6b4ea9d}
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324199     275 kubelet_pods.go:1457] "Got phase for pod" pod="default/test-pd" oldPhase=Running phase=Running
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324243     275 kubelet.go:2134] "SyncLoop (PLEG): event for pod" pod="default/test-pd" event=&{ID:5a8ef092-2173-4175-9557-947b73f9a8f9 Type:ContainerDied Data:a34e590797c764b7bccf00fe20688406ad15474b2d02287703625e18125d9c43}
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324282     275 volume_manager.go:448] "Waiting for volumes to unmount for pod" pod="default/test-pd"

--> This shows `WaitForUnmount` succeeded, but the volume didn't actually unmount successfully

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324432     275 volume_manager.go:475] "All volumes are unmounted for pod" pod="default/test-pd"
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.324461     275 kubelet.go:1876] "Pod termination unmounted volumes" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

...

Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333021     275 kubelet.go:1897] "Pod termination removed cgroups" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333135     275 kubelet.go:1904] "Pod is terminated and will need no more status updates" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9
Nov 01 01:56:40 kind-worker kubelet[275]: I1101 01:56:40.333158     275 kubelet.go:1906] "syncTerminatedPod exit" pod="default/test-pd" podUID=5a8ef092-2173-4175-9557-947b73f9a8f9

The volume unmount continues to be retried later:

Nov 01 01:57:11 kind-worker kubelet[275]: E1101 01:57:11.691319     275 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume podName:5a8ef092-2173-4175-9557-947b73f9a8f9 nodeName:}" failed. No retries permitted until 2022-11-01 01:57:43.691274244 +0000 UTC m=+310.052998673 (durationBeforeRetry 32s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/5a8ef092-2173-4175-9557-947b73f9a8f9-cache-volume") pod "5a8ef092-2173-4175-9557-947b73f9a8f9" (UID: "5a8ef092-2173-4175-9557-947b73f9a8f9") : unlinkat /var/lib/kubelet/pods/5a8ef092-2173-4175-9557-947b73f9a8f9/volumes/kubernetes.io~empty-dir/cache-volume: operation not permitted
...
Nov 01 01:57:43 kind-worker kubelet[275]: I1101 01:57:43.789411     275 kubelet_pods.go:949] "Pod is terminated, but some volumes have not been cleaned up" pod="default/test-pd"

Anything else we need to know?

No response

Kubernetes version

1.25.2

Cloud provider

n/a

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@bobbypage bobbypage added the kind/bug Categorizes issue or PR as related to a bug. label Nov 2, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 2, 2022
@bobbypage
Member Author

/sig storage
/sig node

/cc @msau42

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 2, 2022
@bobbypage
Member Author

/kind bug

@bobbypage bobbypage changed the title kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted sucessfuly kubelet: volumeManager.WaitForUnmount does not wait for emptydir to be unmounted successfully Nov 2, 2022
@SergeyKanzhelev
Member

/triage accepted
/priority important-longterm

doesn't look like a regression

/cc @xmcqueen

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2023
@msau42
Member

msau42 commented Feb 8, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 7, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
@msau42
Member

msau42 commented Aug 3, 2024

/remove lifecycle-rotten

@BenTheElder BenTheElder added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 5, 2024
@BenTheElder BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 5, 2024
@BenTheElder
Member

Another one for kubernetes/test-infra#32957

This issue looks like it may be tricky to solve but remains a source of flakes and should stay tracked.

@haircommander haircommander moved this from Triaged to Triage in SIG Node Bugs Oct 23, 2024