microk8s.daemon-kubelite produces tons of error logs on all nodes #4681

Open
PRNDA opened this issue Sep 24, 2024 · 12 comments

PRNDA commented Sep 24, 2024

Summary

We have a 4-node MicroK8s HA cluster that has been running for 2 years. Recently we noticed that the "microk8s.daemon-kubelite" service on all nodes produces tons of error logs like this:

Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.421789  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.668481  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.691204  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.900976  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.949677  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"

I tried restarting the service with systemctl restart snap.microk8s.daemon-kubelite, but it did not help. I also searched the web for this error message but did not find anything helpful.
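For reference, a minimal way to gauge how frequent these errors are, and whether they keep arriving after a restart, is plain journalctl against the same snap unit (a sketch; adjust the match string if your log format differs):

# Count expired-token errors since the last boot
journalctl -u snap.microk8s.daemon-kubelite -b | grep -c "service account token has expired"

# Follow the log live to see whether new errors continue to arrive after the restart
journalctl -u snap.microk8s.daemon-kubelite -f | grep "service account token has expired"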

All pods seem to be running fine, and I am still able to update our deployments (although updates are much slower than before).

Can someone help me resolve this problem?

Cluster status:

root@svr02:~# microk8s.status
microk8s is running
high-availability: yes
  datastore master nodes: 172.16.40.232:19001 172.16.40.231:19001 172.16.40.233:19001
  datastore standby nodes: 172.16.218.180:19001
addons:
  enabled:
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    ingress              # Ingress controller for external access
    metrics-server       # K8s Metrics Server for API access to service metrics
    prometheus           # Prometheus operator for monitoring and logging
    rbac                 # Role-Based Access Control for authorisation
    storage              # Storage class; allocates storage from host directory

microk8s inspect:

root@svr02:~# microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-k8s-dqlite is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow
Inspecting dqlite
  Inspect dqlite

Building the report tarball
  Report tarball is at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz

PRNDA changed the title from "microk8s.daemon-kubelite produces tons of error logs on all notes" to "microk8s.daemon-kubelite produces tons of error logs on all nodes" on Sep 24, 2024
louiseschmidtgen (Contributor) commented:

Hello @PRNDA,

thank you for reporting this issue.

Could you please upload the inspection report that was created at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz? With this information we can better assist you in resolving the issue.

Thank you!


PRNDA commented Sep 25, 2024

> Hello @PRNDA,
>
> thank you for reporting this issue.
>
> Could you please upload the inspection report that was created at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz? With this information we can better assist you in resolving the issue.
>
> Thank you!

I created this inspection report yesterday, but I found some sensitive information in the logs, so I decided not to upload it here. Is there a way I can send it to you privately?

louiseschmidtgen (Contributor) commented:

Hi @PRNDA,

how would you prefer to share it? Would you be able to upload the inspection report somewhere we could pull it from?


PRNDA commented Sep 25, 2024

Hi @louiseschmidtgen,

I created a private repo here and uploaded the inspection file to it. Could you please accept my repo invitation and then download the inspection file?

Sorry for the inconvenience.

louiseschmidtgen (Contributor) commented:

Hello @PRNDA,

I have received your invitation and have access to the logs.

Thank you for sharing the inspection report; I will have a look shortly.

louiseschmidtgen (Contributor) commented:

Linking this issue as possibly related: #4293


louiseschmidtgen commented Sep 27, 2024

Hello @PRNDA,

are you able to reproduce this issue on a more recent MicroK8s snap? You are currently running v1.23, which is out of support.

With kind regards,
Louise
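
For reference, the installed MicroK8s revision and the channel the snap is tracking can be confirmed with standard commands (a quick sketch; kubectl reports the version of the running cluster):

# Show the installed MicroK8s revision and the tracked channel
snap list microk8s

# Report the client and server Kubernetes versions of the running cluster
microk8s kubectl version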


PRNDA commented Sep 27, 2024

> Hello @PRNDA,
>
> are you able to reproduce this issue on a more recent MicroK8s snap? You are currently running v1.23, which is out of support.
>
> With kind regards, Louise

I'm afraid I cannot; this is a production system, and I'm not allowed to upgrade it.

ClaudZen commented:

Have you tried deleting Calico-Node pods?


PRNDA commented Sep 30, 2024

> Have you tried deleting Calico-Node pods?

Will this interrupt the running pods?

ClaudZen commented:

> > Have you tried deleting Calico-Node pods?
>
> Will this interrupt the running pods?

Deleting the Calico-Node pods should not interrupt the execution of other pods, as Kubernetes will automatically re-schedule new Calico-Node pods to maintain network connectivity. However, there might be a temporary disruption in pod networking while the new Calico pods start.
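
If you do go down that route, a gentler option than deleting all the pods at once is a DaemonSet rollout, which replaces the calico-node pods one node at a time under the default rolling-update strategy. A minimal sketch, assuming the usual MicroK8s/Calico layout where the DaemonSet is named calico-node in the kube-system namespace:

# Restart the calico-node pods via a rolling update (one node at a time by default)
microk8s kubectl -n kube-system rollout restart daemonset/calico-node

# Watch the rollout until all pods are back up
microk8s kubectl -n kube-system rollout status daemonset/calico-node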


PRNDA commented Oct 3, 2024

> > > Have you tried deleting Calico-Node pods?
> >
> > Will this interrupt the running pods?
>
> Deleting the Calico-Node pods should not interrupt the execution of other pods, as Kubernetes will automatically re-schedule new Calico-Node pods to maintain network connectivity. However, there might be a temporary disruption in pod networking while the new Calico pods start.

> There might be a temporary disruption in pod networking

That's what I'm worried about. This cluster runs several online systems, and I don't want them to be affected.
