microk8s.daemon-kubelite produces tons of error logs on all nodes #4681

Open
PRNDA opened this issue Sep 24, 2024 · 12 comments

PRNDA commented Sep 24, 2024

Summary

We have a 4-node MicroK8s HA cluster that has been running for 2 years. Recently we noticed that the "microk8s.daemon-kubelite" service on all nodes produces tons of error logs like this:

Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.421789  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.668481  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.691204  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.900976  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
Sep 24 16:27:01 svr02 microk8s.daemon-kubelite[205845]: E0924 16:27:01.949677  205845 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"

I tried restarting the service with systemctl restart snap.microk8s.daemon-kubelite, but it did not help. I also searched the web for this error message but did not find anything helpful.
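For reference, a minimal way to gauge how frequent these errors are, and whether they keep arriving after a restart, is plain journalctl against the same snap unit (a sketch; adjust the match string if your log format differs):

# Count expired-token errors since the last boot
journalctl -u snap.microk8s.daemon-kubelite -b | grep -c "service account token has expired"

# Follow the log live to see whether new errors continue to arrive after the restart
journalctl -u snap.microk8s.daemon-kubelite -f | grep "service account token has expired"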

All pods seem to be running fine, and I am still able to update our deployments (although updates are much slower than before).

Can someone help me resolve this problem?

Cluster status:

root@svr02:~# microk8s.status
microk8s is running
high-availability: yes
  datastore master nodes: 172.16.40.232:19001 172.16.40.231:19001 172.16.40.233:19001
  datastore standby nodes: 172.16.218.180:19001
addons:
  enabled:
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    ingress              # Ingress controller for external access
    metrics-server       # K8s Metrics Server for API access to service metrics
    prometheus           # Prometheus operator for monitoring and logging
    rbac                 # Role-Based Access Control for authorisation
    storage              # Storage class; allocates storage from host directory

microk8s inspect:

root@svr02:~# microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-k8s-dqlite is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow
Inspecting dqlite
  Inspect dqlite

Building the report tarball
  Report tarball is at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz

PRNDA changed the title from "microk8s.daemon-kubelite produces tons of error logs on all notes" to "microk8s.daemon-kubelite produces tons of error logs on all nodes" on Sep 24, 2024
louiseschmidtgen (Contributor) commented:

Hello @PRNDA,

thank you for reporting this issue.

Could you please upload the inspection report that was created at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz? With this information we can better assist you in resolving the issue.

Thank you!


PRNDA commented Sep 25, 2024

> Hello @PRNDA,
>
> thank you for reporting this issue.
>
> Could you please upload the inspection report that was created at /var/snap/microk8s/4916/inspection-report-20240924_162747.tar.gz? With this information we can better assist you in resolving the issue.
>
> Thank you!

I created this inspection report yesterday, but I found some sensitive information in the logs, so I decided not to upload it here. Is there a way I can send it to you privately?

louiseschmidtgen (Contributor) commented:

Hi @PRNDA,

how would you prefer to share it? Would you be able to upload the inspection report somewhere we could pull it from?


PRNDA commented Sep 25, 2024

Hi @louiseschmidtgen,

I created a private repo here and uploaded the inspection file to it. Could you please accept my repo invitation and then download the inspection file?

Sorry for the inconvenience.

louiseschmidtgen (Contributor) commented:

Hello @PRNDA,

I have received your invitation and have access to the logs.

Thank you for sharing the inspection report; I will have a look shortly.

louiseschmidtgen (Contributor) commented:

Linking this issue as possibly related: #4293


louiseschmidtgen commented Sep 27, 2024

Hello @PRNDA,

are you able to reproduce this issue on a more recent MicroK8s snap? You are currently running v1.23, which is out of support.

With kind regards,
Louise
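
For reference, the installed MicroK8s revision and the channel the snap is tracking can be confirmed with standard commands (a quick sketch; kubectl reports the version of the running cluster):

# Show the installed MicroK8s revision and the tracked channel
snap list microk8s

# Report the client and server Kubernetes versions of the running cluster
microk8s kubectl version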


PRNDA commented Sep 27, 2024

> Hello @PRNDA,
>
> are you able to reproduce this issue on a more recent MicroK8s snap? You are currently running v1.23, which is out of support.
>
> With kind regards, Louise

I'm afraid I cannot; this is a production system, and I'm not allowed to upgrade it.

ClaudZen commented:

Have you tried deleting Calico-Node pods?


PRNDA commented Sep 30, 2024

> Have you tried deleting Calico-Node pods?

Will this interrupt the running pods?

ClaudZen commented:

> > Have you tried deleting Calico-Node pods?
>
> Will this interrupt the running pods?

Deleting the Calico-Node pods should not interrupt the execution of other pods, as Kubernetes will automatically re-schedule new Calico-Node pods to maintain network connectivity. However, there might be a temporary disruption in pod networking while the new Calico pods start.
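
If you do go down that route, a gentler option than deleting all the pods at once is a DaemonSet rollout, which replaces the calico-node pods one node at a time under the default rolling-update strategy. A minimal sketch, assuming the usual MicroK8s/Calico layout where the DaemonSet is named calico-node in the kube-system namespace:

# Restart the calico-node pods via a rolling update (one node at a time by default)
microk8s kubectl -n kube-system rollout restart daemonset/calico-node

# Watch the rollout until all pods are back up
microk8s kubectl -n kube-system rollout status daemonset/calico-node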


PRNDA commented Oct 3, 2024

> > > Have you tried deleting Calico-Node pods?
> >
> > Will this interrupt the running pods?
>
> Deleting the Calico-Node pods should not interrupt the execution of other pods, as Kubernetes will automatically re-schedule new Calico-Node pods to maintain network connectivity. However, there might be a temporary disruption in pod networking while the new Calico pods start.

> There might be a temporary disruption in pod networking

That's what I'm worried about. This cluster runs several online systems, and I don't want them to be affected.
