This is a multi-agent system for troubleshooting Kubernetes applications. Its initial implementation focuses on diagnosing the Open (or Advanced) Cluster Management environment, but it can be customized for other products as well.
We set up an Open Cluster Management environment to demonstrate how it works.
Install clusteradm CLI tool
Run the following command to download and install the latest clusteradm command-line tool:
curl -L https://raw.githubusercontent.com/open-cluster-management-io/clusteradm/main/install.sh | bash
Set up hub and managed clusters
Run the following command to quickly set up a hub cluster and two managed clusters using kind:
curl -L https://raw.githubusercontent.com/open-cluster-management-io/OCM/main/solutions/setup-dev-environment/local-up.sh | bash
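Once the script finishes, it can help to confirm that the expected kubeconfig contexts exist before running any scenario. The following is a minimal sketch, assuming the contexts are named `kind-hub`, `kind-cluster1`, and `kind-cluster2` (the names that appear in the `kubectl` commands later in this document). The helper names `missing_contexts` and `list_contexts` are hypothetical and not part of this project.

```python
import shutil
import subprocess

# Assumption: context names are taken from the kubectl commands used in the
# scenarios in this document; adjust them if your setup differs.
EXPECTED_CONTEXTS = ["kind-hub", "kind-cluster1", "kind-cluster2"]


def missing_contexts(available, expected=EXPECTED_CONTEXTS):
    """Return the expected contexts that are absent from `available`, in order."""
    available_set = set(available)
    return [ctx for ctx in expected if ctx not in available_set]


def list_contexts():
    """Ask kubectl for the configured context names (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "config", "get-contexts", "-o", "name"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    if shutil.which("kubectl") is None:
        print("kubectl not found on PATH")
    else:
        try:
            missing = missing_contexts(list_contexts())
            print("missing contexts:", missing or "none")
        except subprocess.CalledProcessError as err:
            print("kubectl failed:", err.stderr)
```

If any context is reported missing, rerunning the setup script is probably the simplest fix.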
- User: The user who asks questions and gives tasks
- Executor: Executes the code written by the Engineer and reports the results back to it
- Engineer: Analyzes the intent of the user or planner and writes a sequence of shell commands or scripts to carry it out
  - Check the status of managedclusters
  - Retrieve the resource usage of kafka cluster
- Planner - Kubernetes planner, responsible for making a detailed plan to accomplish a specific task within a Kubernetes environment
- Advisor - The knowledge repository where you can find solutions and ideas for addressing any multi-cluster issues
- Manager - orchestrates the workflow between agents
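The division of labor among these roles can be pictured as a simple pipeline over a shared conversation history. The sketch below is not this project's implementation. It replaces the LLM-backed agents with plain Python stub functions, leaves out the Advisor for brevity, and uses a fixed speaking order where a real Manager would choose the next agent dynamically. All names (`Message`, `run_workflow`, `AGENTS`, and the stub functions) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    content: str


# An agent is modelled as a function from the conversation so far to a reply.
Agent = Callable[[List[Message]], str]


def planner(history: List[Message]) -> str:
    # Stub: a real planner would ask an LLM to break the task into steps.
    return f"Plan: inspect the resources mentioned in '{history[0].content}'"


def engineer(history: List[Message]) -> str:
    # Stub: a real engineer would derive commands from the plan.
    return "kubectl get managedclusters --context kind-hub"


def executor(history: List[Message]) -> str:
    # Stub: report the command instead of running it.
    return f"(dry run) would execute: {history[-1].content}"


AGENTS: Dict[str, Agent] = {
    "Planner": planner,
    "Engineer": engineer,
    "Executor": executor,
}


def run_workflow(task: str, agents: Dict[str, Agent], order: List[str]) -> List[Message]:
    """Manager role: pass the shared history to each agent in turn."""
    history = [Message("User", task)]
    for name in order:
        history.append(Message(name, agents[name](history)))
    return history


if __name__ == "__main__":
    steps = ["Planner", "Engineer", "Executor"]
    for msg in run_workflow("Check the status of managedclusters", AGENTS, steps):
        print(f"{msg.sender}: {msg.content}")
```

Keeping every reply in one history list means each agent sees what the previous agents produced, which is how the Executor can report results back to the Engineer.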
- Scenario 1: Cluster1 - Make the bootstrap hub kubeconfig invalid
    # kubectl edit secret bootstrap-hub-kubeconfig -n open-cluster-management-agent --context kind-cluster1
    # kubectl edit secret hub-kubeconfig-secret -n open-cluster-management-agent --context kind-cluster1
    kubectl delete secret bootstrap-hub-kubeconfig -n open-cluster-management-agent --context kind-cluster1
    kubectl delete secret hub-kubeconfig-secret -n open-cluster-management-agent --context kind-cluster1
    kubectl get mcl cluster1 --context kind-hub
    python main.py "why the status of cluster1 is unknown?"
- Scenario 2: Disable the Klusterlet agent and the registration agent
  - Scale these two agents to 0
    kubectl scale deployment klusterlet -n open-cluster-management --replicas=0 --context kind-cluster2
    kubectl scale deployment klusterlet-registration-agent -n open-cluster-management-agent --replicas=0 --context kind-cluster2
  - Troubleshoot the unknown status
    kubectl get mcl cluster2 --context kind-hub
    python main.py "why the status of cluster2 is unknown"
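Before handing the question to the agents, the `Unknown` status shown by `kubectl get mcl` can also be read programmatically from the ManagedCluster conditions. The following is a minimal sketch, assuming the JSON produced by `kubectl get managedcluster <name> -o json` and the `ManagedClusterConditionAvailable` condition type. The `SAMPLE` object is illustrative, not captured output, and the function names are hypothetical.

```python
import json
import subprocess

AVAILABLE = "ManagedClusterConditionAvailable"

# Illustrative object only: a trimmed-down ManagedCluster, not real output.
SAMPLE = {
    "metadata": {"name": "cluster1"},
    "status": {
        "conditions": [
            {"type": "HubAcceptedManagedCluster", "status": "True"},
            {"type": AVAILABLE, "status": "Unknown"},
        ]
    },
}


def availability(managed_cluster: dict) -> str:
    """Return the Available condition status, or "Unknown" if it is absent."""
    conditions = managed_cluster.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == AVAILABLE:
            return cond.get("status", "Unknown")
    return "Unknown"


def fetch_managed_cluster(name: str, context: str = "kind-hub") -> dict:
    """Fetch a ManagedCluster as a dict (requires kubectl and a running hub)."""
    result = subprocess.run(
        ["kubectl", "get", "managedcluster", name, "--context", context, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print("sample availability:", availability(SAMPLE))
```

Against a live hub, `availability(fetch_managed_cluster("cluster2"))` would return the current status of the cluster from Scenario 2.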