K9S and Kubeadm
After playing with K8S for a couple of weeks, I started deploying it myself and debugging network issues. Here are my notes.
1 K9S
This is the LIFESAVER of K8S management: everything is so much easier from now on.
wget https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_linux_amd64.deb
sudo apt install ./k9s_linux_amd64.deb
2 Add a node to K8S
There is already a K8S cluster running on node A, and I would like to add node B to it.
# Cleaning up (on node B, the node being added)
sudo kubeadm reset
sudo systemctl stop kubelet docker # Stop related services
sudo rm -rf /etc/kubernetes /var/lib/etcd /var/lib/cni
# restart
sudo systemctl restart docker kubelet
# Run on node A
# This command generates the join command for node B
sudo kubeadm token create --print-join-command
# Run on node B (kubeadm join must run as root)
sudo kubeadm join <node_A_IP>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
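After the join finishes, it can help to confirm from node A that node B registered and became Ready. This is a minimal sketch, assuming kubectl is already configured on node A. The function name wait_for_node and the 120s default timeout are my own choices, not something kubeadm provides.

```shell
# Hypothetical helper: block until the given node reports Ready.
# Usage: wait_for_node <node_name> [timeout]
wait_for_node() {
  node_name="$1"
  timeout="${2:-120s}"   # assumed default, adjust as needed
  # List nodes first so the new one is visible even while NotReady
  kubectl get nodes -o wide
  # kubectl wait exits non-zero if the condition is not met in time
  kubectl wait --for=condition=Ready "node/${node_name}" --timeout="${timeout}"
}
```

Call it with the node name exactly as it appears in the kubectl get nodes listing. If it times out, the node joined but its networking (CNI) is probably not up yet.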
3 Copy Kube Configs
Run the following commands for every user who wants to run kubectl, replacing USERNAME with the account name.
mkdir -p /home/USERNAME/.kube
# use admin.conf as the config file
cp /etc/kubernetes/admin.conf /home/USERNAME/.kube/config
# change ownership so the user can read the file
chown USERNAME:USERNAME /home/USERNAME/.kube/config
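With more than a couple of users, the three steps above can be wrapped in a small function. This is only a sketch: the function name setup_kubeconfig and its optional arguments are hypothetical, it assumes home directories live under /home, and the chmod 600 is my addition to keep the admin credentials private. Note that admin.conf grants cluster-admin rights, so hand it out with care.

```shell
# Hypothetical helper wrapping the copy steps above for one user.
# Usage (as root): setup_kubeconfig <username> [admin_conf] [home_dir]
setup_kubeconfig() {
  user="$1"
  admin_conf="${2:-/etc/kubernetes/admin.conf}"
  home_dir="${3:-/home/${user}}"           # assumes the usual /home layout
  group="$(id -gn "${user}")" || return 1  # the user's primary group
  mkdir -p "${home_dir}/.kube" || return 1
  # use admin.conf as the config file
  cp "${admin_conf}" "${home_dir}/.kube/config" || return 1
  chmod 600 "${home_dir}/.kube/config"     # my addition: owner-only access
  # hand the whole .kube directory to the user
  chown -R "${user}:${group}" "${home_dir}/.kube"
}
```

Run as root, a loop such as "for u in alice bob; do setup_kubeconfig "$u"; done" covers several accounts (alice and bob are placeholder names).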
4 Multiple context setup
You can manage multiple contexts by combining their config files into one.
- Combine 2 config files
# kc: shorthand alias for kubectl
export KUBECONFIG=~/.kube/config.1:~/.kube/config.2
kc config view --flatten > ~/.kube/config
chmod 600 ~/.kube/config
- Setup context
# List contexts from the combined configs
kc config get-contexts
# Switch to a context (use-context does not store a namespace; set it separately below)
kc config use-context context_1
kc config use-context context_2
# Show the current context
kc config current-context
# Set the default namespace for the current context
kc config set-context --current --namespace=default_namespace
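One pitfall with the merge step: if the output file is also one of the inputs listed in KUBECONFIG, the shell truncates it before kubectl reads it. A hedged sketch of a safer merge follows. The function names join_paths and merge_kubeconfigs are hypothetical, and it calls plain kubectl because shell aliases such as kc are not expanded inside scripts by default.

```shell
# Hypothetical helper: join paths with ':' the way KUBECONFIG expects.
join_paths() {
  ( IFS=:; printf '%s\n' "$*" )
}

# Hypothetical helper: flatten several kubeconfigs into one output file.
# Usage: merge_kubeconfigs <output> <config>...
merge_kubeconfigs() {
  output="$1"; shift
  tmp="$(mktemp)" || return 1
  # Write to a temp file first: redirecting straight into a file that is
  # also listed in KUBECONFIG would truncate it before kubectl reads it.
  if KUBECONFIG="$(join_paths "$@")" kubectl config view --flatten > "${tmp}"; then
    chmod 600 "${tmp}" && mv "${tmp}" "${output}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```

For the two files above this would be: merge_kubeconfigs ~/.kube/config ~/.kube/config.1 ~/.kube/config.2. Because of the temp file, it is also safe to list ~/.kube/config itself among the inputs.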
5 Network Debugging
I found that a Ray cluster worker failed to start. The underlying issue was that I couldn't ping the IP of a Pod, even from the same node the Pod was running on (this was the node newly added in section 2). The solution that worked was reinstalling Calico.
# Delete and reapply Calico manifest
kubectl delete -f https://docs.projectcalico.org/manifests/calico.yaml
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Restart Calico pods
kubectl -n kube-system rollout restart daemonset/calico-node
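To check whether the reinstall actually helped, I would wait for the DaemonSet to finish rolling out and then repeat the ping that failed. A minimal sketch, assuming Linux ping (where -W is a per-reply timeout in seconds). The function name verify_calico and the 180s default are my own.

```shell
# Hypothetical helper: wait for calico-node to roll out, then ping a Pod IP.
# Usage: verify_calico <pod_ip> [timeout]
verify_calico() {
  pod_ip="$1"
  timeout="${2:-180s}"   # assumed default
  # Blocks until the calico-node DaemonSet rollout completes (or times out)
  kubectl -n kube-system rollout status daemonset/calico-node \
    --timeout="${timeout}" || return 1
  # Three probes, 2-second wait each; non-zero exit means still unreachable
  ping -c 3 -W 2 "${pod_ip}"
}
```

Run it on the affected node as verify_calico <POD_IP>, where the Pod IP comes from kubectl get pods -o wide.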
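Here are some other things I tried. They did not fix this issue but may be helpful next time.
sysctl net.ipv4.ip_forward # Should report 1
sudo sysctl -w net.ipv4.ip_forward=1 # If not 1, try this (a plain "sudo echo 1 > ..." fails because the redirect runs without root)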
# Check ip route
ip route get <POD_IP>
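# check CNI logs
journalctl -u calico-node -n 100
# with the manifest install above, calico-node runs as a DaemonSet rather than a systemd unit, so its logs are more likely here
kubectl -n kube-system logs -l k8s-app=calico-node --tail=100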
# check network policies
kubectl get networkpolicies -A
# check what is listening on the kubelet port (10250)
sudo netstat -tulnp | grep :10250
# t: TCP, u: UDP, l: listening sockets only, n: numeric addresses and ports, p: owning PID/program
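The ip_forward check is easy to forget on a freshly added node, so it can be turned into a tiny function that gives a clear pass/fail. This is just a sketch: the name check_ip_forward is hypothetical, and the optional file argument exists only so the check can be tried against a test file.

```shell
# Hypothetical helper: report whether IPv4 forwarding is enabled.
# Usage: check_ip_forward [proc_file]
check_ip_forward() {
  proc_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  value="$(cat "${proc_file}")" || return 2
  if [ "${value}" = "1" ]; then
    echo "ip_forward: enabled"
  else
    echo "ip_forward: DISABLED (value=${value})"
    return 1
  fi
}
```

Something like "check_ip_forward || sudo sysctl -w net.ipv4.ip_forward=1" then fixes it only when needed. The sysctl -w change does not survive a reboot.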