[K8S] CNI Weave net dial tcp connect: connection refused, failed to clean up sandbox container
After setting up a k8s cluster, running kubectl get pod --all-namespaces showed that the coredns pods never reached Running and instead stayed stuck in ContainerCreating, as shown below.
NAMESPACE     NAME                       READY   STATUS              RESTARTS   AGE
kube-system   coredns-78fcd69978-47ngc   0/1     ContainerCreating   0          16m
kube-system   coredns-78fcd69978-n7dst   0/1     ContainerCreating   0          16m
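On a cluster with many pods, the stuck ones can be easy to miss in the full listing. The following is a minimal sketch of a filter for that; the function name not_running_pods is hypothetical, and it assumes the default column layout shown above, where STATUS is the fourth column.

```shell
# Sketch: list pods whose STATUS is neither Running nor Completed.
# Reads the output of `kubectl get pod --all-namespaces` on stdin and
# assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  # NR > 1 skips the header row; $4 is the STATUS column.
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

It would be used as kubectl get pod --all-namespaces | not_running_pods.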
On top of that, the weave net pods themselves kept showing an error status.
To look at one of the coredns pods in more detail, I ran the describe command as follows:
$ kubectl describe pod coredns-78fcd69978-47ngc -n kube-system
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Warning  FailedScheduling        17m                  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled               16m                  default-scheduler  Successfully assigned kube-system/coredns-78fcd69978-47ngc to master.example.com
  Warning  FailedCreatePodSandBox  16m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "6ebe5786980b899f55eaca5d050e6667759b69dd074bd31efce73606e5c415a3" network for pod "coredns-78fcd69978-47ngc": networkPlugin cni failed to set up pod "coredns-78fcd69978-47ngc_kube-system" network: unable to allocate IP address: Post "": dial tcp connect: connection refused, failed to clean up sandbox container "6ebe5786980b899f55eaca5d050e6667759b69dd074bd31efce73606e5c415a3" network for pod "coredns-78fcd69978-47ngc": networkPlugin cni failed to teardown pod "coredns-78fcd69978-47ngc_kube-system" network: Delete "": dial tcp connect: connection refused]
  Normal   SandboxChanged          11m (x26 over 16m)   kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  6m19s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "de3959a0e92a6e09b8946258be5330ed9ac308c7d0988913fa247539f5506c99" network for pod "coredns-78fcd69978-47ngc": networkPlugin cni failed to set up pod "coredns-78fcd69978-47ngc_kube-system" network: netplugin failed with no error message: signal: killed
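Because the event messages are long, it can help to first see which warning reasons occur and how often. This is a rough sketch only; the function name warning_reasons is hypothetical, and it assumes the event rows start with the Type column followed by the Reason column, as in the output above.

```shell
# Sketch: count Warning events per Reason.
# Reads the output of `kubectl describe pod <pod> -n <namespace>` on stdin.
warning_reasons() {
  # $1 is the event Type and $2 the Reason; awk ignores the leading indent.
  awk '$1 == "Warning" { count[$2]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}
```

It would be used as kubectl describe pod coredns-78fcd69978-47ngc -n kube-system | warning_reasons, which prints one line per reason, most frequent first.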
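This looks like a case where weave net, the CNI (Container Network Interface) plugin, has broken: the events show connection refused when the plugin is asked to allocate an IP address for the pod.
In this case, the only fix I found was to run kubeadm reset, remove the CNI configuration completely, and set the cluster up again.
While googling, I found the workaround below, and after applying it the cluster recovered normally.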
Crashing weave-net pod when adding node to k8 cluster without supplying network-CIDR · Issue #3758 · weaveworks/weave
What you expected to happen? To not have to supply a --pod-network-cidr= command when setting up a weave network when using kubeadm init. For the weave-net pod to remain stable when add...
1. Run kubeadm reset on master/node1/node2, then restart kubelet
$ kubeadm reset
$ systemctl restart kubelet
2. Delete the CNI-related directory files (master only)
$ rm -rf /etc/cni/net.d/*
$ rm -rf $HOME/.kube/config
3. Run kubeadm init on the master, then install the CNI
$ kubeadm init --apiserver-advertise-address=
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
4. Check the pods
$ kubectl get pod --all-namespaces
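Instead of re-running the pod listing by hand after the last step, the check can be scripted. The sketch below is only one possible approach and is not part of the workaround itself. It assumes kubectl is already pointed at the re-initialised cluster (for example via the admin.conf that kubeadm init generates), that the DaemonSet is named weave-net, and that the coredns pods carry the label k8s-app=kube-dns. The function name verify_cluster_network and the 180s timeout are arbitrary choices.

```shell
# Sketch: wait for the CNI and DNS pods to settle after kubeadm init.
# Assumptions: DaemonSet name weave-net, coredns label k8s-app=kube-dns,
# and a 180s timeout picked for illustration.
verify_cluster_network() {
  # Block until the weave-net DaemonSet has finished rolling out.
  kubectl -n kube-system rollout status daemonset/weave-net --timeout=180s || return 1
  # Block until the coredns pods report the Ready condition.
  kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=kube-dns --timeout=180s || return 1
  echo "weave-net and coredns are ready"
}
```

Calling verify_cluster_network on the master returns a non-zero status if either wait fails or times out, in which case the describe output for the affected pod is the next place to look.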