
Tool Share: One-Click K8s Cluster Deployment with Kubeasz



Problem Background

To verify whether a certain bug has been fixed in the latest version of Kubernetes, I needed to stand up a k8s environment quickly. This post uses the kubeasz tool from reference [1] and records the deployment process and the problems encountered along the way.

Deployment Process

First, download the tool script, the kubeasz code, the binaries, and the default container images.
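The download step itself is not shown in the transcript below; a sketch of it, assuming kubeasz release 3.5.0 (the version visible in the later docker ps output) and the kubeasz project's usual release layout:

```shell
# Sketch of the download step; the release number and URL layout are
# assumptions based on the kubeasz project's GitHub releases.
export release=3.5.0
wget "https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown"
chmod +x ./ezdown
./ezdown -D   # download kubeasz code, binaries, and default container images
```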

Start the installation with the following command:

[root@node01 k8s]# ./ezdown -S
2023-03-22 13:39:40 INFO Action begin: start_kubeasz_docker
2023-03-22 13:39:41 INFO try to run kubeasz in a container
2023-03-22 13:39:41 DEBUG get host IP: 10.10.11.49
2023-03-22 13:39:41 DEBUG generate ssh key pair
# 10.10.11.49 SSH-2.0-OpenSSH_6.6.1
f1b442b7fdaf757c7787536b17d12d76208a2dd7884d56fbd1d35817dc2e94ca
2023-03-22 13:39:41 INFO Action successed: start_kubeasz_docker

[root@node01 k8s]# docker ps
CONTAINER ID   IMAGE                   COMMAND         CREATED          STATUS          PORTS     NAMES
f1b442b7fdaf   easzlab/kubeasz:3.5.0   "sleep 36000"   15 seconds ago   Up 14 seconds             kubeasz

The output does not make clear whether this succeeded or failed. Following the documentation, enter the container and run the setup command manually:

[root@node01 ~]# docker exec -it kubeasz ezctl start-aio
2023-03-22 06:15:05 INFO get local host ipadd: 10.10.11.49
2023-03-22 06:15:05 DEBUG generate custom cluster files in /etc/kubeasz/clusters/default
2023-03-22 06:15:05 DEBUG set versions
2023-03-22 06:15:05 DEBUG disable registry mirrors
2023-03-22 06:15:05 DEBUG cluster default: files successfully created.
2023-03-22 06:15:05 INFO next steps 1: to config '/etc/kubeasz/clusters/default/hosts'
2023-03-22 06:15:05 INFO next steps 2: to config '/etc/kubeasz/clusters/default/config.yml'
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 06:15:05 INFO cluster:default setup step:all begins in 5s, press any key to abort:

PLAY [kube_master,kube_node,etcd,ex_lb,chrony] ****************************************

TASK [Gathering Facts] ****************************************************************
fatal: [10.10.11.49]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: root@10.10.11.49: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).", "unreachable": true}

PLAY RECAP ****************************************************************************
10.10.11.49                : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

The log points to an ssh authentication problem, yet re-running the key setup by hand suggested passwordless login was in place:

bash-5.1# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)?
bash-5.1# ssh-copy-id root@10.10.11.49
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
expr: warning: '^ERROR: ': using '^' as the first character
of a basic regular expression is not portable; it is ignored
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.10.11.49's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@10.10.11.49'"
and check to make sure that only the key(s) you wanted were added.

bash-5.1# ssh root@10.10.11.49
root@10.10.11.49's password:

The permissions on the related files also look normal:

[root@node01 kubeasz]# ll ~/.ssh
total 16
-rw------- 1 root root 1752 Mar 22 14:25 authorized_keys
-rw------- 1 root root 2602 Mar 22 14:25 id_rsa
-rw-r--r-- 1 root root  567 Mar 22 14:25 id_rsa.pub
-rw-r--r-- 1 root root 1295 Mar 22 13:39 known_hosts
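The rejection could also be narrowed down with verbose ssh output before changing auth methods entirely; a debugging sketch (the root cause was never pinned down in this session):

```shell
# Force key-only auth and print the negotiation, run from inside the
# kubeasz container. The "Offering public key" / "Authentications that
# can continue" debug lines show which side rejects the key.
ssh -vvv -o PreferredAuthentications=publickey \
    -o PasswordAuthentication=no root@10.10.11.49 true
```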

It was not clear exactly where the problem lay, so per reference [2] I switched to username/password authentication instead.

Configure the username and password inside the container; the ansible ping check passes:

bash-5.1# vi /etc/ansible/hosts
[webservers]
10.10.11.49

[webservers:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

bash-5.1# ansible webservers -m ping
10.10.11.49 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}

Edit the clusters/default/hosts file that the cluster installation depends on, adding the same username/password variables:

[etcd]
10.10.11.49

[etcd:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

# master node(s)
[kube_master]
10.10.11.49

[kube_master:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

# work node(s)
[kube_node]
10.10.11.49

[kube_node:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'
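Had key-based auth worked, the same group-vars sections could instead point at a private key rather than embedding a plaintext password; a hypothetical variant of one section (ansible_ssh_private_key_file is a standard Ansible connection variable):

```ini
[kube_master:vars]
ansible_ssh_user='root'
ansible_ssh_private_key_file=/root/.ssh/id_rsa
```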

Running the setup again fails, complaining that the sshpass tool is missing:

[root@node01 kubeasz]# docker exec -it kubeasz ezctl setup default all
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 07:35:46 INFO cluster:default setup step:all begins in 5s, press any key to abort:

PLAY [kube_master,kube_node,etcd,ex_lb,chrony] ****************************************

TASK [Gathering Facts] ****************************************************************
fatal: [10.10.11.49]: FAILED! => {"msg": "to use the 'ssh' connection type with passwords, you must install the sshpass program"}

PLAY RECAP ****************************************************************************
10.10.11.49                : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Install the sshpass dependency inside the container:

bash-5.1# apk add sshpass
fetch
Installing sshpass (1.09-r0)
Executing busybox-1.35.0-r17.trigger
OK: 21 MiB in 47 packages

Re-run the setup:

[root@node01 kubeasz]# docker exec -it kubeasz ezctl setup default all
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 07:36:37 INFO cluster:default setup step:all begins in 5s, press any key to abort:
...
TASK [kube-node : 轮询等待kube-proxy启动] ***************************************************
changed: [10.10.11.49]
FAILED - RETRYING: 轮询等待kubelet启动 (4 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (3 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (2 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (1 retries left).

TASK [kube-node : 轮询等待kubelet启动] ******************************************************
fatal: [10.10.11.49]: FAILED! => {"attempts": 4, "changed": true, "cmd": "systemctl is-active kubelet.service", "delta": "0:00:00.014621", "end": "2023-03-22 15:42:07.230186", "msg": "non-zero return code", "rc": 3, "start": "2023-03-22 15:42:07.215565", "stderr": "", "stderr_lines": [], "stdout": "activating", "stdout_lines": ["activating"]}

PLAY RECAP ****************************************************************************
10.10.11.49                : ok=85   changed=78   unreachable=0    failed=1    skipped=123  rescued=0    ignored=0
localhost                  : ok=33   changed=30   unreachable=0    failed=0    skipped=11   rescued=0    ignored=0

The kubelet step failed, so check the kubelet service:

[root@node01 log]# service kubelet status -l
Redirecting to /bin/systemctl status -l kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2023-03-22 15:56:31 CST; 1s ago
     Docs:
  Process: 147581 ExecStart=/opt/kube/bin/kubelet --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --hostname-override=10.10.11.49 --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --root-dir=/var/lib/kubelet --v=2 (code=exited, status=1/FAILURE)
 Main PID: 147581 (code=exited, status=1/FAILURE)

Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.719832  147581 manager.go:228] Version: {KernelVersion:3.10.0-862.11.6.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion: DockerAPIVersion: CadvisorVersion: CadvisorRevision:}
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.720896  147581 server.go:659] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.721939  147581 container_manager_linux.go:267] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722392  147581 container_manager_linux.go:272] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName:
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722503  147581 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722609  147581 container_manager_linux.go:308] "Creating device plugin manager"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722689  147581 manager.go:125] "Creating Device Plugin manager" path="/var/lib/kubelet/device-plugins/kubelet.sock"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722763  147581 server.go:66] "Creating device plugin registration server" version="v1beta1" socket="/var/lib/kubelet/device-plugins/kubelet.sock"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722905  147581 state_mem.go:36] "Initialized new in-memory state store"
Mar 22 15:56:31 node01 kubelet[147581]: E0322 15:56:31.726502  147581 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"

The error says containerd is not serving the CRI v1 runtime API. Per reference [3], deleting the /etc/containerd/config.toml file and restarting containerd resolves it:

mv /etc/containerd/config.toml /root/config.toml.bak
systemctl restart containerd
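After the restart, the CRI endpoint can be checked directly before re-running the playbook; a sketch, assuming crictl is installed on the node (kubeasz places its binaries under /opt/kube/bin):

```shell
# Confirm containerd now answers the CRI v1 runtime API on the socket
# kubelet is configured with (crictl availability is an assumption here).
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info
systemctl is-active kubelet.service   # should settle on "active" shortly after
```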

Re-run the setup. Watching in the background, calico-node fails to start, with the following pod events:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  41s                default-scheduler  Successfully assigned kube-system/calico-node-rqpjm to 10.10.11.49
  Normal   Pulling    20s (x2 over 31s)  kubelet            Pulling image "easzlab.io.local:5000/calico/cni:v3.23.5"
  Warning  Failed     19s (x2 over 31s)  kubelet            Failed to pull image "easzlab.io.local:5000/calico/cni:v3.23.5": rpc error: code = Unknown desc = failed to pull and unpack image "easzlab.io.local:5000/calico/cni:v3.23.5": failed to resolve reference "easzlab.io.local:5000/calico/cni:v3.23.5": failed to do request: Head "…": http: server gave HTTP response to HTTPS client
  Warning  Failed     19s (x2 over 31s)  kubelet            Error: ErrImagePull
  Normal   BackOff    5s (x2 over 30s)   kubelet            Back-off pulling image "easzlab.io.local:5000/calico/cni:v3.23.5"
  Warning  Failed     5s (x2 over 30s)   kubelet            Error: ImagePullBackOff

At the docker level, the configuration looks correct and pulling the image works:

[root@node01 ~]# cat /etc/docker/daemon.json
{
  "max-concurrent-downloads": 10,
  "insecure-registries": ["easzlab.io.local:5000"],
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
    },
  "data-root":"/var/lib/docker"
}

[root@node01 log]# docker pull easzlab.io.local:5000/calico/cni:v3.23.5
v3.23.5: Pulling from calico/cni
Digest: sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29
Status: Image is up to date for easzlab.io.local:5000/calico/cni:v3.23.5
easzlab.io.local:5000/calico/cni:v3.23.5

At the containerd level, pulling the image also works:

[root@node01 log]# ctr image pull --plain-http=true easzlab.io.local:5000/calico/cni:v3.23.5
easzlab.io.local:5000/calico/cni:v3.23.5:                                         resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cc0e45adf05a30a90384ba7024dbabdad9ae0bcd7b5a535c28dede741298fea3:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:47c5dbbec31222325790ebad8c07d270a63689bd10dc8f54115c65db7c30ad1f:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8efc3d73e2741a93be09f68c859da466f525b9d0bddb1cd2b2b633f14f232941:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:1c979d623de9aef043cb4ff489da5636d61c39e30676224af0055240e1816382:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4c98a4f67c5a7b1058111d463051c98b23e46b75fc943fc2535899a73fc0c9f1:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:51729c6e2acda05a05e203289f5956954814d878f67feb1a03f9941ec5b4008b:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:050b055d5078c5c6ad085d106c232561b0c705aa2173edafd5e7a94a1e908fc5:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:7430548aa23e56c14da929bbe5e9a2af0f9fd0beca3bd95e8925244058b83748:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 3.1 s                                                                    total:  103.0  (33.2 MiB/s)
unpacking linux/amd64 sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29...
done: 6.82968396s

Per reference [4], regenerate the default containerd config and add the private registry as an HTTP mirror:

[root@node01 ~]# containerd config default > /etc/containerd/config.toml
[root@node01 ~]# vim /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""

      [plugins."io.containerd.grpc.v1.cri".registry.auths]

      [plugins."io.containerd.grpc.v1.cri".registry.configs]

      [plugins."io.containerd.grpc.v1.cri".registry.headers]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."easzlab.io.local:5000"]
          endpoint = ["http://easzlab.io.local:5000"]
[root@node01 ~]# service containerd restart
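On containerd versions that support it, the same effect can be had with the certs.d drop-in layout instead of inline mirrors, which survives config regeneration; a sketch following containerd's hosts.toml convention (paths are assumptions):

```toml
# /etc/containerd/certs.d/easzlab.io.local:5000/hosts.toml
# Requires config_path = "/etc/containerd/certs.d" in the cri registry
# section of config.toml.
server = "http://easzlab.io.local:5000"

[host."http://easzlab.io.local:5000"]
  capabilities = ["pull", "resolve"]
```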

Checking pod status again, the pods are now stuck in ContainerCreating:

[root@node01 ~]# kubectl get pod -A
NAMESPACE     NAME                                         READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-89b744d6c-klzwh      1/1     Running             0          5m35s
kube-system   calico-node-wmvff                            1/1     Running             0          5m35s
kube-system   coredns-6665999d97-mp7xc                     0/1     ContainerCreating   0          5m35s
kube-system   dashboard-metrics-scraper-57566685b4-8q5fm   0/1     ContainerCreating   0          5m35s
kube-system   kubernetes-dashboard-57db9bfd5b-h6jp4        0/1     ContainerCreating   0          5m35s
kube-system   metrics-server-6bd9f986fc-njpnj              0/1     ContainerCreating   0          5m35s
kube-system   node-local-dns-wz9bg                         1/1     Running             0          5m31s

Pick one and describe it:

Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedScheduling        6m7s                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               5m47s                 default-scheduler  Successfully assigned kube-system/coredns-6665999d97-mp7xc to 10.10.11.49
  Warning  FailedCreatePodSandBox  5m46s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "072c164d79f4874a8d851d36115ea04b75a2155dae3cecdc764e923c9f38f86b": plugin type="calico" failed (add): failed to find plugin "calico" in path [/opt/cni/bin]
  Normal   SandboxChanged          33s (x25 over 5m46s)  kubelet            Pod sandbox changed, it will be killed and re-created.

The events show the calico CNI plugin binary is missing from /opt/cni/bin. After manually copying the plugins there, check the pod status:

[root@node01 bin]# cd /opt/cni/bin/
[root@node01 bin]# chmod +x *
[root@node01 bin]# ll -h
total 186M
-rwxr-xr-x 1 root root 3.7M Mar 22 17:46 bandwidth
-rwxr-xr-x 1 root root  56M Mar 22 17:46 calico
-rwxr-xr-x 1 root root  56M Mar 22 17:46 calico-ipam
-rwxr-xr-x 1 root root 2.4M Mar 22 17:46 flannel
-rwxr-xr-x 1 root root 3.1M Mar 22 17:46 host-local
-rwxr-xr-x 1 root root  56M Mar 22 17:46 install
-rwxr-xr-x 1 root root 3.2M Mar 22 17:46 loopback
-rwxr-xr-x 1 root root 3.6M Mar 22 17:46 portmap
-rwxr-xr-x 1 root root 3.3M Mar 22 17:46 tuning

[root@node01 bin]# kubectl get pod -A
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-89b744d6c-mpfgq      1/1     Running   0          37m
kube-system   calico-node-h9sm2                            1/1     Running   0          37m
kube-system   coredns-6665999d97-8pdbd                     1/1     Running   0          37m
kube-system   dashboard-metrics-scraper-57566685b4-c2l8w   1/1     Running   0          37m
kube-system   kubernetes-dashboard-57db9bfd5b-74lmb        1/1     Running   0          37m
kube-system   metrics-server-6bd9f986fc-d9crl              1/1     Running   0          37m
kube-system   node-local-dns-kvgv6                         1/1     Running   0          37m
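The manual copy itself is not shown in the transcript; one hypothetical way to do it is to extract the plugins from the calico/cni image already present locally (the in-image path is an assumption based on the standard calico/cni layout):

```shell
# Extract the CNI plugin binaries from the local calico/cni image into
# the host path kubelet searches (/opt/cni/bin).
docker create --name cni-extract easzlab.io.local:5000/calico/cni:v3.23.5
docker cp cni-extract:/opt/cni/bin/. /opt/cni/bin/
docker rm cni-extract
chmod +x /opt/cni/bin/*
```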

The deployment is complete.

References
