
Practice: Backing Up and Restoring an etcd Cluster in a Kubernetes Environment

不秃头程序员


In this article we back up etcd, the core datastore component of a Kubernetes cluster, and then restore that backup on a kubernetes cluster with one control-plane node and one worker node. The steps and the verification of the result are below.

Step1 Install the etcd Client

Install the etcd CLI client, which is used to manage the etcd cluster. Here we install it on Ubuntu.

apt install etcd-client
Step2 Create an Nginx Deployment

We create an nginx deployment with multiple replicas; these replicas will later be used to verify that the etcd data was restored.

kubectl create deployment nginx --image=nginx --replicas=5

Verify that the newly deployed Pods are in the Running state:

controlplane $ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-77b4fdf86c-6m8gl   1/1     Running   0          50s
nginx-77b4fdf86c-bfcsr   1/1     Running   0          50s
nginx-77b4fdf86c-bqmqk   1/1     Running   0          50s
nginx-77b4fdf86c-nkh7j   1/1     Running   0          50s
nginx-77b4fdf86c-x946x   1/1     Running   0          50s
Step3 Back Up the etcd Cluster

Create a directory for the etcd backup:

mkdir etcd-backup

Then run the following command to take the etcd backup.

ETCDCTL_API=3 etcdctl --endpoints= \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save ./etcd-backup/etcdbackup.db

(The --endpoints value is missing above; on a kubeadm control plane etcd typically listens on https://127.0.0.1:2379.)

Note that you do not need to memorize the certificate paths in the command above; you can read them from the etcd pod running in the kube-system namespace. First list the pods with the following command:

controlplane $ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS      AGE
calico-kube-controllers-784cc4bcb7-xk6q7   1/1     Running   4             38d
canal-9nszc                                2/2     Running   0             42m
canal-brzd7                                2/2     Running   0             42m
coredns-5d769bfcf4-5mwkn                   1/1     Running   0             38d
coredns-5d769bfcf4-w4xs7                   1/1     Running   0             38d
etcd-controlplane                          1/1     Running   0             38d
kube-apiserver-controlplane                1/1     Running   2             38d
kube-controller-manager-controlplane       1/1     Running   3 (41m ago)   38d
kube-proxy-5b8sx                           1/1     Running   0             38d
kube-proxy-5qlc5                           1/1     Running   0             38d
kube-scheduler-controlplane                1/1     Running   3 (41m ago)   38d

Now run a get pods -o yaml command to see the etcd pod's container command:

kubectl get pods etcd-controlplane -o yaml -n kube-system

In the output you can find all the certificate paths:

containers:
  - command:
    - etcd
    - --advertise-client-urls=
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=
    - --initial-cluster=controlplane=
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=
    - --listen-metrics-urls=
    - --listen-peer-urls=
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

(The URL values after = are blank above because they were stripped from the original page.)
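Rather than eyeballing the YAML, the certificate paths can be pulled out with a small helper. This is an illustrative sketch, not part of the original workflow; the extract_etcd_certs name and the jsonpath invocation in the comment are our own assumptions about a kubeadm-style cluster:

```shell
# Illustrative helper: read etcd command-line flags on stdin and print the
# three TLS paths needed by `etcdctl snapshot save`. The `peer` guard keeps
# the peer-to-peer certs out of the client-facing result.
extract_etcd_certs() {
  awk -F= '
    /--trusted-ca-file/ && $0 !~ /peer/ { print "cacert=" $2 }
    /--cert-file/       && $0 !~ /peer/ { print "cert="   $2 }
    /--key-file/        && $0 !~ /peer/ { print "key="    $2 }
  '
}

# On a live kubeadm cluster (pod name assumed to be etcd-controlplane):
#   kubectl -n kube-system get pod etcd-controlplane \
#     -o jsonpath='{.spec.containers[0].command}' | tr ',' '\n' | extract_etcd_certs
```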
Step4 Verify the Backup Data

Run the following command to get the key count and other details from the new backup:

ETCDCTL_API=3 etcdctl --write-out=table snapshot status ./etcd-backup/etcdbackup.db

controlplane $ ETCDCTL_API=3 etcdctl --write-out=table snapshot status ./etcd-backup/etcdbackup.db
+---------+----------+------------+------------+
|  HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+---------+----------+------------+------------+
| cb4c04c |     4567 |       1346 |     6.0 MB |
+---------+----------+------------+------------+
Step5 Restore the Backup to the Cluster

Here we delete the nginx deployment created earlier and then restore the backup, which should bring the nginx deployment back.

A. Delete the nginx deployment

controlplane $ kubectl delete deploy nginx
deployment.apps "nginx" deleted

B. Restore the data from the backup

ETCDCTL_API=3 etcdctl snapshot restore etcd-backup/etcdbackup.db

This creates a folder named default.etcd. While restoring, you may hit an error like the following:

controlplane $ ETCDCTL_API=3 etcdctl snapshot restore etcd-backup/etcdbackup.db
Error:  expected sha256 [253 81 3 207 182 43 249 52 218 166 71 135 221 106 6 216 216 21 183 250 36 126 187 251 171 98 91 69 113 40 229 2], got [63 25 34 167 139 91 18 135 249 179 157 115 214 138 237 35 161 237 175 12 61 31 141 130 204 146 143 177 132 241 193 15]

To avoid this, add the --skip-hash-check=true flag to the restore command above, and the default.etcd folder will be created in the current directory without issue.

controlplane $ ETCDCTL_API=3 etcdctl snapshot restore etcd-backup/etcdbackup.db --skip-hash-check=true
2023-06-28 15:35:36.180956 I | etcdserver/membership: added member 8e9e05c52164694d [] to cluster cdf818194e3a8c32
controlplane $ ls
default.etcd  etcd-backup  filesystem
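As an aside, etcdctl can also restore straight into a target data directory, which skips the default.etcd intermediate folder and the manual member move shown in the next step. A hedged sketch; the target path /var/lib/etcd-restored is our own illustrative choice, and the directory must not already exist:

```shell
# Restore directly into a chosen data dir instead of ./default.etcd.
ETCDCTL_API=3 etcdctl snapshot restore ./etcd-backup/etcdbackup.db \
  --skip-hash-check=true \
  --data-dir=/var/lib/etcd-restored
```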

C. Now we need to stop all running Kubernetes control-plane components so we can swap in the restored etcd data. The static pod manifests for these components live in the /etc/kubernetes/manifests/ folder; we temporarily move them out of that path, and the kubelet will automatically remove the corresponding pods.

controlplane $ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
controlplane $ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-784cc4bcb7-xk6q7   1/1     Running   4          38d
canal-5lxjg                                2/2     Running   0          28m
canal-zv77t                                2/2     Running   0          28m
coredns-5d769bfcf4-5mwkn                   1/1     Running   0          38d
coredns-5d769bfcf4-w4xs7                   1/1     Running   0          38d
etcd-controlplane                          1/1     Running   0          38d
kube-apiserver-controlplane                1/1     Running   2          38d
kube-controller-manager-controlplane       1/1     Running   2          38d
kube-proxy-5b8sx                           1/1     Running   0          38d
kube-proxy-5qlc5                           1/1     Running   0          38d
kube-scheduler-controlplane                1/1     Running   2          38d
controlplane $ mkdir temp_yaml_files
controlplane $ mv /etc/kubernetes/manifests/* temp_yaml_files/
controlplane $ kubectl get pods -n kube-system
The connection to the server 172.30.1.2:6443 was refused - did you specify the right host or port?

As you can see above, once the files are moved out of the manifests path, the api-server pod is terminated and the cluster becomes unreachable. You can check whether these components' containers have been killed or are still running. Before the files are moved, the containers are running:

controlplane $ crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
6a2bce359c15b       6f6e73fa8162b       3 seconds ago       Running             kube-apiserver            0                   fe1be6aa651dd       kube-apiserver-controlplane
a26534b2e6244       c6b5118178229       4 seconds ago       Running             kube-controller-manager   0                   38fb48a4ebb62       kube-controller-manager-controlplane
58ac164968ec3       86b6af7dd652c       4 seconds ago       Running             etcd                      0                   170af0e603a02       etcd-controlplane
e98ef4185206b       6468fa8f98696       4 seconds ago       Running             kube-scheduler            0                   0bd26fd661a2c       kube-scheduler-controlplane
7a03436be6ce6       f9c3c1813269c       23 seconds ago      Running             calico-kube-controllers   7                   6da32eed5e939       calico-kube-controllers-784cc4bcb7-xk6q7
1edf2a857f1d4       e6ea68648f0cd       31 minutes ago      Running             kube-flannel              0                   3dac4c0c5960d       canal-5lxjg
e249d3e4b2b51       75392e3500e36       31 minutes ago      Running             calico-node               0                   3dac4c0c5960d       canal-5lxjg
039999604ba8c       ead0a4a53df89       5 weeks ago         Running             coredns                   0                   f8b31a08b4907       coredns-5d769bfcf4-5mwkn
26d7a0bc1b1b9       1780fa6665ff0       5 weeks ago         Running             local-path-provisioner    0                   1913e8d9cb757       local-path-provisioner-bf548cc96-fchvw
c86359e6bf649       fbe39e5d66b6a       5 weeks ago         Running

Once the files are moved, the control-plane containers are terminated:

controlplane $ mv /etc/kubernetes/manifests/* temp_yaml_files/
controlplane $ crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
7a03436be6ce6       f9c3c1813269c       2 minutes ago       Running             calico-kube-controllers   7                   6da32eed5e939       calico-kube-controllers-784cc4bcb7-xk6q7
1edf2a857f1d4       e6ea68648f0cd       34 minutes ago      Running             kube-flannel              0                   3dac4c0c5960d       canal-5lxjg
e249d3e4b2b51       75392e3500e36       34 minutes ago      Running             calico-node               0                   3dac4c0c5960d       canal-5lxjg
039999604ba8c       ead0a4a53df89       5 weeks ago         Running             coredns                   0                   f8b31a08b4907       coredns-5d769bfcf4-5mwkn
26d7a0bc1b1b9       1780fa6665ff0       5 weeks ago         Running             local-path-provisioner    0                   1913e8d9cb757       local-path-provisioner-bf548cc96-fchvw
c86359e6bf649       fbe39e5d66b6a       5 weeks ago         Running             kube-proxy                0                   d69f1cd083173       kube-proxy-5b8sx

D. Now that the api-server, controller-manager, and kube-scheduler are terminated, we move the data from the default.etcd folder into the etcd data-dir. We know this path from Step 3, where the etcd pod's container command showed --data-dir=/var/lib/etcd.

controlplane $ cd default.etcd/
controlplane $ ls
member
controlplane $ ls /var/lib/etcd
member

We first rename the existing member folder in /var/lib/etcd/ to /var/lib/etcd/member.bak as a fallback, then move the member folder from the restored backup into /var/lib/etcd/.

controlplane $ cd default.etcd/
controlplane $ ls
member
controlplane $ mv /var/lib/etcd/member/ /var/lib/etcd/member.bak
controlplane $ mv member/ /var/lib/etcd/
controlplane $ ls /var/lib/etcd
member  member.bak
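The two moves above can be wrapped in a small function so the swap refuses to run if the restored data is missing. This is a sketch under this article's assumptions (restored data in ./default.etcd, data-dir /var/lib/etcd); the swap_member name is our own:

```shell
# swap_member RESTORED_DIR DATA_DIR
# Back up DATA_DIR/member to member.bak, then move in the restored member folder.
swap_member() {
  restored=$1
  datadir=$2
  # refuse to run if the restored member folder is missing
  [ -d "$restored/member" ] || { echo "no member dir in $restored" >&2; return 1; }
  mv "$datadir/member" "$datadir/member.bak" &&
  mv "$restored/member" "$datadir/"
}

# usage on the control plane:
#   swap_member ./default.etcd /var/lib/etcd
```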

E. Now that the data is restored, we stop the kubelet service and move the yaml files back into the manifests folder.

controlplane $ systemctl stop kubelet
controlplane $ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Wed 2023-06-28 16:03:32 UTC; 6s ago
       Docs:
    Process: 25011 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, stat>
   Main PID: 25011 (code=exited, status=0/SUCCESS)

Jun 28 16:03:30 controlplane kubelet[25011]: E0628 16:03:30.524978   25011 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"htt>
Jun 28 16:03:31 controlplane kubelet[25011]: I0628 16:03:31.195933   25011 status_manager.go:809] "Failed to get status for pod" podUID=4ad6dc12-6828-45>
Jun 28 16:03:31 controlplane kubelet[25011]: E0628 16:03:31.196843   25011 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \">
Jun 28 16:03:31 controlplane kubelet[25011]: E0628 16:03:31.197110   25011 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \">
Jun 28 16:03:31 controlplane kubelet[25011]: E0628 16:03:31.197392   25011 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \">
Jun 28 16:03:31 controlplane kubelet[25011]: E0628 16:03:31.197721   25011 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \">
Jun 28 16:03:32 controlplane systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Jun 28 16:03:32 controlplane kubelet[25011]: I0628 16:03:32.098579   25011 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bun>
Jun 28 16:03:32 controlplane systemd[1]: kubelet.service: Succeeded.
Jun 28 16:03:32 controlplane systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
controlplane $ mv temp_yaml_files/* /etc/kubernetes/manifests/
controlplane $ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

Once the files are back in place, we start the kubelet service so it picks them up and redeploys the components.

controlplane $ systemctl start kubelet
controlplane $ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2023-06-28 16:05:56 UTC; 3s ago
       Docs:
   Main PID: 60741 (kubelet)
      Tasks: 9 (limit: 2339)
     Memory: 70.5M
     CGroup: /system.slice/kubelet.service
             └─60741 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/>

Jun 28 16:05:57 controlplane kubelet[60741]: W0628 16:05:57.729886   60741 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to>
Jun 28 16:05:57 controlplane kubelet[60741]: E0628 16:05:57.729952   60741 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to>
Jun 28 16:05:57 controlplane kubelet[60741]: W0628 16:05:57.831598   60741 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to>
Jun 28 16:05:57 controlplane kubelet[60741]: E0628 16:05:57.832204   60741 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to>
Jun 28 16:05:58 controlplane kubelet[60741]: W0628 16:05:58.130322   60741 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to>
Jun 28 16:05:58 controlplane kubelet[60741]: E0628 16:05:58.130397   60741 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to>
Jun 28 16:05:58 controlplane kubelet[60741]: E0628 16:05:58.274435   60741 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"htt>
Jun 28 16:05:58 controlplane kubelet[60741]: I0628 16:05:58.360755   60741 kubelet_node_status.go:70] "Attempting to register node" node="controlplane"
Jun 28 16:05:58 controlplane kubelet[60741]: E0628 16:05:58.361160   60741 kubelet_node_status.go:92] "Unable to register node with API server" err="Pos>
Jun 28 16:05:59 controlplane kubelet[60741]: I0628 16:05:59.962674   60741 kubelet_node_status.go:70] "Attempting to register node" node="controlplane"

You can now see that the containers are running again; kubectl commands may take a few minutes to start working.

crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
688cfa2890b4f       f9c3c1813269c       23 seconds ago      Running             calico-kube-controllers   12                  6da32eed5e939       calico-kube-controllers-784cc4bcb7-xk6q7
db1797e3e2e83       6468fa8f98696       28 seconds ago      Running             kube-scheduler            0                   307a1600b4346       kube-scheduler-controlplane
1dc176c2a599e       c6b5118178229       28 seconds ago      Running             kube-controller-manager   0                   f9efc6c4c8d91       kube-controller-manager-controlplane
f70e2103ec1e0       6f6e73fa8162b       29 seconds ago      Running             kube-apiserver            0                   32f49c141ea69       kube-apiserver-controlplane
2e274f5176656       86b6af7dd652c       29 seconds ago      Running             etcd                      0                   9c561113f9fcd       etcd-controlplane
1edf2a857f1d4       e6ea68648f0cd       47 minutes ago      Running             kube-flannel              0                   3dac4c0c5960d       canal-5lxjg
e249d3e4b2b51       75392e3500e36       47 minutes ago      Running             calico-node               0                   3dac4c0c5960d       canal-5lxjg
039999604ba8c       ead0a4a53df89       5 weeks ago         Running             coredns                   0                   f8b31a08b4907       coredns-5d769bfcf4-5mwkn
26d7a0bc1b1b9       1780fa6665ff0       5 weeks ago         Running             local-path-provisioner    0                   1913e8d9cb757       local-path-provisioner-bf548cc96-fchvw
c86359e6bf649       fbe39e5d66b6a       5 weeks ago         Running
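Instead of retrying kubectl by hand while the API server comes back up, a small poll loop helps. This is a sketch; the wait_for helper name is our own:

```shell
# wait_for MAX_TRIES CMD...: run CMD once a second until it succeeds
# or MAX_TRIES attempts are used up; returns non-zero on timeout.
wait_for() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# e.g. give the API server up to 5 minutes to answer again:
#   wait_for 300 kubectl get nodes
```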

You can now verify that our nginx deployment (which we deleted after taking the backup) has been restored by running a get pods command:

controlplane $ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-77b4fdf86c-8n7kg   1/1     Running   0          40m
nginx-77b4fdf86c-gmbjm   1/1     Running   0          40m
nginx-77b4fdf86c-pjpnr   1/1     Running   0          40m
nginx-77b4fdf86c-qjxmd   1/1     Running   0          40m
nginx-77b4fdf86c-zhvnv   1/1     Running   0          40m

Congratulations!!! You have now successfully restored the etcd data.
