Deploying a Highly Available Kubernetes Cluster on CentOS 7 with RKE


I. RKE Introduction

1. Overview: RKE (Rancher Kubernetes Engine) is a CNCF-certified Kubernetes distribution in which all components run entirely inside Docker containers.

Rancher Server can only run on a Kubernetes cluster that was installed with RKE or K3s.

2. Prepare the node environment: open the required firewall ports

firewall-cmd --permanent --add-port=22/tcp
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=30000-32767/udp
firewall-cmd --reload
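
The ports above cover SSH, HTTP/S, and the NodePort range. An RKE cluster additionally needs its control-plane and overlay-network ports open between nodes; the list below is a sketch of the commonly required ones taken from the standard RKE port requirements (not in the original), so verify it against your RKE version:

firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd client and peer traffic
firewall-cmd --permanent --add-port=10250/tcp       # kubelet
firewall-cmd --permanent --add-port=8472/udp        # flannel VXLAN overlay
firewall-cmd --reload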

3. Synchronize node time

yum install ntpdate -y
ntpdate time.windows.com
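
ntpdate performs a one-shot synchronization, so clocks will drift again over time. One way to keep them aligned is a periodic cron entry; the half-hourly schedule below is an illustrative assumption, not part of the original:

# append a half-hourly sync to root's crontab without clobbering existing entries
(crontab -l 2>/dev/null; echo "*/30 * * * * /usr/sbin/ntpdate time.windows.com") | crontab -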

4. Install Docker

Docker must be installed on every node that will run Rancher Server.

sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# add the Docker CE repo (standard upstream URL; the link was stripped from the source page)
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce-18.09.3-3.el7
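
After installing the package, Docker still has to be started and enabled at boot, and RKE requires that its SSH user can use the Docker socket without sudo. A minimal sketch, assuming the admin user that the cluster.yml below logs in with:

sudo systemctl enable --now docker   # start Docker now and at every boot
sudo usermod -aG docker admin        # let the rke SSH user access /var/run/docker.sock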

5. Install kubectl

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
# the baseurl and remaining fields were lost from the source page;
# the values below assume the conventional upstream Kubernetes yum repo
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
yum install -y kubectl
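
A quick check that the client installed correctly:

kubectl version --client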

6. Install RKE

Rancher Kubernetes Engine is the CLI used to build Kubernetes clusters.

Download: https://github.com/rancher/rke/releases

mv rke_linux-amd64 rke
chmod +x rke
mv rke /usr/local/bin
rke

7. Install Helm

Helm is the package manager for Kubernetes.

Download: https://github.com/helm/helm/releases

tar -zxvf helm-v3.3.1-linux-amd64.tar.gz
cd linux-amd64
mv helm /usr/local/bin
chown -R admin:admin /usr/local/bin/helm
helm version
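
Helm is installed here so that Rancher Server can later be deployed from its chart. A typical next step is to register Rancher's chart repository; the repository URL below is Rancher's published one and is not shown in the original:

helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update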

8. Configure passwordless SSH

Switch to the admin user. On the host where the rke up command will be executed, generate an SSH key pair and distribute the public key to every node.

ssh-keygen -t rsa
ssh-copy-id 192.168.112.120
ssh-copy-id 192.168.112.121
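
Because rke drives the whole installation over SSH, it is worth confirming that key-based login works and that the remote user can reach Docker before running rke up (a sketch using one of the node IPs above):

ssh admin@192.168.112.120 "docker ps"   # should list containers without a password prompt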

9. Configure OS parameters to support the Kubernetes cluster (run on all nodes)

sudo swapoff -a
sudo vi /etc/sysctl.conf
# add the following lines to /etc/sysctl.conf:
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
sudo sysctl -p
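
swapoff -a only disables swap until the next reboot. To keep it off permanently you would also comment out the swap entry in /etc/fstab; the one-liner below is a sketch that assumes a conventional fstab layout:

sudo sed -i '/ swap / s/^/#/' /etc/fstab   # comment out swap mount entries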

10. Use rke to generate the cluster initialization configuration file

rke config --name cluster.yml

The command walks through a series of interactive prompts and writes the answers to a file named cluster.yml, which RKE uses to determine how to deploy Kubernetes on the nodes in the cluster.

# If you intended to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "192.168.30.110"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node1"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.129"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node2"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.133"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node3"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.58
  nginx_proxy: rancher/rke-tools:v0.1.58
  cert_downloader: rancher/rke-tools:v0.1.58
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.58
  kubedns: rancher/k8s-dns-kube-dns:1.15.2
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.2
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.2
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.9
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.7
  kubernetes: rancher/hyperkube:v1.18.3-rancher2
  flannel: rancher/coreos-flannel:v0.12.0
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/calico-node:v3.13.4
  calico_cni: rancher/calico-cni:v3.13.4
  calico_controllers: rancher/calico-kube-controllers:v3.13.4
  calico_ctl: rancher/calico-ctl:v3.13.4
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  canal_node: rancher/calico-node:v3.13.4
  canal_cni: rancher/calico-cni:v3.13.4
  canal_flannel: rancher/coreos-flannel:v0.12.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  weave_node: weaveworks/weave-kube:2.6.4
  weave_cni: weaveworks/weave-npc:2.6.4
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.32.0-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.4
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
restore:
  restore: false
  snapshot_name: ""
dns: null

11. Deploy

rke up

12. Set environment variables

# RKE writes the kubeconfig as kube_config_cluster.yml next to cluster.yml
export KUBECONFIG=/home/admin/kube_config_cluster.yml
mkdir ~/.kube
cp kube_config_cluster.yml ~/.kube/config

The Kubernetes cluster is now installed via RKE. Some nodes come up slowly on first boot, so allow a short wait before expecting everything to be ready.
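
At this point you can check the cluster from the host where rke ran; the output below is illustrative of a healthy three-node cluster, not captured from the original environment:

kubectl get nodes
# NAME    STATUS   ROLES                      AGE   VERSION
# node1   Ready    controlplane,etcd,worker   5m    v1.18.3
# node2   Ready    controlplane,etcd,worker   5m    v1.18.3
# node3   Ready    controlplane,etcd,worker   5m    v1.18.3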

II. Cleaning Up the RKE Environment

1. Run the following commands on each of rancher-node-1, 2, and 3

mkdir rancher
cat > rancher/clear.sh << 'EOF'
df -h | grep kubelet | awk -F % '{print $2}' | xargs umount
rm /var/lib/kubelet/* -rf
rm /etc/kubernetes/* -rf
rm /var/lib/rancher/* -rf
rm /var/lib/etcd/* -rf
rm /var/lib/cni/* -rf
rm -rf /var/run/calico
iptables -F && iptables -t nat -F
ip link del flannel.1
docker ps -a | awk '{print $1}' | xargs docker rm -f
docker volume ls | awk '{print $2}' | xargs docker volume rm
rm -rf /var/etcd/
rm -rf /run/kubernetes/
docker rm -fv $(docker ps -aq)
docker volume rm $(docker volume ls -q)
rm -rf /etc/cni
rm -rf /opt/cni
systemctl restart docker
EOF
sh rancher/clear.sh

Note that the heredoc delimiter is quoted ('EOF') so that $2 and $(...) are written into the script literally instead of being expanded by the outer shell.

2. That completes the cleanup of leftover directories. If problems persist, you may also need to uninstall Docker on all nodes.

# First, check the installed Docker version
yum list installed | grep docker
# example output:
# docker-ce.x86_64  18.05.0.ce-3.el7.centos  @docker-ce-edge

# Uninstall it
yum -y remove docker-ce.x86_64

# Delete the storage directories
rm -rf /etc/docker
rm -rf /run/docker
rm -rf /var/lib/dockershim
rm -rf /var/lib/docker

# If a directory cannot be removed, umount it first, e.g.:
umount /var/lib/docker/devicemapper

rke up --config=./rancher-cluster.yml

The rke up command is idempotent; sometimes it must be executed several times before it succeeds.

3. Problems from repeatedly installing and uninstalling the Kubernetes cluster with rke

On startup, the etcd cluster health check fails.

Clear all Kubernetes-related directories on every node, uninstall Docker and delete all of its directories, then reinstall Docker.

Finally, run the rke up command again.

III. Adding and Removing Nodes

1. Add a node

Edit cluster.yml and add the configuration for the new node:

more cluster.yml
nodes:
  - address: 172.20.101.103
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.104
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.105
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.106
    user: ptmind
    role: [worker]
    labels: {traefik: traefik-outer}

2. Run the command to add the node

rke up --update-only
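
After the update completes, the new worker should register with the cluster. A quick check from kubectl (a sketch; 172.20.101.106 is the node added above):

kubectl get nodes -o wide | grep 172.20.101.106   # the new worker should appear and become Ready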

3. Remove a node with rke

Edit cluster.yml and delete the configuration of the node being removed:

more cluster.yml
nodes:
  - address: 172.20.101.103
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.104
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.105
    user: ptmind
    role: [controlplane,worker,etcd]
# removed:
#  - address: 172.20.101.106
#    user: ptmind
#    role: [worker]
#    labels: {traefik: traefik-outer}

4. Run the command to remove the node

rke up --update-only

Problem: when a node is in the NotReady state, operations against it fail; for example, a node-removal operation errors out and the node cannot be deleted.

Solution:

1. Manually delete the components on the node.

2. Remove the node's role with kubectl:

kubectl label node prod-129 node-role.kubernetes.io/controlplane-
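
The trailing dash removes a label, so the same pattern clears the node's other role labels if it held them (a sketch mirroring the command above):

kubectl label node prod-129 node-role.kubernetes.io/etcd-
kubectl label node prod-129 node-role.kubernetes.io/worker-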

Problem: a Kubernetes cluster node is stuck in SchedulingDisabled.

Solution:

kubectl patch node NodeName -p "{\"spec\":{\"unschedulable\":false}}"

Or:

# Mark the node unschedulable
kubectl cordon node07-ingress

# Mark the node schedulable again
kubectl uncordon node07-ingress

# Evict the node's pods
kubectl drain --ignore-daemonsets --delete-local-data node07-ingress

# Delete the node
kubectl delete node node07-ingress
