Background
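Our clusters run k8s 1.25.6, and our services are being containerized on it. After processes moved from virtual machines into containers, operators running free -m, top, and similar commands to troubleshoot were baffled: free reported plenty of memory, yet the pod's container was OOM-killed; top showed many idle CPU cores, yet the process ran slowly. What they saw was the physical machine's resource view, not the limits we had set on the pod's containers, and this got in the way of troubleshooting for both developers and operators.
So why do operators still see the physical machine's resource view?
Containers use cgroups to limit CPU, memory, swap, and other resources, but they are not fully isolated: they share the kernel with the host and can therefore read some host information. On Linux, the /proc directory holds virtual files that expose kernel and runtime information. /proc/meminfo reports memory usage and state, such as total, available, and used memory. Running free -m inside a container reads the /proc/meminfo served by the host kernel, which does not take the container's cgroup limits into account, so it shows the physical machine's memory.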
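To make the mismatch concrete, here is a minimal Python sketch that compares MemTotal from a /proc/meminfo text with a cgroup memory limit. It assumes cgroup v2, where the limit is in a file named memory.max holding either a byte count or the word max; the function names are our own and only for illustration.

```python
def parse_meminfo_kib(meminfo_text: str, field: str = "MemTotal") -> int:
    """Return the value of one /proc/meminfo field in KiB."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def parse_cgroup_limit_bytes(limit_text: str):
    """Parse a cgroup v2 memory.max value; 'max' means no limit (None)."""
    value = limit_text.strip()
    return None if value == "max" else int(value)


def view_is_isolated(meminfo_text: str, limit_text: str) -> bool:
    """True when MemTotal does not exceed the cgroup memory limit."""
    limit = parse_cgroup_limit_bytes(limit_text)
    if limit is None:
        return True  # no limit set, so there is nothing to hide
    return parse_meminfo_kib(meminfo_text) * 1024 <= limit


if __name__ == "__main__":
    # Inside a container one would read the real files instead, e.g.
    # open("/proc/meminfo").read() and open("/sys/fs/cgroup/memory.max").read()
    host_like = "MemTotal:       65806712 kB\nMemFree:         1234567 kB\n"
    print(view_is_isolated(host_like, "268435456\n"))
```

With a 256 MiB limit and a host-sized MemTotal, the check reports that the view is not isolated, which is exactly the situation described above.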
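Now that we know why the container resource view is not isolated, note that beyond the confusion it also causes real pain in practice:
1. nginx, for example, sets its worker count automatically from the number of CPU cores, so it sizes itself for the host instead of the container's CPU limit.
2. A JVM sizes its heap automatically from the system memory, so the process may fail to start or may be OOM-killed repeatedly while running.
3. Exposing too much host information may endanger the security of the physical machine.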
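The first pain point can be illustrated with a small Python sketch. It assumes cgroup v2, where the CPU quota is in a file named cpu.max containing "<quota> <period>" in microseconds, or "max <period>" when no quota is set. The function name effective_cpus is hypothetical; this is not how nginx or the JVM compute their defaults.

```python
import math
import os


def effective_cpus(cpu_max_text: str, host_cpus: int) -> int:
    """Derive a usable CPU count from a cgroup v2 cpu.max line."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        return host_cpus  # no quota: every host CPU is usable
    # Round a fractional quota up, but never report 0 or more than the host has.
    return max(1, min(host_cpus, math.ceil(int(quota) / int(period))))


if __name__ == "__main__":
    host = os.cpu_count() or 1
    # "50000 100000" corresponds to a limit of half a CPU (cpu: "500m").
    print("host CPUs:", host, "usable in container:", effective_cpus("50000 100000", host))
```

A program that sizes its worker pool from the host CPU count would start far more workers than a half-CPU quota can serve.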
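So how do we solve container resource view isolation? The Linux Containers (LXC) community recognized this problem long ago and developed LXCFS (Linux Containers File System) to address it.
Let's look at how LXCFS works.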
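How LXCFS works
LXCFS is a small virtual file system implemented with FUSE (Filesystem in Userspace), intended to make a Linux container feel more like a virtual machine. It started as a side project of LXC but can be used by any runtime.
LXCFS makes sure that the information provided by key files in procfs is specific to the container, for example: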
/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/proc/slabinfo
/sys/devices/system/cpu/online
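LXCFS adapts this information to the container, so that the values shown (for example /proc/uptime) reflect how long the container has been running, not the host.
In other words, LXCFS exposes a virtual file system on the host (mounted at /var/lib/lxcfs in this article), and its files are mounted over the corresponding files under /proc and /sys inside the container. When a process in the container reads one of those files, LXCFS computes the content from the container's cgroup, so the container sees its own limited resources instead of the host's actual resources.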
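The idea of computing a virtual view from cgroup numbers can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not LXCFS's actual implementation: the function name is hypothetical, the two inputs stand for a cgroup memory limit and current usage in bytes, and the real LXCFS fills in many more fields.

```python
def render_meminfo(limit_bytes: int, usage_bytes: int) -> str:
    """Render a minimal /proc/meminfo-style view from cgroup numbers.

    limit_bytes: the container's memory limit (e.g. cgroup v2 memory.max)
    usage_bytes: the container's current usage (e.g. cgroup v2 memory.current)
    """
    total_kib = limit_bytes // 1024
    used_kib = min(usage_bytes, limit_bytes) // 1024
    free_kib = total_kib - used_kib
    lines = []
    for name, value in (
        ("MemTotal", total_kib),
        ("MemFree", free_kib),
        ("MemAvailable", free_kib),
    ):
        label = name + ":"
        lines.append(f"{label:<16}{value:>8} kB")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A 256 MiB limit with 2744 KiB in use.
    print(render_meminfo(256 * 1024 * 1024, 2744 * 1024), end="")
```

A tool such as free reading this text would report the container's limit as total memory, regardless of how much memory the host has.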
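Taking free -m as an example, a read goes through these steps:
1. free -m runs in the container and reads /proc/meminfo.
2. Because /proc/meminfo is a mounted file, the read actually goes to /var/lib/lxcfs/proc/meminfo (see below), which triggers LXCFS.
3. The read passes through the glibc system call into the kernel's VFS layer, which forwards it to the FUSE kernel module.
4. FUSE calls back into the user-space LXCFS implementation, which obtains the container's cgroup information.
5. LXCFS locates the container's cgroup, reads and computes the container's actual limited memory, CPU, and other values, and returns them, so what the user finally sees is the cgroup-limited resource view.
Deploying LXCFS on a host
a. Install lxcfs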
yum install meson fuse-devel fuse cmake help2man fuse3 fuse3-devel -y
git clone https://github.com/lxc/lxcfs
cd lxcfs
meson setup -Dinit-script=systemd --prefix=/usr build/
meson compile -C build/
meson install -C build/
b. Start lxcfs
mkdir -p /var/lib/lxcfs
lxcfs /var/lib/lxcfs
c. Test by running a container
docker run -it -m 256m --memory-swap 256m --cpus=1 \
  -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
  -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
  -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
  -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
  -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
  -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
  -v /var/lib/lxcfs/proc/slabinfo:/proc/slabinfo:rw \
  -v /var/lib/lxcfs/sys/devices/system/cpu:/sys/devices/system/cpu:rw \
  ubuntu:18.04 /bin/bash
After the container starts, run the following commands to confirm that it works:
1. uptime     # container uptime
2. free -m    # memory
3. lscpu      # number of online CPU cores, or: cat /proc/cpuinfo
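How do we add resource view isolation to pods in a k8s environment? Let's take a look.
Running LXCFS in a k8s environment
Steps:
1. First, the lxcfs process must run on every node; we use a DaemonSet for this.
2. Next, mount the node's /sys/fs/cgroup, /usr/lib64, and /usr/local into the lxcfs container, and mount the virtual file system at /var/lib/lxcfs/ in the lxcfs container back onto the node through hostPath.
3. Finally, write the pod YAML so that the node's /var/lib/lxcfs/ is mounted into the pod's containers through hostPath. This completes resource view isolation for k8s containers with lxcfs.
a. Build the lxcfs image
a.1 Directory layout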
tree .
.
├── Dockerfile
├── build.sh
└── lxcfs-lxcfs-5.0.4.tar.gz
a.2 Dockerfile
# Or specify your own base image
FROM centos:7.9
# Install
RUN yum install meson fuse-devel fuse cmake help2man fuse3 fuse3-devel git -y
RUN git clone https://github.com/lxc/lxcfs /lxcfs
# cd does not persist across RUN instructions, so set the working directory instead
WORKDIR /lxcfs
RUN meson setup -Dinit-script=systemd --prefix=/usr build/
RUN meson compile -C build/
RUN meson install -C build/
# Run
RUN mkdir -p /var/lib/lxcfs
CMD ["sh", "-c", "lxcfs /var/lib/lxcfs"]
a.3 build.sh: build the image
#!/bin/bash
source /etc/profile
docker build -t yourharbor.domain.com/centos/7.9/lxcfs/5.0.4/lxcfs .
docker push yourharbor.domain.com/centos/7.9/lxcfs/5.0.4/lxcfs
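The lxcfs image is now built. Next, let's see how to use it.
b. Run the lxcfs DaemonSet YAML
Using the lxcfs image built above, mount the node's directories into the pod and expose /var/lib/lxcfs/ back to the node, as in the following YAML: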
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
  labels:
    app: lxcfs
  name: lxcfs
  namespace: default
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      containers:
      - image: yourharbor.domain.com/centos/7.9/lxcfs/5.0.4/lxcfs
        imagePullPolicy: Always
        name: lxcfs
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
          name: lxcfs
        - mountPath: /usr/local
          name: usr-local
        - mountPath: /usr/lib64
          name: usr-lib64
      hostPID: true
      imagePullSecrets:
      - name: your-docker-token
      restartPolicy: Always
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: your-taint-key
        operator: Exists
      volumes:
      - hostPath:
          path: /sys/fs/cgroup
          type: ""
        name: cgroup
      - hostPath:
          path: /usr/local
          type: ""
        name: usr-local
      - hostPath:
          path: /usr/lib64
          type: ""
        name: usr-lib64
      - hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate
        name: lxcfs
After applying the YAML above, the lxcfs DaemonSet pod may fail to start on some nodes and report the following error:
Error: failed to generate container "974c6c0465adae1a244e3416b3e053ba2dccb0cbd123c2d02317c9301e3f83d0" spec: failed to apply OCI options: failed to stat "/var/lib/lxcfs": stat /var/lib/lxcfs: transport endpoint is not connected
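"Transport endpoint is not connected" is the message for the ENOTCONN error code, which a FUSE mount point usually returns when the process that served it is gone but the mount was never removed. As a minimal sketch, a node could be checked for this state as follows; the function name is our own, and the stat parameter exists only so the check can be exercised without a real broken mount.

```python
import errno
import os


def is_stale_fuse_mount(path: str, stat=os.stat) -> bool:
    """Return True if stat(path) fails with ENOTCONN."""
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


if __name__ == "__main__":
    # /var/lib/lxcfs is the mount point used throughout this article.
    print(is_stale_fuse_mount("/var/lib/lxcfs"))
```

A missing directory raises a different error code (ENOENT), so it is not reported as stale.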
Solution: unmount the stale mount point on the affected node
umount /var/lib/lxcfs
c. Verify: Deployment pod YAML definition
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      volumes:
      - hostPath:
          path: /var/lib/lxcfs/proc/cpuinfo
          type: ""
        name: lxcfs-proc-cpuinfo
      - hostPath:
          path: /var/lib/lxcfs/proc/diskstats
          type: ""
        name: lxcfs-proc-diskstats
      - hostPath:
          path: /var/lib/lxcfs/proc/meminfo
          type: ""
        name: lxcfs-proc-meminfo
      - hostPath:
          path: /var/lib/lxcfs/proc/stat
          type: ""
        name: lxcfs-proc-stat
      - hostPath:
          path: /var/lib/lxcfs/proc/swaps
          type: ""
        name: lxcfs-proc-swaps
      - hostPath:
          path: /var/lib/lxcfs/proc/uptime
          type: ""
        name: lxcfs-proc-uptime
      - hostPath:
          path: /var/lib/lxcfs/proc/loadavg
          type: ""
        name: lxcfs-proc-loadavg
      - hostPath:
          path: /var/lib/lxcfs/sys/devices/system/cpu/online
          type: ""
        name: lxcfs-sys-devices-system-cpu-online
      containers:
      - name: web
        image: httpd:2.4.32
        imagePullPolicy: Always
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        volumeMounts:
        - mountPath: /proc/cpuinfo
          name: lxcfs-proc-cpuinfo
          readOnly: true
        - mountPath: /proc/meminfo
          name: lxcfs-proc-meminfo
          readOnly: true
        - mountPath: /proc/diskstats
          name: lxcfs-proc-diskstats
          readOnly: true
        - mountPath: /proc/stat
          name: lxcfs-proc-stat
          readOnly: true
        - mountPath: /proc/swaps
          name: lxcfs-proc-swaps
          readOnly: true
        - mountPath: /proc/uptime
          name: lxcfs-proc-uptime
          readOnly: true
        - mountPath: /proc/loadavg
          name: lxcfs-proc-loadavg
          readOnly: true
        - mountPath: /sys/devices/system/cpu/online
          name: lxcfs-sys-devices-system-cpu-online
          readOnly: true
With this, the pod gets container resource view isolation through lxcfs.
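The volumes and volumeMounts entries in the Deployment follow one pattern per file, so they can be generated instead of typed. The Python sketch below is only an illustration; the helper names are hypothetical, and the naming rule (prefix "lxcfs", slashes replaced by dashes) is inferred from the names used in the YAML above.

```python
import json

LXCFS_ROOT = "/var/lib/lxcfs"

# The files mounted in the Deployment above.
LXCFS_FILES = [
    "/proc/cpuinfo",
    "/proc/diskstats",
    "/proc/meminfo",
    "/proc/stat",
    "/proc/swaps",
    "/proc/uptime",
    "/proc/loadavg",
    "/sys/devices/system/cpu/online",
]


def volume_name(container_path: str) -> str:
    """'/proc/cpuinfo' -> 'lxcfs-proc-cpuinfo'."""
    return "lxcfs" + container_path.replace("/", "-")


def lxcfs_volumes(paths=LXCFS_FILES) -> list:
    """hostPath volumes pointing at the files served by lxcfs on the node."""
    return [
        {"name": volume_name(p), "hostPath": {"path": LXCFS_ROOT + p, "type": ""}}
        for p in paths
    ]


def lxcfs_volume_mounts(paths=LXCFS_FILES) -> list:
    """Read-only mounts that place each lxcfs file over the container's own."""
    return [
        {"name": volume_name(p), "mountPath": p, "readOnly": True} for p in paths
    ]


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be pasted into a manifest.
    print(json.dumps({"volumes": lxcfs_volumes()}, indent=2))
    print(json.dumps({"volumeMounts": lxcfs_volume_mounts()}, indent=2))
```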
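But there is a problem: copy-pasting this configuration is tolerable for one or two containers, but repeating it for thousands of containers is something you, as a follower of the KISS principle, surely cannot stand.
Is there a way around it? We can implement an admission webhook (admission control), which further validates requests or adds default parameters after authorization. What we had in mind has already been implemented by others, so there is no need to reinvent the wheel; see lxcfs-admission-webhook.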
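Injection with lxcfs-admission-webhook: mounting the /proc and /sys/ files into containers automatically
lxcfs-admission-webhook implements a dynamic admission webhook, or more precisely a mutating webhook: it watches pod creation and patches the pod, injecting the mapping between lxcfs and the container's directories into the pod spec at creation time, so that the mounts are added automatically.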
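To show what "patching the pod" means, here is a minimal Python sketch of the kind of answer a mutating webhook returns: a list of JSON Patch operations, base64-encoded inside an AdmissionReview response. This is not the code of lxcfs-admission-webhook (which is written in Go); the function names are hypothetical, and the api_version default is an assumption that must match the version of the request the API server sent.

```python
import base64
import json


def lxcfs_patch(pod: dict, files: list) -> list:
    """Build JSON Patch operations that add lxcfs volumes and mounts to a pod."""
    spec = pod.get("spec", {})
    volumes = [
        {"name": "lxcfs" + f.replace("/", "-"),
         "hostPath": {"path": "/var/lib/lxcfs" + f}}
        for f in files
    ]
    ops = []
    if spec.get("volumes"):
        # Append to the existing list, one operation per volume.
        ops += [{"op": "add", "path": "/spec/volumes/-", "value": v} for v in volumes]
    else:
        ops.append({"op": "add", "path": "/spec/volumes", "value": volumes})
    for i, container in enumerate(spec.get("containers", [])):
        mounts = [
            {"name": v["name"], "mountPath": f, "readOnly": True}
            for v, f in zip(volumes, files)
        ]
        base = f"/spec/containers/{i}/volumeMounts"
        if container.get("volumeMounts"):
            ops += [{"op": "add", "path": base + "/-", "value": m} for m in mounts]
        else:
            ops.append({"op": "add", "path": base, "value": mounts})
    return ops


def admission_response(uid: str, ops: list,
                       api_version: str = "admission.k8s.io/v1") -> dict:
    """Wrap the patch in an AdmissionReview response that allows the pod."""
    patch = base64.b64encode(json.dumps(ops).encode()).decode()
    return {
        "apiVersion": api_version,
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }


if __name__ == "__main__":
    pod = {"spec": {"containers": [{"name": "web"}]}}
    ops = lxcfs_patch(pod, ["/proc/meminfo", "/proc/cpuinfo"])
    print(json.dumps(admission_response("example-uid", ops), indent=2))
```

The API server applies the decoded operations to the pod before it is stored, which is why the pod's own YAML no longer needs the lxcfs volumes.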
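Using it is also KISS: once it is deployed, a single label on the namespace enables it, as the verification step shows.
Let's see how to use it.
1. Prepare the lxcfs-admission-webhook image
Build the Go binary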
git clone git@github.com:denverdino/lxcfs-admission-webhook.git
cd lxcfs-admission-webhook
# Build lxcfs-admission-webhook. It is an older Go project, so convert it to Go modules first.
export GOPROXY="<your Go module proxy URL>"
go mod init v1
go mod tidy
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o lxcfs-admission-webhook
chmod +x lxcfs-admission-webhook
Dockerfile
FROM alpine:latest
ADD lxcfs-admission-webhook /lxcfs-admission-webhook
ENTRYPOINT ["./lxcfs-admission-webhook"]
Build the image
docker build -t yourharbor.domain.com/alpine/lxcfs-admission-webhook:v1 .
docker push yourharbor.domain.com/alpine/lxcfs-admission-webhook:v1
2. Run the lxcfs-admission-webhook pod
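Every cluster has its own CA certificate, so when deploying lxcfs-admission-webhook to a different cluster, perform the following steps before applying the YAML.
2.1 Directory layout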
tree .
.
├── dp.yaml                          # lxcfs-admission-webhook Deployment
├── mutatingwebhook.yaml             # MutatingWebhookConfiguration
├── svc.yaml                         # webhook Service
└── webhook-create-signed-cert.sh    # creates the certificates lxcfs-admission-webhook depends on
2.2 Modify webhook-create-signed-cert.sh
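Note: our k8s version is fairly new and lxcfs-admission-webhook has not been updated in recent years, so to fit newer k8s versions we modified webhook-create-signed-cert.sh, the certificate generation script from the GitHub repository.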
#!/bin/bash

set -e

usage() {
    cat <<EOF
Generate certificate suitable for use with an sidecar-injector webhook service.

This script uses k8s' CertificateSigningRequest API to generate a
certificate signed by k8s CA suitable for use with sidecar-injector webhook
services. This requires permissions to create and approve CSR. See for
detailed explanation and additional instructions.

The server key/cert k8s CA cert are stored in a k8s secret.

usage: ${0} [OPTIONS]

The following flags are required.

       --service          Service name of webhook.
       --namespace        Namespace where webhook service and secret reside.
       --secret           Secret name for CA certificate and server certificate/key pair.
EOF
    exit 1
}

while [[ $# -gt 0 ]]; do
    case ${1} in
        --service)
            service="$2"
            shift
            ;;
        --secret)
            secret="$2"
            shift
            ;;
        --namespace)
            namespace="$2"
            shift
            ;;
        *)
            usage
            ;;
    esac
    shift
done

# fall back to defaults for any flag that was not given
[ -z "${service}" ] && service=lxcfs-admission-webhook-svc
[ -z "${secret}" ] && secret=lxcfs-admission-webhook-certs
[ -z "${namespace}" ] && namespace=default

if [ ! -x "$(command -v openssl)" ]; then
    echo "openssl not found"
    exit 1
fi

csrName=${service}.${namespace}
tmpdir=$(mktemp -d)
echo "creating certs in tmpdir ${tmpdir} "

cat <<EOF >> ${tmpdir}/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = ${service}
DNS.2 = ${service}.${namespace}
DNS.3 = ${service}.${namespace}.svc
EOF

openssl genrsa -out ${tmpdir}/server-key.pem 2048
#openssl req -new -key ${tmpdir}/server-key.pem -subj "/CN=${service}.${namespace}.svc" -out ${tmpdir}/server.csr -config ${tmpdir}/csr.conf
openssl req -new -key ${tmpdir}/server-key.pem -subj "/CN=system:node:${service}.${namespace}.svc;/O=system:nodes" -out ${tmpdir}/server.csr -config ${tmpdir}/csr.conf

# clean-up any previously created CSR for our service. Ignore errors if not present.
kubectl delete csr ${csrName} -n ${namespace} 2>/dev/null || true

# create server cert/key CSR and send to k8s API
cat <<EOF | kubectl -n ${namespace} create -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${csrName}
spec:
  groups:
  - system:authenticated
  signerName: kubernetes.io/kubelet-serving
  request: $(cat ${tmpdir}/server.csr | base64 | tr -d '\n')
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF

# verify CSR has been created (retry instead of exiting under set -e)
until kubectl get csr ${csrName}; do
    sleep 1
done

# approve and fetch the signed certificate
kubectl certificate approve ${csrName}
# verify certificate has been signed
for x in $(seq 10); do
    serverCert=$(kubectl get csr ${csrName} -o jsonpath='{.status.certificate}')
    if [[ ${serverCert} != '' ]]; then
        break
    fi
    sleep 1
done
if [[ ${serverCert} == '' ]]; then
    echo "ERROR: After approving csr ${csrName}, the signed certificate did not appear on the resource. Giving up after 10 attempts." >&2
    exit 1
fi
echo ${serverCert} | openssl base64 -d -A -out ${tmpdir}/server-cert.pem

# create the secret with CA cert and server cert/key
kubectl create secret generic ${secret} \
    --from-file=key.pem=${tmpdir}/server-key.pem \
    --from-file=cert.pem=${tmpdir}/server-cert.pem \
    --dry-run=client -o yaml |
    kubectl -n ${namespace} apply -f -
Two things were changed: the subject of the certificate request is now /CN=system:node:${service}.${namespace}.svc;/O=system:nodes, and a bug in the handling of --namespace was fixed.
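The subject change is needed because the script requests the kubernetes.io/kubelet-serving signer. As far as we understand the Kubernetes documentation, that signer only accepts a common name starting with system:node: and exactly one organization, system:nodes. A tiny Python sketch of that rule, with a hypothetical function name:

```python
def subject_ok_for_kubelet_serving(common_name: str, organizations: list) -> bool:
    """Check a CSR subject against the kubelet-serving signer's subject rule."""
    return common_name.startswith("system:node:") and organizations == ["system:nodes"]


if __name__ == "__main__":
    # The subject used by the original script (commented out above) fails the check.
    print(subject_ok_for_kubelet_serving("lxcfs-admission-webhook-svc.lxcfs.svc", []))
    # The modified subject passes.
    print(subject_ok_for_kubelet_serving(
        "system:node:lxcfs-admission-webhook-svc.lxcfs.svc", ["system:nodes"]))
```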
Then run the following on a k8s master node (the script uses bash syntax, so run it with bash):
kubectl create ns lxcfs ; bash webhook-create-signed-cert.sh --namespace lxcfs
2.2 Get the cluster CA certificate
kubectl config view --raw --flatten --minify -o jsonpath='{.clusters[].cluster.certificate-authority-data}'
2.3 Put the CA certificate into the caBundle field of mutatingwebhook.yaml
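Before pasting the value into caBundle, it can be worth checking that it survived copying intact. The command above prints base64 text that should decode to a PEM certificate. A minimal Python sketch of such a check follows; the function name is hypothetical, and it only checks the PEM framing, not the certificate itself.

```python
import base64


def looks_like_ca_bundle(value: str) -> bool:
    """True if value is base64 text that decodes to a PEM certificate."""
    try:
        pem = base64.b64decode(value, validate=True).decode("ascii")
    except ValueError:
        # Covers both invalid base64 and bytes that are not ASCII.
        return False
    pem = pem.strip()
    return (pem.startswith("-----BEGIN CERTIFICATE-----")
            and pem.endswith("-----END CERTIFICATE-----"))


if __name__ == "__main__":
    import sys
    # Usage: pipe the kubectl output into this script.
    print(looks_like_ca_bundle(sys.stdin.read().strip()))
```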
# admissionregistration.k8s.io/v1beta1 was removed in k8s 1.22, so v1 is used here
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mutating-lxcfs-admission-webhook-cfg
  labels:
    app: lxcfs-admission-webhook
webhooks:
  - name: mutating.lxcfs-admission-webhook.aliyun.com
    clientConfig:
      service:
        name: lxcfs-admission-webhook-svc
        # must match the namespace of svc.yaml
        namespace: lxcfs
        path: "/mutate"
      caBundle: <base64-encoded CA certificate obtained in the previous step>
    rules:
      - operations: [ "CREATE" ]
        apiGroups: ["core", ""]
        apiVersions: ["v1"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        lxcfs-admission-webhook: enabled
    # both fields are required in v1; v1beta1 assumes the webhook binary
    # speaks the v1beta1 AdmissionReview format, adjust if yours differs
    sideEffects: None
    admissionReviewVersions: ["v1beta1"]
2.4 dp.yaml for lxcfs-admission-webhook
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lxcfs-admission-webhook-deployment
  labels:
    app: lxcfs-admission-webhook
  namespace: lxcfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lxcfs-admission-webhook
  template:
    metadata:
      labels:
        app: lxcfs-admission-webhook
    spec:
      imagePullSecrets:
      - name: your-docker-token
      containers:
        - name: lxcfs-admission-webhook
          image: yourharbor.domain.com/alpine/lxcfs-admission-webhook:v1
          imagePullPolicy: IfNotPresent
          args:
            - -tlsCertFile=/etc/webhook/certs/cert.pem
            - -tlsKeyFile=/etc/webhook/certs/key.pem
            - -alsologtostderr
            - -v=4
            - 2>&1
          volumeMounts:
            - name: webhook-certs
              mountPath: /etc/webhook/certs
              readOnly: true
      volumes:
        - name: webhook-certs
          secret:
            secretName: lxcfs-admission-webhook-certs
2.5 svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: lxcfs
  name: lxcfs-admission-webhook-svc
  labels:
    app: lxcfs-admission-webhook
spec:
  ports:
  - port: 443
    targetPort: 443
  selector:
    app: lxcfs-admission-webhook
3. Verify: enable the injection
Enable lxcfs for the default namespace
kubectl label namespace default lxcfs-admission-webhook=enabled
Deploy the Deployment
cd lxcfs-admission-webhook
kubectl apply -f deployment/web.yaml
Log in to a container and run free
$ kubectl get pod
NAME                                                 READY   STATUS    RESTARTS   AGE
lxcfs-admission-webhook-deployment-f4bdd6f66-5wrlg   1/1     Running   0          8m29s
lxcfs-pqs2d                                          1/1     Running   0          55m
lxcfs-zfh99                                          1/1     Running   0          55m
web-7c5464f6b9-6zxdf                                 1/1     Running   0          8m10s
web-7c5464f6b9-nktff                                 1/1     Running   0          8m10s
$ kubectl exec -ti web-7c5464f6b9-6zxdf sh
# free
             total       used       free     shared    buffers     cached
Mem:        262144       2744     259400          0          0        312
-/+ buffers/cache:       2432     259712
Swap:            0          0          0
#
Summary
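To be clear, what we isolate here is the container's resource view from the physical machine's resource view, not the pod's.
With the container resource view isolated, things look much cleaner, which helps a lot with locating problems, starting services, and security. Give it a try. Follow DevOpSec for weekly practical content, and let's improve together.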