
A 10,000-Word Guide | Programming Against k8s: How to Write an Operator

新钛云服

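Translated by 刘川川 of 新钛云服, a cloud and security managed-services provider

Overview

As we learn more about Kubernetes, we may find that its built-in object types, such as Deployment, StatefulSet, and ConfigMap, no longer cover our needs. We then want to define our own objects in Kubernetes so that, first, they are served through the single entry point that kube-apiserver provides and, second, they can be managed with kubectl just like the built-in objects.

Kubernetes offers several ways to define such custom objects. One of them is the CRD.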

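Introduction to CRDs

CRDs (CustomResourceDefinitions) were introduced in v1.7 as the successor to ThirdPartyResources (TPR), and TPR was removed in v1.8. CRDs are now used very widely by projects across the ecosystem, such as Ingress controllers and Rancher.

Let's look at an example from the official documentation. The following YAML file creates a new API:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # The name must match the spec fields below, in the form '<plural>.<group>'
  name: crontabs.stable.example.com
spec:
  # Group name, used in the REST API: /apis/<group>/<version>
  group: stable.example.com
  # Versions supported by this CustomResourceDefinition
  versions:
    - name: v1
      # Each version can be enabled or disabled independently with the served flag
      served: true
      # Exactly one version must be marked as the storage version
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                cronSpec:
                  type: string
                image:
                  type: string
                replicas:
                  type: integer
  # Either Namespaced or Cluster
  scope: Namespaced
  names:
    # Plural name, used in the URL: /apis/<group>/<version>/<plural>
    plural: crontabs
    # Singular name, used as an alias on the command line and for display
    singular: crontab
    # kind is normally the CamelCased singular form. Resource manifests use it.
    kind: CronTab
    # shortNames allow a shorter string to match the resource on the command line
    shortNames:
    - ct

We can create this CRD with kubectl create, just like any other object. Once it is created, a type named CronTab is registered in kube-apiserver and becomes reachable through a REST endpoint. For example, the CronTab objects in namespace ns1 are served at the URL "/apis/stable.example.com/v1/namespaces/ns1/crontabs/". The endpoint follows exactly the same conventions as those of the built-in Kubernetes objects.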
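The URL layout described above can be sketched in Go. This is a minimal illustration that uses only the standard library; crPath is a hypothetical helper written for this article, not a function from any Kubernetes client library.

```go
package main

import (
	"fmt"
	"path"
)

// crPath builds the REST path under which kube-apiserver serves a namespaced
// custom resource: /apis/<group>/<version>/namespaces/<namespace>/<plural>[/<name>].
// An empty name yields the collection (list) path, because path.Join ignores
// empty elements.
func crPath(group, version, namespace, plural, name string) string {
	return path.Join("/apis", group, version, "namespaces", namespace, plural, name)
}

func main() {
	// Collection path for all CronTab objects in namespace ns1.
	fmt.Println(crPath("stable.example.com", "v1", "ns1", "crontabs", ""))
	// Path of a single object.
	fmt.Println(crPath("stable.example.com", "v1", "ns1", "crontabs", "new-cron-object"))
}
```

The group, version, and plural values come straight from the CRD manifest, which is why the CRD name itself has to be '<plural>.<group>'.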

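With the CronTab type declared, let's create an object of that type. The following example also comes from the official documentation:

apiVersion: "stable.example.com/v1"
kind: CronTab
metadata:
  name: new-cron-object
spec:
  cronSpec: "* * * * */5"
  image: awesome-cron-image

After creating new-cron-object with kubectl create, we can view it with kubectl get and manage this CronTab object with kubectl. For example:

kubectl get crontab
NAME              AGE
new-cron-object   6s

Resource names here are case-insensitive. We can use the short name, kubectl get ct, or the plural, kubectl get crontabs. Just like the native built-in API objects, custom resources can be created, viewed, modified, and deleted with kubectl, and RBAC rules can be configured for them.

We can also develop custom controllers that watch and act on these custom APIs, which is what the rest of this article covers. To decide whether to define an API inside Kubernetes or to run it as a standalone API, see the "Should I add a custom resource to my Kubernetes cluster?" section of the Kubernetes Custom Resources documentation.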

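What Is a Kubernetes Operator

The name Operator may be unfamiliar. It was first introduced by CoreOS in 2016. Here is their definition:

An operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling.

To be able to make the most of Kubernetes, you need a set of cohesive APIs to extend in order to service and manage your applications that run on Kubernetes. You can think of Operators as the runtime that manages this type of application on Kubernetes.

In short, a Kubernetes Operator uses the Kubernetes controller pattern together with custom APIs to operate a particular class of application, for example creating, changing, and deleting its resources.

A brief note on the controller pattern: Kubernetes defines objects through declarative APIs, and each controller continuously watches the state of the objects it is responsible for and works to bring them to the declared desired state. That is the Kubernetes controller pattern.

kube-controller-manager is made up of a set of such controllers. Let's take StatefulSet as an example to walk through a controller's logic.

Suppose we declare a StatefulSet with 3 replicas. The StatefulSet controller, which runs as a goroutine inside kube-controller-manager, watches kube-apiserver and notices the newly created object. It first creates the Pod with index 0 and watches its state. Once that Pod is Running, it creates the Pod with index 1, and so on. From then on the controller keeps watching and maintaining these Pods so that the StatefulSet always has 3 healthy replicas.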
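The ordered creation described above can be reduced to one small decision function. The sketch below is a simplification under stated assumptions: podPhase and nextOrdinal are hypothetical names, the Pod state is modelled as a plain slice, and the real StatefulSet controller considers more conditions than shown here.

```go
package main

import "fmt"

// podPhase is a simplified stand-in for a Pod's status phase.
type podPhase string

const (
	pending podPhase = "Pending"
	running podPhase = "Running"
)

// nextOrdinal decides which Pod index to create next. It returns false when
// nothing should be created right now: either an existing Pod is not Running
// yet, or all replicas already exist.
func nextOrdinal(replicas int, pods []podPhase) (int, bool) {
	for _, p := range pods {
		if p != running {
			return 0, false // wait for earlier Pods before creating the next one
		}
	}
	if len(pods) >= replicas {
		return 0, false // desired state reached
	}
	return len(pods), true
}

func main() {
	var pods []podPhase
	for {
		ordinal, ok := nextOrdinal(3, pods)
		if !ok {
			break
		}
		fmt.Printf("creating pod with index %d\n", ordinal)
		// For the demo we pretend the new Pod becomes Running immediately.
		pods = append(pods, running)
	}
}
```

Each call looks only at the current state and the desired replica count, so the same function handles both the initial rollout and recovery after a Pod disappears.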

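So after declaring a CRD we also need a controller, the Operator, to implement the corresponding control logic. With the Operator concept and the controller pattern in mind, let's look at how an Operator works.

How a Kubernetes Operator Works

An Operator follows the controller pattern described above and continuously watches custom objects in Kubernetes, that is, CRs (Custom Resources). A CRD defines a type of object, and a CR is an instance of that CRD.

The Operator keeps tracking change events on these CRs, such as ADD, UPDATE, and DELETE, and takes a series of actions to bring them to the desired state. Implementing this flow from scratch is fairly involved and sets a high bar, especially for operations engineers. Fortunately, the community provides scaffolding tools that let us build our own Operator quickly.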
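One way to picture how change events lead to actions is a queue of object keys. The sketch below is an assumption-laden simplification: event and enqueue are hypothetical names, and real controllers use a rate-limited work queue rather than a slice. It shows the key idea that the event type is dropped and only the identity of the object is kept, so the handler always works from the object's current state.

```go
package main

import "fmt"

// event is a simplified change notification for a custom resource.
type event struct {
	Type      string // "ADD", "UPDATE" or "DELETE"
	Namespace string
	Name      string
}

// enqueue collapses a stream of events into unique "namespace/name" keys,
// preserving first-seen order. Several events for the same object result in
// a single reconcile request.
func enqueue(events []event) []string {
	seen := make(map[string]bool)
	var keys []string
	for _, e := range events {
		key := e.Namespace + "/" + e.Name
		if !seen[key] {
			seen[key] = true
			keys = append(keys, key)
		}
	}
	return keys
}

func main() {
	keys := enqueue([]event{
		{Type: "ADD", Namespace: "ns1", Name: "new-cron-object"},
		{Type: "UPDATE", Namespace: "ns1", Name: "new-cron-object"},
		{Type: "DELETE", Namespace: "ns2", Name: "old-cron-object"},
	})
	for _, k := range keys {
		fmt.Println("reconcile", k)
	}
}
```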

Building Your Own Kubernetes Operator

The community maintains several open-source projects for creating Kubernetes Operators, for example Operator SDK, Kubebuilder, and KUDO. This article uses Operator SDK, so let's install it first.

Installing Operator SDK from a binary release

Prerequisites
- curl
- gpg version 2.0+
- For version information, see kubernetes/client-go: Go client for Kubernetes (github.com)

1. Download the binary

Set the platform information:

[root@blog ~]# export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
[root@blog ~]# export OS=$(uname | awk '{print tolower($0)}')

Download the binary for your platform:

[root@blog ~]# export OPERATOR_SDK_DL_URL=
[root@blog ~]# curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}

2. Verify the downloaded binary (optional)

Import the operator-sdk release GPG key from keyserver.ubuntu.com:

[root@blog ~]# gpg --keyserver keyserver.ubuntu.com --recv-keys 052996E2A20B5C7E

Download the checksums file and its signature, then verify the signature:

[root@blog ~]# curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt
[root@blog ~]# curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt.asc
[root@blog ~]# gpg -u "Operator SDK (release) <cncf-operator-sdk@cncf.io>" --verify checksums.txt.asc

The output should look similar to this:

gpg: assuming signed data in 'checksums.txt'
gpg: Signature made Fri 30 Oct 2020 12:15:15 PM PDT
gpg:                using RSA key ADE83605E945FA5A1BD8639C59E5B47624962185
gpg: Good signature from "Operator SDK (release) <cncf-operator-sdk@cncf.io>" [ultimate]

Make sure the checksums match:

[root@blog ~]# grep operator-sdk_${OS}_${ARCH} checksums.txt | sha256sum -c -

The output should look similar to this:

operator-sdk_linux_amd64: OK
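For readers who want to see what the grep and sha256sum pipeline checks, here is a minimal Go sketch of the same comparison using only the standard library. The helper names expectedSum and verify are hypothetical, and the sketch assumes checksums.txt uses the usual "<hex digest>  <file name>" line format. It does not replace the GPG signature check.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// expectedSum looks up the hex digest recorded for fileName in the contents
// of a checksums file with one "<sha256>  <name>" entry per line.
func expectedSum(checksums, fileName string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// sha256sum marks binary mode with a leading '*' on the file name.
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == fileName {
			return fields[0], true
		}
	}
	return "", false
}

// verify reports whether data hashes to the digest recorded for fileName.
func verify(checksums, fileName string, data []byte) bool {
	want, ok := expectedSum(checksums, fileName)
	if !ok {
		return false
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == want
}

func main() {
	// Stand-in for the downloaded binary; a real check would read the file.
	data := []byte("pretend this is the operator-sdk binary")
	sum := sha256.Sum256(data)
	checksums := hex.EncodeToString(sum[:]) + "  operator-sdk_linux_amd64\n"

	fmt.Println("match:", verify(checksums, "operator-sdk_linux_amd64", data))
}
```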
3. Put the binary on your PATH

[root@blog ~]# chmod +x operator-sdk_${OS}_${ARCH} && sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk

Compiling and installing Operator SDK from source

Prerequisites
- git
- go version 1.16+
- Make sure GOPROXY is set

[root@blog ~]# export GO111MODULE=on
[root@blog ~]# export GOPROXY=
[root@blog ~]# git clone
[root@blog ~]# cd operator-sdk
[root@blog operator-sdk]# make install

Verify the version:

[root@blog operator-sdk]# operator-sdk version
operator-sdk version: "v1.16.0", commit: "560044140c4f3d88677e4ef2872931f5bb97f255", kubernetes version: "1.21", go version: "go1.16.13", GOOS: "linux", GOARCH: "amd64"
# The output above shows the versions in use:
# operator-sdk version: v1.16.0
# Go version: 1.16.13
# Kubernetes version: 1.21
[root@blog operator-sdk]# go version
go version go1.16.13 linux/amd64

Either method gives us the base environment. Next we create an Operator. Operators can be built with Ansible, Helm, or Go together with the SDK, and the Ansible and Helm routes are somewhat simpler. This article uses Go with Operator SDK.

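Creating an Operator with Go

Operator SDK provides the following workflow for developing a new Operator:

- Create a new Operator project using the SDK
- Define new resource APIs by adding custom resource definitions (CRDs)
- Specify the resources to watch using the SDK API
- Define the Operator's reconcile logic
- Use Operator SDK to build the Operator and generate its deployment manifests

Prerequisites
- Install operator-sdk as described above.
- Have cluster-admin permissions.
- Have access to an image registry for the Operator image (for example hub.docker.com or quay.io) and be logged in to it from the command line.
- example.com is used as the Docker Hub namespace in these examples. Replace it accordingly if you use a different registry or namespace.
- If the registry is private, have the required credentials or certificates ready.

Next we create a project whose controller does the following:

- Creates a Memcached Deployment if it does not exist
- Ensures that the Deployment size matches the size specified in the Memcached CR
- Updates the Memcached CR status, using the status writer, with the names of the CR's Pods

Creating the project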

Use the command-line tool to create a project named memcached-operator:

[root@blog operator-sdk]# mkdir /root/memcached-operator
[root@blog operator-sdk]# cd /root/memcached-operator
[root@blog memcached-operator]# export GO111MODULE=on && export GOPROXY=
[root@blog memcached-operator]# operator-sdk init \
--domain example.com \
--repo github.com/example/memcached-operator
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.10.0
Update dependencies:
$ go mod tidy
Next: define a resource with:
$ operator-sdk create api

Once the project is created, let's look at the directory layout:

[root@blog memcached-operator]# tree -L 2
.
├── config
│   ├── default
│   ├── manager
│   ├── manifests
│   ├── prometheus
│   ├── rbac
│   └── scorecard
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
└── PROJECT

8 directories, 7 files

operator-sdk init generates a go.mod file. If the project is created outside $GOPATH/src, the --repo=<path> flag is required, because the scaffolding needs a valid module path.

Before using the SDK, make sure module support is enabled with export GO111MODULE=on. To speed up downloading Go dependencies, set a suitable proxy, for example: export GOPROXY=

At this point the project can already be built with go build:

[root@blog memcached-operator]# go build
[root@blog memcached-operator]# ll
total 44788
drwx------ 8 root root      100 Mar 30 21:01 config
-rw------- 1 root root      776 Mar 30 20:59 Dockerfile
-rw------- 1 root root      162 Mar 30 21:01 go.mod
-rw-r--r-- 1 root root    77000 Mar 30 21:01 go.sum
drwx------ 2 root root       32 Mar 30 20:59 hack
-rw------- 1 root root     2780 Mar 30 20:59 main.go
-rw------- 1 root root     9449 Mar 30 21:01 Makefile
-rwxr-xr-x 1 root root 45754092 Mar 30 21:02 memcached-operator
-rw------- 1 root root      235 Mar 30 21:01 PROJECT

The directory also contains a file named PROJECT. Let's see what is in it:

[root@blog memcached-operator]# cat PROJECT
domain: example.com
layout:
- go.kubebuilder.io/v3
plugins:
  manifests.sdk.operatorframework.io/v2: {}
  scorecard.sdk.operatorframework.io/v2: {}
projectName: memcached-operator
repo: github.com/example/memcached-operator
version: "3"

It mainly holds configuration information about our project.

Manager

The Operator's main program, main.go, initializes and runs the Manager. For more details on how the Manager registers the Scheme for the custom resource API definitions and sets up and runs controllers and webhooks, see the Kubebuilder entrypoint documentation. The Manager can restrict the namespace in which all controllers watch resources through the Namespace field of ctrl.Options. The scaffolded code below leaves that field unset, so all namespaces are watched:

mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme:                 scheme,
	MetricsBindAddress:     metricsAddr,
	Port:                   9443,
	HealthProbeBindAddress: probeAddr,
	LeaderElection:         enableLeaderElection,
	LeaderElectionID:       "86f835c3.my.domain",
})

We can also use MultiNamespacedCacheBuilder to watch a specific set of namespaces:

var namespaces []string // List of Namespaces

mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme:                 scheme,
	NewCache:               cache.MultiNamespacedCacheBuilder(namespaces),
	MetricsBindAddress:     fmt.Sprintf("%s:%d", metricsHost, metricsPort),
	Port:                   9443,
	HealthProbeBindAddress: probeAddr,
	LeaderElection:         enableLeaderElection,
	LeaderElectionID:       "86f835c3.my.domain",
})

For more details, see the MultiNamespacedCacheBuilder documentation.

Creating the API and Controller

Next, create a new custom resource definition (CRD) API with group cache, version v1alpha1, and Kind Memcached.

[root@blog memcached-operator]# operator-sdk create api \
--group cache \
--version v1alpha1 \
--kind Memcached \
--resource \
--controller
# Output of the command above
Writing scaffold for you to edit...
api/v1alpha1/memcached_types.go
controllers/memcached_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0  # controller-gen is downloaded here
go: downloading sigs.k8s.io/controller-tools v0.7.0
go: downloading golang.org/x/tools v0.1.5
go: downloading k8s.io/apimachinery v0.22.2
go: downloading k8s.io/api v0.22.2
go: downloading k8s.io/apiextensions-apiserver v0.22.2
go: downloading github.com/inconshreveable/mousetrap v1.0.0
go: downloading golang.org/x/sys v0.0.0-20210616094352-59db8d763f22
go: downloading k8s.io/utils v0.0.0-20210819203725-bdf08cb9a70a
go: downloading golang.org/x/mod v0.4.2
go get: added sigs.k8s.io/controller-tools v0.7.0
/root/memcached-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

Look at the PROJECT file again:

[root@blog memcached-operator]# cat PROJECT
domain: example.com
layout:
- go.kubebuilder.io/v3
plugins:
  manifests.sdk.operatorframework.io/v2: {}
  scorecard.sdk.operatorframework.io/v2: {}
projectName: memcached-operator
repo: github.com/example/memcached-operator
resources:
- api:
    crdVersion: v1
    namespaced: true
  controller: true
  domain: example.com
  group: cache
  kind: Memcached
  path: github.com/example/memcached-operator/api/v1alpha1
  version: v1alpha1
version: "3"

The command above generates the Memcached resource API in api/v1alpha1/memcached_types.go and the controller in controllers/memcached_controller.go.

Note: this article only covers a single-group API. To support multiple API groups, see the Single Group to Multi-Group documentation.

Now let's look at the directory layout again:

[root@blog memcached-operator]# tree -L 2
.
├── api
│   └── v1alpha1
├── bin
│   └── controller-gen
├── config
│   ├── crd
│   ├── default
│   ├── manager
│   ├── manifests
│   ├── prometheus
│   ├── rbac
│   ├── samples
│   └── scorecard
├── controllers
│   ├── memcached_controller.go
│   └── suite_test.go
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
└── PROJECT

14 directories, 10 files

Understanding the Kubernetes APIs

For an in-depth explanation of the Kubernetes API and the group-version-kind model, see the kubebuilder docs. In general, it is recommended to have one controller responsible for each API in the project, in line with the design goals set by controller-runtime.

Defining the API

First, we represent our API by defining the Memcached type. It has a MemcachedSpec.Size field that sets the number of memcached instances to deploy, and a MemcachedStatus.Nodes field that stores the names of the CR's Pods.

Note: the Nodes field is only used here to demonstrate a Status field. In real-world cases, Conditions are recommended instead.

Next, modify the Go type definitions in api/v1alpha1/memcached_types.go so that the API for the Memcached custom resource (CR) has the following spec and status:

// MemcachedSpec defines the desired state of Memcached
type MemcachedSpec struct {
	//+kubebuilder:validation:Minimum=0
	// Size is the size of the memcached deployment
	Size int32 `json:"size"`
}

// MemcachedStatus defines the observed state of Memcached
type MemcachedStatus struct {
	// Nodes are the names of the memcached pods
	Nodes []string `json:"nodes"`
}
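To make the role of the json tag and the Minimum=0 marker more concrete, the standalone sketch below decodes the JSON form of a spec into a copy of MemcachedSpec and applies the same lower bound by hand. decodeSpec is a hypothetical helper for illustration only. In a real cluster the bound is enforced by the API server from the generated CRD schema, not by code like this.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MemcachedSpec mirrors the type from the article. The json tag ties the Go
// field Size to the "size" key under spec in the CR manifest.
type MemcachedSpec struct {
	Size int32 `json:"size"`
}

// decodeSpec parses the JSON form of a spec and rejects negative sizes,
// imitating what the Minimum=0 validation marker expresses.
func decodeSpec(raw string) (MemcachedSpec, error) {
	var spec MemcachedSpec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		return MemcachedSpec{}, err
	}
	if spec.Size < 0 {
		return MemcachedSpec{}, errors.New("spec.size must be greater than or equal to 0")
	}
	return spec, nil
}

func main() {
	spec, err := decodeSpec(`{"size": 3}`)
	fmt.Println(spec.Size, err)

	_, err = decodeSpec(`{"size": -1}`)
	fmt.Println(err)
}
```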

Next, add the +kubebuilder:subresource:status marker so that a status subresource ( docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource) is added to the CRD manifest. This lets the controller update the CR status without changing the rest of the CR object:

// Memcached is the Schema for the memcacheds API
// Add the marker line below:
//+kubebuilder:subresource:status
type Memcached struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MemcachedSpec   `json:"spec,omitempty"`
	Status MemcachedStatus `json:"status,omitempty"`
}

After modifying a *_types.go file, remember to run the following command to regenerate the code for that resource type:

[root@blog memcached-operator]# make generate
/root/memcached-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."

The generate target of the Makefile invokes the controller-gen utility to update api/v1alpha1/zz_generated.deepcopy.go, so that our API's Go type definitions implement the runtime.Object interface that every Kind must implement.

Generating CRD manifests

Once the API is defined with spec/status fields and CRD validation markers, the CRD manifests can be generated and updated with the following command:

[root@blog memcached-operator]# make manifests
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases

The manifests target of the Makefile invokes controller-gen to generate the CRD manifest at config/crd/bases/cache.example.com_memcacheds.yaml.

OpenAPI validation

The OpenAPI validation defined in a CRD ensures that CRs are validated against a set of declarative rules. All CRDs should have validation. For details, see the OpenAPI validation documentation.

Implementing the Controller

For this example, replace the generated controller file controllers/memcached_controller.go with the example memcached_controller.go file. Its code is as follows:

/*
Copyright 2022.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controllers

import (
	"context"
	"reflect"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	ctrllog "sigs.k8s.io/controller-runtime/pkg/log"

	cachev1alpha1 "github.com/example/memcached-operator/api/v1alpha1"
)

// MemcachedReconciler reconciles a Memcached object
type MemcachedReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Memcached object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// -
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := ctrllog.FromContext(ctx)

	// Fetch the Memcached instance
	memcached := &cachev1alpha1.Memcached{}
	err := r.Get(ctx, req.NamespacedName, memcached)
	if err != nil {
		if errors.IsNotFound(err) {
			// Request object not found, could have been deleted after reconcile request.
			// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
			// Return and don't requeue
			log.Info("Memcached resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		// Error reading the object - requeue the request.
		log.Error(err, "Failed to get Memcached")
		return ctrl.Result{}, err
	}

	// Check if the deployment already exists, if not create a new one
	found := &appsv1.Deployment{}
	err = r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		// Define a new deployment
		dep := r.deploymentForMemcached(memcached)
		log.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
		err = r.Create(ctx, dep)
		if err != nil {
			log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
			return ctrl.Result{}, err
		}
		// Deployment created successfully - return and requeue
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		log.Error(err, "Failed to get Deployment")
		return ctrl.Result{}, err
	}

	// Ensure the deployment size is the same as the spec
	size := memcached.Spec.Size
	if *found.Spec.Replicas != size {
		found.Spec.Replicas = &size
		err = r.Update(ctx, found)
		if err != nil {
			log.Error(err, "Failed to update Deployment", "Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
			return ctrl.Result{}, err
		}
		// Ask to requeue after 1 minute in order to give enough time for the
		// pods be created on the cluster side and the operand be able
		// to do the next update step accurately.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}

	// Update the Memcached status with the pod names
	// List the pods for this memcached's deployment
	podList := &corev1.PodList{}
	listOpts := []client.ListOption{
		client.InNamespace(memcached.Namespace),
		client.MatchingLabels(labelsForMemcached(memcached.Name)),
	}
	if err = r.List(ctx, podList, listOpts...); err != nil {
		log.Error(err, "Failed to list pods", "Memcached.Namespace", memcached.Namespace, "Memcached.Name", memcached.Name)
		return ctrl.Result{}, err
	}
	podNames := getPodNames(podList.Items)

	// Update status.Nodes if needed
	if !reflect.DeepEqual(podNames, memcached.Status.Nodes) {
		memcached.Status.Nodes = podNames
		err := r.Status().Update(ctx, memcached)
		if err != nil {
			log.Error(err, "Failed to update Memcached status")
			return ctrl.Result{}, err
		}
	}

	return ctrl.Result{}, nil
}

// deploymentForMemcached returns a memcached Deployment object
func (r *MemcachedReconciler) deploymentForMemcached(m *cachev1alpha1.Memcached) *appsv1.Deployment {
	ls := labelsForMemcached(m.Name)
	replicas := m.Spec.Size

	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      m.Name,
			Namespace: m.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: ls,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: ls,
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Image:   "memcached:1.4.36-alpine",
						Name:    "memcached",
						Command: []string{"memcached", "-m=64", "-o", "modern", "-v"},
						Ports: []corev1.ContainerPort{{
							ContainerPort: 11211,
							Name:          "memcached",
						}},
					}},
				},
			},
		},
	}
	// Set Memcached instance as the owner and controller
	ctrl.SetControllerReference(m, dep, r.Scheme)
	return dep
}

// labelsForMemcached returns the labels for selecting the resources
// belonging to the given memcached CR name.
func labelsForMemcached(name string) map[string]string {
	return map[string]string{"app": "memcached", "memcached_cr": name}
}

// getPodNames returns the pod names of the array of pods passed in
func getPodNames(pods []corev1.Pod) []string {
	var podNames []string
	for _, pod := range pods {
		podNames = append(podNames, pod.Name)
	}
	return podNames
}

// SetupWithManager sets up the controller with the Manager.
func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1alpha1.Memcached{}).
		Owns(&appsv1.Deployment{}).
		Complete(r)
}
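Stripped of the Kubernetes API calls, the Reconcile function above makes one decision per invocation, in a fixed order. The following sketch condenses that order into a pure function so it can be read and tested in isolation. The names action, observed, and plan are hypothetical and exist only for this illustration. The real controller obtains these values from the API server through the client.

```go
package main

import (
	"fmt"
	"reflect"
)

// action names the single step a reconcile pass would take.
type action string

const (
	createDeployment action = "create Deployment"
	scaleDeployment  action = "update Deployment replicas"
	updateStatus     action = "update Memcached status"
	nothing          action = "nothing to do"
)

// observed is a simplified snapshot of what the controller reads from the cluster.
type observed struct {
	DeploymentExists bool
	Replicas         int32    // current Deployment replicas
	PodNames         []string // names of the Pods matching the CR's labels
	StatusNodes      []string // names currently recorded in status.nodes
}

// plan returns the first step needed to move the observed state toward the
// desired size, checking in the same order as the controller above.
func plan(size int32, o observed) action {
	switch {
	case !o.DeploymentExists:
		return createDeployment
	case o.Replicas != size:
		return scaleDeployment
	case !reflect.DeepEqual(o.PodNames, o.StatusNodes):
		return updateStatus
	default:
		return nothing
	}
}

func main() {
	fmt.Println(plan(3, observed{}))
	fmt.Println(plan(3, observed{DeploymentExists: true, Replicas: 1}))
	fmt.Println(plan(3, observed{
		DeploymentExists: true,
		Replicas:         3,
		PodNames:         []string{"p0", "p1", "p2"},
	}))
}
```

Because each pass performs at most one step and then returns, the controller relies on being called again, through a requeue or a new event, until plan would report that nothing is left to do.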

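Note: the next two subsections explain how the controller watches resources and how the reconcile loop is triggered. To skip them, go to the "Running the Operator" section of this article to see how to run the operator.

Resources watched by the Controller

The SetupWithManager() function in controllers/memcached_controller.go specifies how the controller is built to watch a CR and the other resources owned and managed by that controller.

import (
	...
	appsv1 "k8s.io/api/apps/v1"
	...
)

func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1alpha1.Memcached{}).
		Owns(&appsv1.Deployment{}).
		Complete(r)
}

NewControllerManagedBy() provides a controller builder that allows various controller configurations.

For(&cachev1alpha1.Memcached{}) specifies the Memcached type as the primary resource to watch. For each Add/Update/Delete event on a Memcached object, the reconcile loop is sent a reconcile Request (a namespace/name key) for that Memcached object.

Owns(&appsv1.Deployment{}) specifies the Deployment type as a secondary resource to watch. For each Add/Update/Delete event on a Deployment, the event handler maps the event to a reconcile Request for the owner of the Deployment. In this case, the owner is the Memcached object for which the Deployment was created.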
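The mapping from a secondary resource back to its owner can be sketched without any Kubernetes dependencies. The types ownerRef and object and the function requestForOwner below are hypothetical simplifications. The actual handler in controller-runtime works on metav1.OwnerReference and also compares the API group, which is omitted here.

```go
package main

import "fmt"

// ownerRef is a simplified owner reference as stored in an object's metadata.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool // true for the reference set by SetControllerReference
}

// object is a simplified Kubernetes object with just the fields needed here.
type object struct {
	Namespace string
	Name      string
	Owners    []ownerRef
}

// requestForOwner maps an event on obj to the "namespace/name" key of its
// controlling owner of the given kind. It returns false if obj has no such
// owner, in which case no reconcile request would be enqueued.
func requestForOwner(obj object, ownerKind string) (string, bool) {
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			// Owner and owned object live in the same namespace.
			return obj.Namespace + "/" + ref.Name, true
		}
	}
	return "", false
}

func main() {
	dep := object{
		Namespace: "liucc-test",
		Name:      "memcached-sample",
		Owners:    []ownerRef{{Kind: "Memcached", Name: "memcached-sample", Controller: true}},
	}
	if key, ok := requestForOwner(dep, "Memcached"); ok {
		fmt.Println("reconcile", key)
	}
}
```

This is why the controller code calls ctrl.SetControllerReference when it builds the Deployment: without the owner reference, Deployment events could not be traced back to a Memcached object.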

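Controller configuration

Many other useful settings can be applied when initializing a controller. For more details on these settings, see the upstream builder and controller documentation.

- Set the maximum number of concurrent Reconciles for the controller with the MaxConcurrentReconciles option. The default is 1.

func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1alpha1.Memcached{}).
		Owns(&appsv1.Deployment{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
		Complete(r)
}

- Filter watch events using predicates.
- Choose the type of EventHandler to change how a watch event is translated into reconcile requests for the reconcile loop. For operator relationships that are more complex than primary and secondary resources, the EnqueueRequestsFromMapFunc handler can be used to transform a watch event into an arbitrary set of reconcile requests.

Reconcile loop

The reconcile function is responsible for enforcing the desired CR state on the actual state of the system. It runs each time an event occurs on a watched CR or resource, and returns a value that depends on whether those states match.

Every Controller therefore has a Reconciler object with a Reconcile() method that implements the reconcile loop. The reconcile loop is passed a Request argument, which is the namespace/name key used to look up the primary resource object, Memcached, in the cache:

import (
	ctrl "sigs.k8s.io/controller-runtime"

	cachev1alpha1 "github.com/example/memcached-operator/api/v1alpha1"
	...
)

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Lookup the Memcached instance for this reconcile request
	memcached := &cachev1alpha1.Memcached{}
	err := r.Get(ctx, req.NamespacedName, memcached)
	...
}

For a guide on Reconcilers, clients, and interacting with resource events, see the Client API documentation. The following are some possible return options for a Reconciler:

- Requeue because of an error: return ctrl.Result{}, err
- Requeue without an error: return ctrl.Result{Requeue: true}, nil
- Stop reconciling: return ctrl.Result{}, nil
- Reconcile again after X time: return ctrl.Result{RequeueAfter: nextRun.Sub(r.Now())}, nil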
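How these return values are acted upon can be summarized in a small decision function. The sketch below is based on my understanding of controller-runtime's behavior and should be read as an approximation: result is a local stand-in for ctrl.Result, and next and errTransient are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result is a local stand-in for ctrl.Result with the two fields used above.
type result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// errTransient is a hypothetical error used only for the demonstration.
var errTransient = errors.New("transient failure")

// next describes what happens to a reconcile request after Reconcile returns.
// An error takes precedence, then an explicit delay, then a plain requeue.
func next(res result, err error) string {
	switch {
	case err != nil:
		return "retry with backoff (error)"
	case res.RequeueAfter > 0:
		return "retry after " + res.RequeueAfter.String()
	case res.Requeue:
		return "retry with backoff"
	default:
		return "done"
	}
}

func main() {
	fmt.Println(next(result{}, errTransient))
	fmt.Println(next(result{Requeue: true}, nil))
	fmt.Println(next(result{RequeueAfter: time.Minute}, nil))
	fmt.Println(next(result{}, nil))
}
```

In the Memcached controller this explains the three different returns: errors are retried, a freshly created Deployment triggers an immediate requeue, and a replica change waits one minute before the next pass.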

For more details, see Reconcile and its documentation in the Reconcile godoc.

Specifying permissions and generating RBAC manifests

The controller needs certain RBAC permissions to interact with the resources it manages. These are specified through RBAC markers, as shown in the following code:

//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	...
}

The ClusterRole manifest at config/rbac/role.yaml is generated from the above markers via controller-gen with the following command:

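[root@blog memcached-operator]# go mod tidy # without this step, the next command fails
[root@blog memcached-operator]# make manifests
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases

Configuring the operator image

Everything is now in place. All that remains is to build the operator image and push it to the chosen image registry.

Before building the operator image, make sure the generated Dockerfile references the base image we want. The default "runner" image, gcr.io/distroless/static:nonroot, can be changed by replacing its tag with another one (for example alpine:latest) and removing the USER 65532:65532 directive. Instead of deleting these directives, we commented them out. After the changes the file looks like this:

# Build the manager binary
FROM golang:1.16 as builder

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
# This line was modified
RUN export GOPROXY= && go mod download

# Copy the go source
COPY main.go main.go
COPY api/ api/
COPY controllers/ controllers/

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o manager main.go

# Use distroless as minimal base image to package the manager binary
# Refer to  for more details
# The next line was commented out and FROM alpine:latest was added
# FROM gcr.io/distroless/static:nonroot
FROM alpine:latest
WORKDIR /
COPY --from=builder /workspace/manager .
# This line was commented out
# USER 65532:65532

ENTRYPOINT ["/manager"]

The Makefile composes image tags from values written at project initialization or from values given on the command line. In particular, the IMAGE_TAG_BASE variable lets us define a common image registry, namespace, and partial name for all image tags. If the current value is not correct, update it to another registry or namespace. After that, the IMG variable definition can be updated as follows:

# Around line 32 of the Makefile, adjust the following to your setup.
# This uses my own Docker Hub namespace, lavenliu.
# Replace it with your own.
IMAGE_TAG_BASE ?= lavenliu/memcached-operator

# Around line 40 of the Makefile:
# IMG ?= controller:latest  # comment out this line and use the setting below
IMG ?= $(IMAGE_TAG_BASE):$(VERSION)

With these settings we no longer need to pass the IMG variable on the command line. The command below builds an image named lavenliu/memcached-operator with the tag 0.0.1 and pushes it to the configured registry.

Note: before running the command, the Dockerfile needs one change. Because the Go code is built inside a container, a Go proxy has to be set there as well, otherwise some modules fail to download. The change is:

Replace RUN go mod download with RUN export GOPROXY= && go mod download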

Before running the command, make sure we are logged in to Docker Hub:

[root@blog memcached-operator]# docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to  to create one.
Username: lavenliu
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See

Login Succeeded

After logging in, run the following command:

[root@blog memcached-operator]# make docker-build docker-push
/home/lcc/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/lcc/memcached-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
KUBEBUILDER_ASSETS="/root/.local/share/kubebuilder-envtest/k8s/1.22.1-linux-amd64" go test ./... -coverprofile cover.out
?   	github.com/example/memcached-operator	[no test files]
?   	github.com/example/memcached-operator/api/v1alpha1	[no test files]
ok  	github.com/example/memcached-operator/controllers	8.846s	coverage: 0.0% of statements
docker build -t example.com/memcached-operator:0.0.1 .
Sending build context to Docker daemon  174.1kB
Step 1/13 : FROM golang:1.16 as builder
1.16: Pulling from library/golang
......
Status: Downloaded newer image for golang:1.16
 ---> 8ffb179c0658
Step 2/13 : WORKDIR /workspace
 ---> Running in f8bfa670f96f
Removing intermediate container f8bfa670f96f
 ---> 98c265863c39
Step 3/13 : COPY go.mod go.mod
 ---> ef86cd6e1e92
Step 4/13 : COPY go.sum go.sum
 ---> a43ba540fe13
Step 5/13 : RUN export GOPROXY= && go mod download
......
Successfully built 5e9931cceaf9
Successfully tagged lavenliu/memcached-operator:0.0.1
docker push lavenliu/memcached-operator:0.0.1
The push refers to repository [docker.io/lavenliu/memcached-operator]
......

Then check that the image exists both locally and on Docker Hub.

If the command succeeds, our custom Operator image has been pushed to the location we specified.

Running the Operator

We will run the Operator in two ways:

- As Go code outside the cluster
- As a Deployment inside the Kubernetes cluster

1. Run locally (outside the cluster)

The following steps show how to deploy the operator on the cluster. To run it locally, outside the cluster, for development purposes, use the Makefile target "make install run". The two targets can also be run separately: first make install, then make run. Start with make install:

[root@blog memcached-operator]# make install
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/kustomize/kustomize/v3@v3.8.7  # kustomize is downloaded here
go: downloading sigs.k8s.io/kustomize/kustomize/v3 v3.8.7
......
go: downloading golang.org/x/net v0.0.0-20200625001655-4c5254603344
......
go get: added sigs.k8s.io/kustomize/kustomize/v3 v3.8.7
/root/memcached-operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/memcacheds.cache.example.com created

Then run make run:

[root@blog memcached-operator]# make run
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/root/memcached-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0331 17:58:25.054157 4005942 request.go:665] Waited for 1.02972392s due to client-side throttling, not priority and fairness, request: GET: server is starting to listen {"addr": ":8080"}
2022-03-31T17:58:25.605+0800 INFO setup starting manager
2022-03-31T17:58:25.605+0800 INFO starting metrics server {"path": "/metrics"}
2022-03-31T17:58:25.605+0800 INFO controller.memcached Starting EventSource {"reconciler group": "cache.example.com", "reconciler kind": "Memcached", "source": "kind source: /, Kind="}
2022-03-31T17:58:25.605+0800 INFO controller.memcached Starting EventSource {"reconciler group": "cache.example.com", "reconciler kind": "Memcached", "source": "kind source: /, Kind="}
2022-03-31T17:58:25.605+0800 INFO controller.memcached Starting Controller {"reconciler group": "cache.example.com", "reconciler kind": "Memcached"}
2022-03-31T17:58:25.707+0800 INFO controller.memcached Starting workers {"reconciler group": "cache.example.com", "reconciler kind": "Memcached", "worker count": 1}

Check the CRD:

[root@blog memcached-operator]# kubectl get crd |grep mem
memcacheds.cache.example.com                           2022-03-31T09:57:16Z

Creating a Memcached test CR

Update the Memcached CR manifest config/samples/cache_v1alpha1_memcached.yaml and define the following configuration:

# The original scaffolded sample
[root@blog memcached-operator]# cat config/samples/cache_v1alpha1_memcached.yaml
apiVersion: cache.my.domain/v1alpha1
kind: Memcached
metadata:
  name: memcached-sample
spec:
  # TODO(user): Add fields here

# The modified file
[root@blog memcached-operator]# cat config/samples/cache_v1alpha1_memcached.yaml
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: memcached-sample
spec:
  size: 3

Create the CR:

# Deployments in the liucc-test namespace
[root@blog memcached-operator]# kubectl -n liucc-test get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
hello-spring-svc-deployment   1/1     1            1           79d
json-spring-svc-deployment    1/1     1            1           72d
memcached-sample              3/3     3            3           25s
world-spring-svc-deployment   1/1     1            1           78d
[root@blog memcached-operator]# kubectl -n liucc-test apply -f config/samples/cache_v1alpha1_memcached.yaml
memcached.cache.example.com/memcached-sample created

Verify that the memcached operator created the Deployment with the expected number of replicas by checking again:

[root@blog memcached-operator]# kubectl -n liucc-test get podsNAME                                           READY   STATUS    RESTARTS   AGEhello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running   0          78djson-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running   0          71dmemcached-sample-6c765df685-dtmqg              1/1     Running   0          51smemcached-sample-6c765df685-nl8ks              1/1     Running   0          51smemcached-sample-6c765df685-vpmfq              1/1     Running   0          51sworld-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running   0          78d[root@blog memcached-operator]# kubectl -n liucc-test get memcached/memcached-sample -o yamlapiVersion: cache.example.com/v1alpha1kind: Memcachedmetadata:  annotations:    kubectl.kubernetes.io/last-applied-configuration: |      {"apiVersion":"cache.example.com/v1alpha1","kind":"Memcached","metadata":{"annotations":{},"name":"memcached-sample","namespace":"liucc-test"},"spec":{"size":3}}  creationTimestamp: "2022-03-31T10:26:00Z"  generation: 1  managedFields:  - apiVersion: cache.example.com/v1alpha1    fieldsType: FieldsV1    fieldsV1:      f:metadata:        f:annotations:          .: {}          f:kubectl.kubernetes.io/last-applied-configuration: {}      f:spec:        .: {}        f:size: {}    manager: kubectl-client-side-apply    operation: Update    time: "2022-03-31T10:26:00Z"  - apiVersion: cache.example.com/v1alpha1    fieldsType: FieldsV1    fieldsV1:      f:status:        .: {}        f:nodes: {}    manager: main    operation: Update    time: "2022-03-31T10:26:01Z"  name: memcached-sample  namespace: liucc-test  resourceVersion: "59357587"  uid: ef2efd68-c31b-4a9e-8795-6d61975e48ddspec:  size: 3status:  nodes:  - memcached-sample-6c765df685-vpmfq  - memcached-sample-6c765df685-dtmqg  - memcached-sample-6c765df685-nl8ks
Update the number of pods

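Next, change spec.size of the Memcached resource from 3 to 5 and verify the result. The same change could be made by editing config/samples/cache_v1alpha1_memcached.yaml and applying it again; here the live object is patched directly: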

[root@blog memcached-operator]# kubectl -n liucc-test patch memcached memcached-sample -p '{"spec":{"size": 5}}' --type=merge
memcached.cache.example.com/memcached-sample patched
[root@blog memcached-operator]# kubectl -n liucc-test get po
NAME                                           READY   STATUS    RESTARTS   AGE
hello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running   0          78d
json-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running   0          71d
memcached-sample-6c765df685-65wc8              1/1     Running   0          90s
memcached-sample-6c765df685-7r45t              1/1     Running   0          90s
memcached-sample-6c765df685-llhz2              1/1     Running   0          90s
memcached-sample-6c765df685-n6wsk              1/1     Running   0          5s
memcached-sample-6c765df685-nwkjv              1/1     Running   0          5s
world-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running   0          78d
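The --type=merge flag asks kubectl to send a JSON merge patch (RFC 7386): objects are merged key by key, a null value removes a key, and anything else replaces the old value. The Go sketch below applies such a patch to an in-memory document with only the standard library. The names applyMergePatch and patchSize and the sample document are illustrative assumptions; this is not how kubectl or the API server implement it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// applyMergePatch applies a JSON merge patch (RFC 7386) to target and
// returns the result. Objects are merged recursively, a null in the patch
// deletes the key, and a non-object patch replaces the target outright.
// Note that a target object is modified in place.
func applyMergePatch(target, patch interface{}) interface{} {
	patchObj, ok := patch.(map[string]interface{})
	if !ok {
		return patch
	}
	targetObj, ok := target.(map[string]interface{})
	if !ok {
		targetObj = map[string]interface{}{}
	}
	for key, value := range patchObj {
		if value == nil {
			delete(targetObj, key)
			continue
		}
		targetObj[key] = applyMergePatch(targetObj[key], value)
	}
	return targetObj
}

// patchSize decodes doc and patch, merges them, and returns spec.size.
func patchSize(doc, patch string) (float64, error) {
	var d, p interface{}
	if err := json.Unmarshal([]byte(doc), &d); err != nil {
		return 0, err
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return 0, err
	}
	merged, ok := applyMergePatch(d, p).(map[string]interface{})
	if !ok {
		return 0, errors.New("merged document is not an object")
	}
	spec, ok := merged["spec"].(map[string]interface{})
	if !ok {
		return 0, errors.New("spec is missing or not an object")
	}
	size, ok := spec["size"].(float64)
	if !ok {
		return 0, errors.New("spec.size is missing or not a number")
	}
	return size, nil
}

func main() {
	// Hypothetical, trimmed-down version of the Memcached object.
	doc := `{"metadata":{"name":"memcached-sample"},"spec":{"size":3}}`
	size, err := patchSize(doc, `{"spec":{"size": 5}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("spec.size after patch:", size)
}
```

Only spec.size changes; metadata.name is untouched because the patch does not mention it.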

Check the deployment again:

[root@blog memcached-operator]# kubectl -n liucc-test get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
hello-spring-svc-deployment   1/1     1            1           79d
json-spring-svc-deployment    1/1     1            1           72d
memcached-sample              5/5     5            5           2m27s
world-spring-svc-deployment   1/1     1            1           78d
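The jump from 3 to 5 replicas is the reconcile loop at work: in the operator-sdk Memcached tutorial that this walkthrough follows, the controller compares spec.size with the Deployment's replica count and updates the Deployment when they differ. Below is a minimal model of that comparison in plain Go. MemcachedSpec, DeploymentSpec and reconcileReplicas are hypothetical stand-ins; the real controller works on typed Kubernetes objects through API client calls.

```go
package main

import "fmt"

// MemcachedSpec is a hypothetical stand-in for the custom resource's spec.
type MemcachedSpec struct {
	Size int32
}

// DeploymentSpec is a hypothetical stand-in for the Deployment's spec.
// Replicas is a pointer because an unset value is possible.
type DeploymentSpec struct {
	Replicas *int32
}

// reconcileReplicas brings the deployment's replica count in line with the
// desired size and reports whether an update was necessary.
func reconcileReplicas(desired MemcachedSpec, dep *DeploymentSpec) bool {
	if dep.Replicas != nil && *dep.Replicas == desired.Size {
		return false // already in the desired state, nothing to do
	}
	size := desired.Size
	dep.Replicas = &size
	return true
}

func main() {
	three := int32(3)
	dep := &DeploymentSpec{Replicas: &three}

	// First pass: spec.size was patched to 5, so an update is needed.
	fmt.Println(reconcileReplicas(MemcachedSpec{Size: 5}, dep), *dep.Replicas)

	// Second pass: the state already matches, so nothing changes.
	fmt.Println(reconcileReplicas(MemcachedSpec{Size: 5}, dep), *dep.Replicas)
}
```

The second call returning false is the important property: reconciling an object that is already in its desired state should do nothing.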
Clean up the environment

Run the following commands to clean up the deployed resources:

[root@blog memcached-operator]# kubectl -n liucc-test delete -f config/samples/cache_v1alpha1_memcached.yaml
memcached.cache.example.com "memcached-sample" deleted
# Verify that the deployment has been deleted
[root@blog memcached-operator]# kubectl -n liucc-test get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
hello-spring-svc-deployment   1/1     1            1           79d
json-spring-svc-deployment    1/1     1            1           72d
world-spring-svc-deployment   1/1     1            1           78d
# Verify that the pods are being deleted
[root@blog memcached-operator]# kubectl -n liucc-test get po
NAME                                           READY   STATUS        RESTARTS   AGE
hello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running       0          78d
json-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running       0          71d
memcached-sample-6c765df685-dtmqg              0/1     Terminating   0          3m5s
memcached-sample-6c765df685-vmp6q              0/1     Terminating   0          60s
world-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running       0          78d
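Deleting the Memcached object also removes its Deployment and pods. In the operator-sdk Memcached tutorial this happens because the controller records the custom resource as the owner of the Deployment, and Kubernetes garbage collection deletes dependents whose owner is gone. The toy model below illustrates the cascade in plain Go. Object, deleteCascade and the UIDs are hypothetical; real garbage collection runs asynchronously inside the cluster and is more involved.

```go
package main

import "fmt"

// Object is a hypothetical, simplified resource: a name, a UID, and the UID
// of its owner (empty when the object has no owner).
type Object struct {
	Name     string
	UID      string
	OwnerUID string
}

// deleteCascade removes the object with the given UID and, transitively,
// every object that is owned by a removed object. It returns the survivors.
func deleteCascade(objects []Object, uid string) []Object {
	doomed := map[string]bool{uid: true}
	// Keep sweeping until no new dependents are discovered.
	for changed := true; changed; {
		changed = false
		for _, o := range objects {
			if doomed[o.OwnerUID] && !doomed[o.UID] {
				doomed[o.UID] = true
				changed = true
			}
		}
	}
	var kept []Object
	for _, o := range objects {
		if !doomed[o.UID] {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	objects := []Object{
		{Name: "memcached/memcached-sample", UID: "cr-1"},
		{Name: "deployment/memcached-sample", UID: "dep-1", OwnerUID: "cr-1"},
		{Name: "replicaset/memcached-sample-6c765df685", UID: "rs-1", OwnerUID: "dep-1"},
		{Name: "pod/memcached-sample-6c765df685-dtmqg", UID: "pod-1", OwnerUID: "rs-1"},
		{Name: "deployment/hello-spring-svc-deployment", UID: "dep-2"},
	}
	for _, o := range deleteCascade(objects, "cr-1") {
		fmt.Println("kept:", o.Name)
	}
}
```

Only the unrelated deployment survives, which matches the kubectl output above.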
2. Run as a Deployment inside the cluster

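By default, a new namespace named <project-name>-system (for example, memcached-operator-system) is created in the Kubernetes cluster. The make deploy target deploys the operator and also installs the RBAC rules from the manifests in config/rbac. The available make targets are listed first: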

# Show the make help information
[root@blog memcached-operator]# make help

Usage:
  make <target>

General
  help             Display this help.

Development
  manifests        Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
  generate         Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
  fmt              Run go fmt against code.
  vet              Run go vet against code.
  test             Run tests.

Build
  build            Build manager binary.
  run              Run a controller from your host.
  docker-build     Build docker image with the manager.
  docker-push      Push docker image with the manager.

Deployment
  install          Install CRDs into the K8s cluster specified in ~/.kube/config.
  uninstall        Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
  deploy           Deploy controller to the K8s cluster specified in ~/.kube/config.
  undeploy         Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
  controller-gen   Download controller-gen locally if necessary.
  kustomize        Download kustomize locally if necessary.
  envtest          Download envtest-setup locally if necessary.
  bundle           Generate bundle manifests and metadata, then validate generated files.
  bundle-build     Build the bundle image.
  bundle-push      Push the bundle image.
  opm              Download opm locally if necessary.
  catalog-build    Build a catalog image.
  catalog-push     Push a catalog image.

Before deploying, edit the Makefile and change the image address:

# Around line 33
IMAGE_TAG_BASE ?= <change to a valid registry address>/memcached-operator
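Judging by the command output elsewhere in this article, the image base ends up in two image names: the controller image lavenliu/memcached-operator:0.0.1 and the bundle image lavenliu/memcached-operator-bundle:v0.0.1. The Go sketch below only reproduces that naming pattern so the relationship is easy to see. The function names are hypothetical, and the Makefile in your own project remains the source of truth for how the names are actually assembled.

```go
package main

import "fmt"

// controllerImage mirrors the controller image name seen in this article's
// output: <base>:<version>. Hypothetical helper, not part of operator-sdk.
func controllerImage(base, version string) string {
	return base + ":" + version
}

// bundleImage mirrors the bundle image name seen in this article's output:
// <base>-bundle:v<version>. Hypothetical helper, not part of operator-sdk.
func bundleImage(base, version string) string {
	return base + "-bundle:v" + version
}

func main() {
	base := "lavenliu/memcached-operator" // the value used in this article
	fmt.Println(controllerImage(base, "0.0.1"))
	fmt.Println(bundleImage(base, "0.0.1"))
}
```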

Then deploy:

[root@blog memcached-operator]# make deploy
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
cd config/manager && /root/memcached-operator/bin/kustomize edit set image controller=lavenliu/memcached-operator:0.0.1
/root/memcached-operator/bin/kustomize build config/default | kubectl apply -f -
namespace/memcached-operator-system created
customresourcedefinition.apiextensions.k8s.io/memcacheds.cache.example.com configured
serviceaccount/memcached-operator-controller-manager created
role.rbac.authorization.k8s.io/memcached-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/memcached-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/memcached-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/memcached-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/memcached-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/memcached-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/memcached-operator-proxy-rolebinding created
configmap/memcached-operator-manager-config created
service/memcached-operator-controller-manager-metrics-service created
deployment.apps/memcached-operator-controller-manager created
[root@blog memcached-operator]# echo $?
0

Verify that memcached-operator started successfully:

[root@blog memcached-operator]# kubectl -n memcached-operator-system get deployment
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
memcached-operator-controller-manager   0/1     1            0           50s
[root@blog memcached-operator]# kubectl -n memcached-operator-system get po
NAME                                                     READY   STATUS              RESTARTS   AGE
memcached-operator-controller-manager-7bbc46698f-wsqvp   0/2     ContainerCreating   0          50s

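Something went wrong: the main causes were a failed image pull and a failure when starting the container. Check the pod events for the details: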

[root@blog memcached-operator]# kubectl -n memcached-operator-system describe po memcached-operator-controller-manager-7bbc46698f-wsqvp
......
Events:
  Type     Reason     Age                            From               Message
  ----     ------     ----                           ----               -------
  Normal   Scheduled  59s                            default-scheduler  Successfully assigned memcached-operator-system/memcached-operator-controller-manager-7bbc46698f-wsqvp to node03.lavenliu.cn
  Normal   Pulling    <invalid>                      kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled     <invalid>                      kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 15.593866868s
  Normal   Created    <invalid>                      kubelet            Created container kube-rbac-proxy
  Normal   Started    <invalid>                      kubelet            Started container kube-rbac-proxy
  Normal   Pulling    <invalid>                      kubelet            Pulling image "lavenliu/memcached-operator:0.0.1"
  Normal   Pulled     <invalid>                      kubelet            Successfully pulled image "lavenliu/memcached-operator:0.0.1" in 19.848521963s
  Warning  Failed     <invalid> (x4 over <invalid>)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "memcached-operator-controller-manager-7bbc46698f-wsqvp_memcached-operator-system(20bd3c70-7400-4497-904e-325122b364db)", container: manager)
  Normal   Pulled     <invalid> (x3 over <invalid>)  kubelet            Container image "lavenliu/memcached-operator:0.0.1" already present on machine

Modify the following file as shown:

[root@blog memcached-operator]# vim config/default/manager_auth_proxy_patch.yaml
spec:
  template:
    spec:
      securityContext:    # added line
        runAsUser: 1000   # added line
      containers:
      - name: kube-rbac-proxy
        # image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0  # this image could not be pulled, so the line is commented out
        # The following image, found on Docker Hub, works
        image: rancher/kube-rbac-proxy:v0.5.0  # added line
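The "container has runAsNonRoot and image will run as root" error is the kubelet's non-root check: with runAsNonRoot enabled and no runAsUser set, it looks at the user the image runs as and refuses to start the container when that user is UID 0. Setting runAsUser: 1000 supplies an explicit non-zero UID, so the check passes. Below is a simplified model of that decision in Go. verifyNonRoot and its parameters are hypothetical, the first error text is illustrative, and the real kubelet handles more cases (for example images whose user is a name rather than a number).

```go
package main

import (
	"errors"
	"fmt"
)

// verifyNonRoot models the non-root check in simplified form.
// runAsUser is nil when the pod spec does not set it; imageUID is the
// numeric user the image would run as by default.
func verifyNonRoot(runAsNonRoot bool, runAsUser *int64, imageUID int64) error {
	if !runAsNonRoot {
		return nil // no policy to enforce
	}
	if runAsUser != nil {
		// An explicit runAsUser overrides the image's default user.
		if *runAsUser == 0 {
			return errors.New("runAsUser 0 conflicts with runAsNonRoot") // illustrative text
		}
		return nil
	}
	if imageUID == 0 {
		return errors.New("container has runAsNonRoot and image will run as root")
	}
	return nil
}

func main() {
	// Before the fix: runAsNonRoot is on, no runAsUser, image runs as root.
	fmt.Println(verifyNonRoot(true, nil, 0))

	// After the fix: runAsUser: 1000 was added to the security context.
	uid := int64(1000)
	fmt.Println(verifyNonRoot(true, &uid, 0))
}
```

An alternative to patching the manifest would be to build the operator image so that it runs as a non-root user by default; the article takes the manifest route.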

After making the change, run make undeploy:

[root@blog memcached-operator]# make undeploy

Then run make deploy again:

[root@blog memcached-operator]# make deploy

Finally, check the deployment and pod information:

[root@blog memcached-operator]# kubectl -n memcached-operator-system get deployment
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
memcached-operator-controller-manager   1/1     1            1           4m
[root@blog memcached-operator]# kubectl -n memcached-operator-system get po
NAME                                                     READY   STATUS    RESTARTS   AGE
memcached-operator-controller-manager-54548dbf4d-drnhp   2/2     Running   0          4m15s
[root@blog memcached-operator]# kubectl -n memcached-operator-system describe po memcached-operator-controller-manager-54548dbf4d-drnhp
......
Events:
  Type    Reason          Age    From               Message
  ----    ------          ----   ----               -------
  Normal  Scheduled       4m26s  default-scheduler  Successfully assigned memcached-operator-system/memcached-operator-controller-manager-54548dbf4d-drnhp to cn-shanghai.10.10.11.13
  Normal  AllocIPSucceed  4m26s  terway-daemon      Alloc IP 10.10.108.242/32 for Pod
  Normal  Pulled          4m26s  kubelet            Container image "rancher/kube-rbac-proxy:v0.5.0" already present on machine
  Normal  Created         4m26s  kubelet            Created container kube-rbac-proxy
  Normal  Started         4m26s  kubelet            Started container kube-rbac-proxy
  Normal  Pulled          4m26s  kubelet            Container image "harbor.lavenliu.cn/library/memcached-operator:0.0.1" already present on machine
  Normal  Created         4m26s  kubelet            Created container manager
  Normal  Started         4m26s  kubelet            Started container manager

With the controller deployed, create an instance:

[root@blog memcached-operator]# kubectl -n liucc-test apply -f config/samples/cache_v1alpha1_memcached.yaml

Check the pods:

[root@blog memcached-operator]# kubectl -n liucc-test get po
NAME                                           READY   STATUS    RESTARTS   AGE
hello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running   0          78d
json-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running   0          71d
memcached-sample-6c765df685-7xrn2              1/1     Running   0          6m15s
memcached-sample-6c765df685-tdjz8              1/1     Running   0          6m15s
memcached-sample-6c765df685-xvpqk              1/1     Running   0          6m15s
world-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running   0          78d

Change the number of pods to 5:

[root@blog memcached-operator]# kubectl -n liucc-test patch memcached memcached-sample -p '{"spec":{"size": 5}}' --type=merge
memcached.cache.example.com/memcached-sample patched
# Check again
[root@blog memcached-operator]# kubectl -n liucc-test get po
NAME                                           READY   STATUS    RESTARTS   AGE
hello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running   0          78d
json-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running   0          71d
memcached-sample-6c765df685-7xrn2              1/1     Running   0          7m27s
memcached-sample-6c765df685-tdjz8              1/1     Running   0          7m27s
memcached-sample-6c765df685-xvpqk              1/1     Running   0          7m27s
memcached-sample-6c765df685-b79kc              1/1     Running   0          3s  # newly started pod
memcached-sample-6c765df685-qkxvc              1/1     Running   0          3s  # newly started pod
world-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running   0          78d

Clean up the demo environment:

[root@blog memcached-operator]# make undeploy
/root/memcached-operator/bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -
namespace "memcached-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "memcacheds.cache.example.com" deleted
serviceaccount "memcached-operator-controller-manager" deleted
role.rbac.authorization.k8s.io "memcached-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "memcached-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "memcached-operator-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "memcached-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "memcached-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "memcached-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "memcached-operator-proxy-rolebinding" deleted
configmap "memcached-operator-manager-config" deleted
service "memcached-operator-controller-manager-metrics-service" deleted
deployment.apps "memcached-operator-controller-manager" deleted

Verify whether the memcached pods still exist:

[root@blog memcached-operator]# kubectl -n liucc-test get po
NAME                                           READY   STATUS    RESTARTS   AGE
hello-spring-svc-deployment-694b8cb9d4-4sjl5   1/1     Running   0          78d
json-spring-svc-deployment-cf88f85c8-rsqdc     1/1     Running   0          71d
world-spring-svc-deployment-84b78bc8d4-tvdz8   1/1     Running   0          78d
3. Deploy the Operator with OLM

First, install OLM:

[root@blog memcached-operator]# operator-sdk olm install
I0403 20:06:37.667540 1957628 request.go:665] Waited for 1.028361355s due to client-side throttling, not priority and fairness, request: GET:
INFO[0001] Fetching CRDs for version "latest"
INFO[0001] Fetching resources for resolved version "latest"
I0403 20:07:02.327494 1957628 request.go:665] Waited for 1.04307318s due to client-side throttling, not priority and fairness, request: GET:
I0403 20:07:13.324530 1957628 request.go:665] Waited for 1.045662694s due to client-side throttling, not priority and fairness, request: GET:
INFO[0038] Creating CRDs and resources
INFO[0038]   Creating CustomResourceDefinition "catalogsources.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "clusterserviceversions.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "installplans.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "olmconfigs.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "operatorconditions.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "operatorgroups.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "operators.operators.coreos.com"
INFO[0038]   Creating CustomResourceDefinition "subscriptions.operators.coreos.com"
INFO[0038]   Creating Namespace "olm"
INFO[0038]   Creating Namespace "operators"
INFO[0039]   Creating ServiceAccount "olm/olm-operator-serviceaccount"
INFO[0039]   Creating ClusterRole "system:controller:operator-lifecycle-manager"
INFO[0039]   Creating ClusterRoleBinding "olm-operator-binding-olm"
INFO[0039]   Creating OLMConfig "cluster"
INFO[0042]   Creating Deployment "olm/olm-operator"
INFO[0042]   Creating Deployment "olm/catalog-operator"
INFO[0042]   Creating ClusterRole "aggregate-olm-edit"
INFO[0042]   Creating ClusterRole "aggregate-olm-view"
INFO[0042]   Creating OperatorGroup "operators/global-operators"
INFO[0042]   Creating OperatorGroup "olm/olm-operators"
INFO[0042]   Creating ClusterServiceVersion "olm/packageserver"
INFO[0042]   Creating CatalogSource "olm/operatorhubio-catalog"
INFO[0042] Waiting for deployment/olm-operator rollout to complete
INFO[0042]   Waiting for Deployment "olm/olm-operator" to rollout: 0 of 1 updated replicas are available
INFO[0063]   Deployment "olm/olm-operator" successfully rolled out
INFO[0063] Waiting for deployment/catalog-operator rollout to complete
INFO[0063]   Deployment "olm/catalog-operator" successfully rolled out
INFO[0063] Waiting for deployment/packageserver rollout to complete
INFO[0063]   Waiting for Deployment "olm/packageserver" to rollout: 0 of 2 updated replicas are available
INFO[0079]   Deployment "olm/packageserver" successfully rolled out
INFO[0079] Successfully installed OLM version "latest"

NAME                                            NAMESPACE   KIND                        STATUS
catalogsources.operators.coreos.com                         CustomResourceDefinition    Installed
clusterserviceversions.operators.coreos.com                 CustomResourceDefinition    Installed
installplans.operators.coreos.com                           CustomResourceDefinition    Installed
olmconfigs.operators.coreos.com                             CustomResourceDefinition    Installed
operatorconditions.operators.coreos.com                     CustomResourceDefinition    Installed
operatorgroups.operators.coreos.com                         CustomResourceDefinition    Installed
operators.operators.coreos.com                              CustomResourceDefinition    Installed
subscriptions.operators.coreos.com                          CustomResourceDefinition    Installed
olm                                                         Namespace                   Installed
operators                                                   Namespace                   Installed
olm-operator-serviceaccount                     olm         ServiceAccount              Installed
system:controller:operator-lifecycle-manager                ClusterRole                 Installed
olm-operator-binding-olm                                    ClusterRoleBinding          Installed
cluster                                                     OLMConfig                   Installed
olm-operator                                    olm         Deployment                  Installed
catalog-operator                                olm         Deployment                  Installed
aggregate-olm-edit                                          ClusterRole                 Installed
aggregate-olm-view                                          ClusterRole                 Installed
global-operators                                operators   OperatorGroup               Installed
olm-operators                                   olm         OperatorGroup               Installed
packageserver                                   olm         ClusterServiceVersion       Installed
operatorhubio-catalog                           olm         CatalogSource               Installed

If the installation fails, uninstall OLM before trying again: operator-sdk olm uninstall

Once the installation has finished, check the status:

[root@blog memcached-operator]# operator-sdk olm status
I0406 11:28:17.359874   30074 request.go:665] Waited for 1.041471802s due to client-side throttling, not priority and fairness, request: GET:
INFO[0002] Fetching CRDs for version "v0.20.0"
INFO[0002] Fetching resources for resolved version "v0.20.0"
INFO[0007] Successfully got OLM status for version "v0.20.0"

NAME                                            NAMESPACE   KIND                        STATUS
operatorgroups.operators.coreos.com                         CustomResourceDefinition    Installed
operatorconditions.operators.coreos.com                     CustomResourceDefinition    Installed
olmconfigs.operators.coreos.com                             CustomResourceDefinition    Installed
installplans.operators.coreos.com                           CustomResourceDefinition    Installed
clusterserviceversions.operators.coreos.com                 CustomResourceDefinition    Installed
olm-operator-binding-olm                                    ClusterRoleBinding          Installed
operatorhubio-catalog                           olm         CatalogSource               Installed
olm-operators                                   olm         OperatorGroup               Installed
aggregate-olm-view                                          ClusterRole                 Installed
catalog-operator                                olm         Deployment                  Installed
cluster                                                     OLMConfig                   Installed
operators.operators.coreos.com                              CustomResourceDefinition    Installed
olm-operator                                    olm         Deployment                  Installed
subscriptions.operators.coreos.com                          CustomResourceDefinition    Installed
aggregate-olm-edit                                          ClusterRole                 Installed
olm                                                         Namespace                   Installed
global-operators                                operators   OperatorGroup               Installed
operators                                                   Namespace                   Installed
packageserver                                   olm         ClusterServiceVersion       Installed
olm-operator-serviceaccount                     olm         ServiceAccount              Installed
catalogsources.operators.coreos.com                         CustomResourceDefinition    Installed
system:controller:operator-lifecycle-manager                ClusterRole                 Installed

Look at the namespaces and pods that were created along the way:

[root@blog memcached-operator]# kubectl get ns
NAME              STATUS   AGE
default           Active   185d
kube-node-lease   Active   185d
kube-public       Active   185d
kube-system       Active   185d
liucc-test        Active   7h43m
olm               Active   103m
operators         Active   103m
[root@blog memcached-operator]# kubectl -n olm get po
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-5c4997c789-tk5cq   1/1     Running   0          103m
olm-operator-6d46969488-rc8zf       1/1     Running   0          103m
operatorhubio-catalog-nt5vw         1/1     Running   0          103m
packageserver-848bdb76dd-6snj4      1/1     Running   0          103m
packageserver-848bdb76dd-grdx4      1/1     Running   0          103m

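Next, bundle the Operator, then build and push the bundle image. The bundle target generates, in the bundle directory, the manifests and metadata that define the operator. The bundle-build and bundle-push targets build and push the bundle image defined by bundle.Dockerfile.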

[root@blog memcached-operator]# make bundle
/root/memcached-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
operator-sdk generate kustomize manifests -q
Display name for the operator (required): # must be filled in
> memcached-operator
Description for the operator (required): # must be filled in
> memcached-operator
Provider's name for the operator (required): # must be filled in
> lavenliu
Any relevant URL for the provider name (optional):
>
Comma-separated list of keywords for your operator (required): # must be filled in
> memcached,operator
Comma-separated list of maintainers and their emails (e.g. 'name1:email1, name2:email2') (required): # must be filled in
> lcc@163.com
cd config/manager && /root/memcached-operator/bin/kustomize edit set image controller=lavenliu/memcached-operator:0.0.1
/root/memcached-operator/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 0.0.1
INFO[0000] Creating bundle/metadata/annotations.yaml
INFO[0000] Creating bundle.Dockerfile
INFO[0000] Bundle metadata generated suceessfully
operator-sdk bundle validate ./bundle
INFO[0000] All validation tests have completed successfully

Check whether any new files or directories were generated:

[root@blog memcached-operator]# ll -t
total 136
drwxr-xr-x  5 root root  4096 Apr  3 20:11 bundle            # newly generated directory
-rw-r--r--  1 root root   923 Apr  3 20:11 bundle.Dockerfile # newly generated file
drwx------  2 root root  4096 Apr  3 19:15 hack
drwxr-xr-x  2 root root  4096 Apr  3 15:41 bin
drwx------  2 root root  4096 Apr  3 15:37 controllers
-rw-------  1 root root  9560 Mar 31 19:03 Makefile
-rw-r--r--  1 root root  2361 Mar 31 16:49 cover.out
-rw-------  1 root root   780 Mar 31 16:46 Dockerfile
-rw-------  1 root root   246 Mar 31 16:20 go.mod
-rw-------  1 root root  3192 Mar 31 14:39 main.go
-rw-------  1 root root   448 Mar 31 14:39 PROJECT
drwx------  3 root root  4096 Mar 31 14:39 api
drwx------ 10 root root  4096 Mar 31 14:39 config
-rw-r--r--  1 root root 77000 Mar 31 14:37 go.sum

Next, run the make bundle-build target:

[root@blog memcached-operator]# make bundle-build
docker build -f bundle.Dockerfile -t lavenliu/memcached-operator-bundle:v0.0.1 .
Sending build context to Docker daemon  196.6kB
Step 1/14 : FROM scratch
 --->
Step 2/14 : LABEL operators.operatorframework.io.bundle.mediatype.v1=registry+v1
 ---> Running in acbee848ff30
Removing intermediate container acbee848ff30
 ---> 7840efc1acc7
Step 3/14 : LABEL operators.operatorframework.io.bundle.manifests.v1=manifests/
 ---> Running in 795bbd68aa0b
Removing intermediate container 795bbd68aa0b
 ---> 58c64b2c5bed
Step 4/14 : LABEL operators.operatorframework.io.bundle.metadata.v1=metadata/
 ---> Running in 3ba74dd41232
Removing intermediate container 3ba74dd41232
 ---> 4921148bcd53
Step 5/14 : LABEL operators.operatorframework.io.bundle.package.v1=memcached-operator
 ---> Running in c8ae15420ea2
Removing intermediate container c8ae15420ea2
 ---> 0c417435ceff
Step 6/14 : LABEL operators.operatorframework.io.bundle.channels.v1=alpha
 ---> Running in a4c7d20e793b
Removing intermediate container a4c7d20e793b
 ---> b549d7f0aa94
Step 7/14 : LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.15.0+git
 ---> Running in 3dd418069c6b
Removing intermediate container 3dd418069c6b
 ---> a0ead127d313
Step 8/14 : LABEL operators.operatorframework.io.metrics.mediatype.v1=metrics+v1
 ---> Running in 513a0223cafb
Removing intermediate container 513a0223cafb
 ---> 97c961869eb3
Step 9/14 : LABEL operators.operatorframework.io.metrics.project_layout=go.kubebuilder.io/v3
 ---> Running in 24c6fbf30e77
Removing intermediate container 24c6fbf30e77
 ---> e523f8b86a47
Step 10/14 : LABEL operators.operatorframework.io.test.mediatype.v1=scorecard+v1
 ---> Running in 795730b93b89
Removing intermediate container 795730b93b89
 ---> 5f186fccf6fb
Step 11/14 : LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/
 ---> Running in d29664ae092a
Removing intermediate container d29664ae092a
 ---> 7776ff18f767
Step 12/14 : COPY bundle/manifests /manifests/
 ---> cde9d176b798
Step 13/14 : COPY bundle/metadata /metadata/
 ---> 38d589cdd086
Step 14/14 : COPY bundle/tests/scorecard /tests/scorecard/
 ---> 976c6344511b
Successfully built 976c6344511b
Successfully tagged lavenliu/memcached-operator-bundle:v0.0.1

Then run the make bundle-push target:

[root@blog memcached-operator]# make bundle-push
make docker-push IMG=lavenliu/memcached-operator-bundle:v0.0.1
make[1]: Entering directory `/root/memcached-operator'
docker push lavenliu/memcached-operator-bundle:v0.0.1
The push refers to repository [docker.io/lavenliu/memcached-operator-bundle]
36632daec064: Layer already exists
ca08711083d4: Layer already exists
8b7611a97ff6: Layer already exists
v0.0.1: digest: sha256:9adbb5b9e2aede9108f9bba509dc8ca9aa0ed4aad0de6ad37cc8cb4eaa3b6c79 size: 939
make[1]: Leaving directory `/root/memcached-operator'

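Finally, run the bundle. If the bundle image is hosted in a private registry or the registry uses a custom CA, refer to the configuration steps for private bundle and catalog image registries.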

[root@blog memcached-operator]# operator-sdk run bundle lavenliu/memcached-operator-bundle:v0.0.1

See the docs for a deeper look at operator-sdk's OLM integration.

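Summary

After all the "wrestling" in the previous chapters, we have finally completed an Operator tutorial. Although we followed the official documentation step by step, the process still had its twists and turns. We hope this article helps and serves as a reference when you write your own Operator.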

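Appendix

Recommended reading

Introduction - The Kubebuilder Book
Operator SDK (operatorframework.io)
"Kubernetes Operators": the book uses an older version of operator-sdk, but its explanations are still very good
CoreOS's introduction to Operators
OperatorHub.io, where you can find a ready-made Operator that suits your needs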
