Kubernetes Binary Installation

I. Environment Preparation

1. Cluster Information

OS: CentOS Linux release 7.6.1810 (Core)
Kernel: 3.10.0-957.27.2.el7.x86_64

Machine and service layout. The master is also planned to run kubelet, kube-proxy, docker, and flannel.
| Hostname | IP | Cluster Role | Services |
| --- | --- | --- | --- |
| master-01 | 172.16.10.20 | master | etcd, apiserver, controller-manager, scheduler, docker, flannel, kube-proxy, kubelet |
| worker-01 | 172.16.10.25 | node | etcd, kubelet, kube-proxy, docker, flannel |
| worker-02 | 172.16.10.26 | node | etcd, kubelet, kube-proxy, docker, flannel |
Kubernetes cluster planning:

| Item | Value |
| --- | --- |
| Cluster CIDR (used as the apiserver service-cluster-ip-range) | 10.66.0.0/24 |
| Container network (flannel's /16 supernet) | 10.99.0.0/16 |
2. System Configuration

Bind the hostnames and set up passwordless SSH from master-01 to the other machines (a sketch of that setup follows the block below), then run the following on every machine in the cluster:

systemctl stop firewalld
systemctl disable firewalld

setenforce 0
sed -i '/^SELINUX=/s/enforcing/disabled/' /etc/selinux/config

# Disable swap; if swap is mounted in fstab, comment that entry out
swapoff -a
sysctl -w vm.swappiness=0

# Load the kernel module
modprobe br_netfilter

# Install docker-ce-18.09 from the Aliyun repo; do not start it yet, start it after flannel is configured
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce-18.09.6-3.el7
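The hostname binding and SSH equivalence mentioned above are not spelled out in the original commands; here is a minimal sketch, assuming the hostnames and IPs from the table (the /etc/hosts part runs on every node, the key distribution on master-01 only):

cat >> /etc/hosts << EOF
172.16.10.20 master-01
172.16.10.25 worker-01
172.16.10.26 worker-02
EOF

# On master-01: generate a key pair once and push it to every node (including itself)
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in master-01 worker-01 worker-02; do
  ssh-copy-id root@$host
done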
3. Software Packages

The packages used below are kept in /repo on master-01 and shared with the other machines (one possible way to share the directory is sketched next).
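The original does not say how /repo is shared. As one hedged option, a simple NFS export would look like this (any shared filesystem, or plain scp, works just as well):

# On master-01: export /repo read-only to the cluster subnet
yum install -y nfs-utils
echo '/repo 172.16.10.0/24(ro,sync,no_root_squash)' >> /etc/exports
systemctl enable nfs-server
systemctl start nfs-server

# On the worker nodes: mount the share at the same path
yum install -y nfs-utils
mkdir -p /repo
mount -t nfs 172.16.10.20:/repo /repo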
II. Creating Certificates

Create some directories:
mkdir -p ~/ssl/{k8s,etcd}
mkdir -p /usr/local/kubernetes/bin
mkdir -p /etc/kubernetes
mkdir -p /etc/{etcd,kubernetes}/ssl

echo 'export PATH=$PATH:/usr/local/kubernetes/bin/' >> /etc/profile
source /etc/profile
Download cfssl:
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x cfssl_linux-amd64 cfssljson_linux-amd64 cfssl-certinfo_linux-amd64
mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/bin/cfssl-certinfo
1. Create the etcd Certificates

cd ~/ssl/etcd

# CA signing configuration
cat << EOF | tee ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "www": {
        "expiry": "87600h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF

# CA certificate signing request
cat << EOF | tee ca-csr.json
{
  "CN": "etcd CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Guangzhou",
      "ST": "Guangzhou"
    }
  ]
}
EOF

# Server certificate signing request
cat << EOF | tee server-csr.json
{
  "CN": "etcd",
  "hosts": [
    "172.16.10.20",
    "172.16.10.25",
    "172.16.10.26"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Guangzhou",
      "ST": "Guangzhou"
    }
  ]
}
EOF

# Generate the pem certificates
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=www server-csr.json | cfssljson -bare server
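Optionally, you can confirm that the issued server certificate carries the three node IPs as SANs before moving on; a quick check with openssl (cfssl-certinfo works too):

openssl x509 -in server.pem -noout -text | grep -A1 'Subject Alternative Name'
# expected: IP Address:172.16.10.20, IP Address:172.16.10.25, IP Address:172.16.10.26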
2. Create the Kubernetes Certificates

Change into the temporary directory prepared earlier (~/ssl/k8s).

2.1 Generate the CA Certificate

cat << EOF | tee ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "expiry": "87600h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF

cat << EOF | tee ca-csr.json
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Guangzhou",
      "ST": "Guangzhou",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

# Generate the certificate
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
2.2 Generate the API Server Certificate

10.66.0.1 is the first IP of the service-cluster-ip-range that will be passed to the apiserver.
cat << EOF | tee server-csr.json
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "10.66.0.1",
    "172.16.10.20",
    "172.16.10.25",
    "172.16.10.26",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Guangzhou",
      "ST": "Guangzhou",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server
2.3 Create the Kube Proxy Certificate

cat << EOF | tee kube-proxy-csr.json
{
  "CN": "system:kube-proxy",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Guangzhou",
      "ST": "Guangzhou",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
2.4 Create the Admin Certificate

This is the client certificate used to connect to the apiserver.

cat << EOF | tee admin-csr.json
{
  "CN": "admin",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Guangzhou",
      "L": "Guangzhou",
      "O": "system:masters",
      "OU": "System"
    }
  ]
}
EOF

# Generate the certificate
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin
III. Installing etcd

The other nodes in the cluster are set up identically except for the configuration files.

1. Preparation

Extract the etcd binaries into a directory on $PATH and copy the certificates into the prepared directory:

cp ~/ssl/etcd/*pem /etc/etcd/ssl

cd /repo
tar xf etcd-v3.3.15-linux-amd64.tar.gz
# Remember to copy the binaries and create the etcd directories on the other nodes as well
cp -a etcd-v3.3.15-linux-amd64/etcd* /usr/local/kubernetes/bin/
mkdir -p /var/lib/etcd/
2. Configuration

The configuration file is /etc/etcd/etcd.conf. On the other nodes, every IP except those inside ETCD_INITIAL_CLUSTER must be changed to that node's own IP (and ETCD_NAME to that node's name).
cat /etc/etcd/etcd.conf

ETCD_DATA_DIR="/var/lib/etcd/"
ETCD_LISTEN_PEER_URLS="https://172.16.10.20:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.16.10.20:2379,http://127.0.0.1:2379"

ETCD_NAME="infra1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.10.20:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://172.16.10.20:2379"
ETCD_INITIAL_CLUSTER="infra1=https://172.16.10.20:2380,infra2=https://172.16.10.25:2380,infra3=https://172.16.10.26:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

ETCD_CERT_FILE="/etc/etcd/ssl/server.pem"
ETCD_KEY_FILE="/etc/etcd/ssl/server-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/etc/etcd/ssl/server.pem"
ETCD_PEER_KEY_FILE="/etc/etcd/ssl/server-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.pem"
Configure the /usr/lib/systemd/system/etcd.service unit file; it is the same on every node in the cluster.
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/local/kubernetes/bin/etcd \
  --name=\"${ETCD_NAME}\" \
  --cert-file=\"${ETCD_CERT_FILE}\" \
  --key-file=\"${ETCD_KEY_FILE}\" \
  --peer-cert-file=\"${ETCD_PEER_CERT_FILE}\" \
  --peer-key-file=\"${ETCD_PEER_KEY_FILE}\" \
  --trusted-ca-file=\"${ETCD_TRUSTED_CA_FILE}\" \
  --peer-trusted-ca-file=\"${ETCD_PEER_TRUSTED_CA_FILE}\" \
  --initial-advertise-peer-urls=\"${ETCD_INITIAL_ADVERTISE_PEER_URLS}\" \
  --listen-peer-urls=\"${ETCD_LISTEN_PEER_URLS}\" \
  --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\" \
  --advertise-client-urls=\"${ETCD_ADVERTISE_CLIENT_URLS}\" \
  --initial-cluster-token=\"${ETCD_INITIAL_CLUSTER_TOKEN}\" \
  --initial-cluster=\"${ETCD_INITIAL_CLUSTER}\" \
  --initial-cluster-state=\"${ETCD_INITIAL_CLUSTER_STATE}\" \
  --data-dir=\"${ETCD_DATA_DIR}\""
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Copy the configuration files to the other nodes in the cluster (a sketch of the per-node changes follows the block):

# After copying, the configuration file must be adjusted on each node
scp -r /etc/etcd worker-01:/etc/
scp -r /etc/etcd worker-02:/etc/

scp /usr/lib/systemd/system/etcd.service worker-01:/usr/lib/systemd/system/etcd.service
scp /usr/lib/systemd/system/etcd.service worker-02:/usr/lib/systemd/system/etcd.service
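For reference, a minimal sketch of the per-node edits; the names infra2/infra3 follow ETCD_INITIAL_CLUSTER above, and the sed expressions are only one way to do it:

# On worker-01 (172.16.10.25): change the node name and every listen/advertise URL,
# while leaving the ETCD_INITIAL_CLUSTER line untouched
sed -i \
  -e 's/^ETCD_NAME="infra1"/ETCD_NAME="infra2"/' \
  -e '/^ETCD_INITIAL_CLUSTER=/!s/172\.16\.10\.20/172.16.10.25/g' \
  /etc/etcd/etcd.conf

# On worker-02 (172.16.10.26): the same, with infra3 and 172.16.10.26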
3. Start and Test

Start the etcd service (note that at least two nodes must be started together; a single node started alone will hang waiting for its peers):
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
Check the etcd cluster status. Because TLS is enabled, a number of certificate path parameters have to be passed, so for later debugging it is convenient to wrap them in an alias.

etcdctl \
  --ca-file=/etc/etcd/ssl/ca.pem \
  --cert-file=/etc/etcd/ssl/server.pem \
  --key-file=/etc/etcd/ssl/server-key.pem \
  --endpoints="https://172.16.10.20:2379,\
https://172.16.10.25:2379,\
https://172.16.10.26:2379" \
  cluster-health

# Normal output
member b15ab296f41fb90 is healthy: got healthy result from https://172.16.10.20:2379
member 421127a297dd866e is healthy: got healthy result from https://172.16.10.25:2379
member b7f5cc8d67090480 is healthy: got healthy result from https://172.16.10.26:2379
cluster is healthy

# Bake the parameters into an etcdctl alias
alias etcdctl='etcdctl \
  --ca-file=/etc/etcd/ssl/ca.pem \
  --cert-file=/etc/etcd/ssl/server.pem \
  --key-file=/etc/etcd/ssl/server-key.pem \
  --endpoints="https://172.16.10.20:2379,\
https://172.16.10.25:2379,\
https://172.16.10.26:2379"'

# Try again; later steps that write data with etcdctl use this aliased form directly
etcdctl cluster-health
Seeing cluster is healthy above means the etcd cluster is complete.
IV. Installing flannel

flannel is installed on all the worker nodes (and on the master in this layout); its job in the cluster is to let containers on different docker hosts talk to each other.

1. Preparation

Extract the package:

cd /repo
tar xf flannel-v0.11.0-linux-amd64.tar.gz
cp -a flanneld mk-docker-opts.sh /usr/local/kubernetes/bin/

# Send them to the worker nodes
scp flanneld mk-docker-opts.sh worker-01:/usr/local/kubernetes/bin/
scp flanneld mk-docker-opts.sh worker-02:/usr/local/kubernetes/bin/
2. Configuration

10.99.0.0/16 corresponds to the container network planned at the beginning; the supernet here is a /16.

Create the network configuration. Every flannel node will carve a /24 subnet out of 10.99.0.0/16 as its own host subnet, which is later handed to docker for the Pods running on that host.

The host-gw backend is used here; vxlan can be configured instead (host-gw installs direct routes on the hosts and requires layer-2 adjacency, while vxlan encapsulates traffic and also works across routed networks), and it is worth reading up on the differences.

# Create the network config; the "/kubernetes/network" prefix is referenced again in the flannel configuration file
etcdctl mk /kubernetes/network/config \
  '{"Network":"10.99.0.0/16","SubnetLen":24,"Backend":{"Type":"host-gw"}}'
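To confirm the value was written, read it back with the aliased etcdctl from the etcd section:

etcdctl get /kubernetes/network/config
# {"Network":"10.99.0.0/16","SubnetLen":24,"Backend":{"Type":"host-gw"}}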
Edit the /etc/sysconfig/flanneld configuration file, pointing it at the etcd endpoints and certificate locations.

If the machine has multiple NICs, add the NIC that carries the outbound traffic to FLANNEL_OPTIONS, for example -iface=ens33.
FLANNEL_ETCD_ENDPOINTS="https://172.16.10.20:2379,https://172.16.10.25:2379,https://172.16.10.26:2379"
FLANNEL_ETCD_PREFIX="/kubernetes/network"
FLANNEL_OPTIONS="-etcd-cafile=/etc/etcd/ssl/ca.pem -etcd-certfile=/etc/etcd/ssl/server.pem -etcd-keyfile=/etc/etcd/ssl/server-key.pem"
Configure the /usr/lib/systemd/system/flanneld.service unit file.

After flanneld starts it runs mk-docker-opts.sh, which writes the subnet information into /run/flannel/docker; when docker starts later, it configures the docker0 bridge from the variables in that file.

Running it requires root privileges.
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/flanneld
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=/usr/local/kubernetes/bin/flanneld -etcd-endpoints=${FLANNEL_ETCD_ENDPOINTS} \
  -etcd-prefix=${FLANNEL_ETCD_PREFIX} $FLANNEL_OPTIONS
ExecStartPost=/usr/local/kubernetes/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=on-failure

[Install]
WantedBy=multi-user.target
WantedBy=docker.service
Send the configuration to the worker nodes:
scp /etc/sysconfig/flanneld worker-01:/etc/sysconfig/
scp /etc/sysconfig/flanneld worker-02:/etc/sysconfig/

scp /usr/lib/systemd/system/flanneld.service worker-01:/usr/lib/systemd/system/
scp /usr/lib/systemd/system/flanneld.service worker-02:/usr/lib/systemd/system/
3. Start and Test

Start the flannel service on every node:
systemctl daemon-reload
systemctl start flanneld.service
systemctl enable flanneld.service
Now look at what changed in etcd:

etcdctl ls /kubernetes/network/subnets

# Output as follows; all three nodes here have flannel installed
/kubernetes/network/subnets/10.99.76.0-24
/kubernetes/network/subnets/10.99.41.0-24
/kubernetes/network/subnets/10.99.59.0-24

# Now look at the two flannel files on master-01; they match what was generated in etcd,
# and the docker service will use these variables
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.99.0.0/16
FLANNEL_SUBNET=10.99.41.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false

cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=10.99.41.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1500"
DOCKER_NETWORK_OPTIONS=" --bip=10.99.41.1/24 --ip-masq=true --mtu=1500"
4. Configure the docker Service

flannel is now ready to provide networking between the docker hosts in the cluster; the docker service must be configured to apply flannel's network settings.

The --exec-opt option sets the cgroup driver to systemd, which must match the kubelet configuration later.

$DOCKER_NETWORK_OPTIONS comes from the /var/run/flannel/docker file.

An Aliyun registry mirror is also added via --registry-mirror.
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
EnvironmentFile=-/var/run/flannel/docker
EnvironmentFile=-/var/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock \
  $DOCKER_NETWORK_OPTIONS \
  --exec-opt native.cgroupdriver=systemd \
  --registry-mirror=https://1u1w02a0.mirror.aliyuncs.com
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
Copy the systemd unit file to the worker nodes:
scp /usr/lib/systemd/system/docker.service worker-01:/usr/lib/systemd/system/
scp /usr/lib/systemd/system/docker.service worker-02:/usr/lib/systemd/system/
Start the docker service:
systemctl daemon-reload
systemctl enable docker
systemctl start docker
Taking master-01 as an example, check the bridge address that docker created (an optional cross-host connectivity check is sketched after the output):

ip addr show docker0

# Output as follows
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:d8:f9:f0:6b brd ff:ff:ff:ff:ff:ff
    inet 10.99.41.1/24 brd 10.99.41.255 scope global docker0
       valid_lft forever preferred_lft forever

docker0 was given 10.99.41.1/24, which matches the flannel subnet on this host. At this point docker and flannel are configured, and docker is using the network settings that flannel generated.
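As an optional check of cross-host container connectivity (not part of the original steps), start a throwaway container on one node and ping its IP from another; busybox is just an example image, and 10.99.41.2 stands for whatever IP the inspect command prints:

# On master-01
docker run -d --name net-test busybox sleep 3600
docker inspect -f '{{.NetworkSettings.IPAddress}}' net-test
# e.g. 10.99.41.2

# On worker-02: ping the container IP printed above
docker run --rm busybox ping -c 3 10.99.41.2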
V. Installing the Master Node

These steps are performed on master-01.

In Kubernetes, the master node runs the following components:
kube-apiserver
kube-scheduler
kube-controller-manager
Except for the apiserver, these components can run in a clustered mode: leader election picks one active instance and the other instances stand by.

In this environment the master node is also planned to run the worker-node processes such as kubelet; see the notes in the worker node section.

1. Preparation

Extract the prepared binary package and copy the binaries into /usr/local/kubernetes/bin/:

cd /repo
tar xf kubernetes-server-linux-amd64.tar.gz
cp kubernetes/server/bin/kubectl /usr/local/kubernetes/bin/
cp kubernetes/server/bin/kube-apiserver /usr/local/kubernetes/bin/
cp kubernetes/server/bin/kube-scheduler /usr/local/kubernetes/bin/
cp kubernetes/server/bin/kube-controller-manager /usr/local/kubernetes/bin/
cp kubernetes/server/bin/kubelet /usr/local/kubernetes/bin/
cp kubernetes/server/bin/kube-proxy /usr/local/kubernetes/bin/

# Copy the certificate files
cp ~/ssl/k8s/*.pem /etc/kubernetes/ssl/
2. apiserver

Create the TLS bootstrap token used for token authentication:

echo "$(head -c 16 /dev/urandom | od -An -t x | tr -d ' '),kubelet-bootstrap,10001,system:kubelet-bootstrap" > /etc/kubernetes/token.csv

# The file content looks like
47ec1167391bc238ccdd44367465eb08,kubelet-bootstrap,10001,system:kubelet-bootstrap
Create the configuration file /etc/kubernetes/kube-apiserver:
KUBE_APISERVER_OPTS="--logtostderr=true \
  --v=4 \
  --etcd-servers=https://172.16.10.20:2379,https://172.16.10.25:2379,https://172.16.10.26:2379 \
  --bind-address=172.16.10.20 \
  --secure-port=6443 \
  --advertise-address=172.16.10.20 \
  --allow-privileged=true \
  --service-cluster-ip-range=10.66.0.0/24 \
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota,NodeRestriction \
  --authorization-mode=RBAC,Node \
  --enable-bootstrap-token-auth \
  --token-auth-file=/etc/kubernetes/token.csv \
  --service-node-port-range=30000-50000 \
  --tls-cert-file=/etc/kubernetes/ssl/server.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/server-key.pem \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --etcd-cafile=/etc/etcd/ssl/ca.pem \
  --etcd-certfile=/etc/etcd/ssl/server.pem \
  --etcd-keyfile=/etc/etcd/ssl/server-key.pem \
  --kubelet-client-certificate=/etc/kubernetes/ssl/server.pem \
  --kubelet-client-key=/etc/kubernetes/ssl/server-key.pem"
Create the systemd unit file /usr/lib/systemd/system/kube-apiserver.service:
[Unit]
Description=Kubernetes API Service
Documentation=https://github.com/kubernetes/kubernetes
After=network.target
After=etcd.service

[Service]
EnvironmentFile=-/etc/kubernetes/kube-apiserver
ExecStart=/usr/local/kubernetes/bin/kube-apiserver $KUBE_APISERVER_OPTS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Start the apiserver and keep an eye on its status (a quick port check is sketched after the block):
systemctl daemon-reload
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl status kube-apiserver
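One quick way to confirm the apiserver is serving, based on the flags above (secure 172.16.10.20:6443, plus the default local insecure 127.0.0.1:8080 that kubectl, the scheduler, and the controller-manager use):

ss -tnlp | grep kube-apiserver
# should show LISTEN sockets on 172.16.10.20:6443 and 127.0.0.1:8080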
3. scheduler

Create the configuration file /etc/kubernetes/kube-scheduler:
KUBE_SCHEDULER_OPTS="--logtostderr=true --v=4 --master=127.0.0.1:8080 --leader-elect"
Create the systemd unit file /usr/lib/systemd/system/kube-scheduler.service (it reads the configuration file created above):

[Unit]
Description=Kubernetes Scheduler Plugin
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/kube-scheduler
ExecStart=/usr/local/kubernetes/bin/kube-scheduler $KUBE_SCHEDULER_OPTS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Start the scheduler service:
systemctl daemon-reload
systemctl start kube-scheduler
systemctl enable kube-scheduler
systemctl status kube-scheduler
4. controller-manager

Create the configuration file /etc/kubernetes/kube-controller-manager:
KUBE_CONTROLLER_MANAGER_OPTS="--logtostderr=true \
  --v=4 \
  --master=127.0.0.1:8080 \
  --leader-elect=true \
  --address=127.0.0.1 \
  --service-cluster-ip-range=10.66.0.0/24 \
  --cluster-name=kubernetes \
  --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --root-ca-file=/etc/kubernetes/ssl/ca.pem \
  --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem"
Create the systemd unit file /usr/lib/systemd/system/kube-controller-manager.service:
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/kube-controller-manager
ExecStart=/usr/local/kubernetes/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload
systemctl start kube-controller-manager
systemctl enable kube-controller-manager
systemctl status kube-controller-manager
5. Checks

On the master, use kubectl to view the cluster information:
# kubectl cluster-info
Kubernetes master is running at http://localhost:8080

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
View the status of the cluster components:
# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
6. Miscellaneous
VI. Installing the Worker Nodes

Install on worker-01 and worker-02; the binary packages come from master-01's share mounted at /repo.

The master node can also be installed the same way.

A worker node runs two Kubernetes components: kubelet and kube-proxy.

The worker nodes also do the cluster's real work, which is why docker was installed on them earlier together with flannel; now it is time to install the two components that are still missing.

On both worker nodes, first copy the component binaries into /usr/local/kubernetes/bin/:
cp -a /repo/kubernetes/server/bin/{kubelet,kube-proxy,kubectl} /usr/local/kubernetes/bin/
1. Create the kubelet Bootstrap Files

Run this on the master. It uses the token generated earlier; take another look at it:
cat /etc/kubernetes/token.csv
47ec1167391bc238ccdd44367465eb08,kubelet-bootstrap,10001,system:kubelet-bootstrap
In the commands below, the token variable is the token field from the file above.

# From token.csv
BOOTSTRAP_TOKEN=47ec1167391bc238ccdd44367465eb08
KUBE_APISERVER="https://172.16.10.20:6443"

cd /etc/kubernetes/ssl

# Set cluster parameters
kubectl config set-cluster kubernetes --certificate-authority=./ca.pem \
  --embed-certs=true --server=${KUBE_APISERVER} \
  --kubeconfig=bootstrap.kubeconfig

# Set client authentication parameters
kubectl config set-credentials kubelet-bootstrap \
  --token=${BOOTSTRAP_TOKEN} \
  --kubeconfig=bootstrap.kubeconfig

# Set context parameters
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=bootstrap.kubeconfig

# Set the default context
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig

# Create the kube-proxy kubeconfig file
kubectl config set-cluster kubernetes \
  --certificate-authority=./ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-credentials kube-proxy \
  --client-certificate=./kube-proxy.pem \
  --client-key=./kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
Bind the kubelet-bootstrap user to the system cluster role:
kubectl create clusterrolebinding kubelet-bootstrap \
  --clusterrole=system:node-bootstrapper \
  --user=kubelet-bootstrap
Copy /etc/kubernetes/ssl from master-01 to the worker nodes:
scp -r /etc/kubernetes/ssl/ worker-01:/etc/kubernetes/
scp -r /etc/kubernetes/ssl/ worker-02:/etc/kubernetes/
2. Configure kubelet

Unless stated otherwise, the commands are run on the worker nodes, and both workers need these changes; worker-01 is used as the example here.

Create the kubelet parameter configuration file /etc/kubernetes/kubelet.config; on the other machine, change address to its own IP, the rest stays the same.

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 172.16.10.25
port: 10250
readOnlyPort: 10255
# must match the docker cgroup driver configured above (native.cgroupdriver=systemd)
cgroupDriver: systemd
clusterDNS: ["10.66.0.2"]
clusterDomain: cluster.local.
failSwapOn: false
authentication:
  anonymous:
    enabled: true
Create the configuration file /etc/kubernetes/kubelet; change hostname-override to this machine's IP.

KUBELET_OPTS="--logtostderr=true \
  --v=4 \
  --hostname-override=172.16.10.25 \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --bootstrap-kubeconfig=/etc/kubernetes/ssl/bootstrap.kubeconfig \
  --config=/etc/kubernetes/kubelet.config \
  --cert-dir=/etc/kubernetes/ssl \
  --pod-infra-container-image=k8s.gcr.io/pause-amd64:3.1 \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem"
The /etc/kubernetes/kubelet.kubeconfig file does not exist at first; once the master approves the CSR later, it is generated along with some certificates in the ssl directory.

k8s.gcr.io/pause-amd64:3.1 is the Pod infrastructure (pause) image, hosted on Google's gcr registry. kubelet pulls it from the registry on startup, which normally fails from here. A copy has been synced: download k8s.gcr.io_pause-amd64_3.1.tgz, import it into docker on each node with docker load -i k8s.gcr.io_pause-amd64_3.1.tgz, and then start kubelet.

Alternatively, point the flag at the Aliyun mirror image: --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0 (a re-tagging sketch follows).
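If you prefer to keep the k8s.gcr.io/pause-amd64:3.1 name in KUBELET_OPTS, another option (not in the original steps; the mirror path below is an assumption based on the mapping used later for the dashboard) is to pull the image from a mirror and re-tag it locally:

# Pull the pause image from a mirror registry and re-tag it to the name kubelet expects
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1 k8s.gcr.io/pause-amd64:3.1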
Create the systemd unit file /usr/lib/systemd/system/kubelet.service:
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes/kubernetes
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/kubernetes/kubelet
ExecStart=/usr/local/kubernetes/bin/kubelet $KUBELET_OPTS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload
systemctl enable kubelet
systemctl start kubelet
systemctl status kubelet
3. Approve the kubelet CSR Requests

If nothing went wrong, kubelet is now running. Through /etc/kubernetes/ssl/bootstrap.kubeconfig it knows how to reach the apiserver on the cluster master, and at this point it requests to join the cluster.

View and approve the kubelet CSR requests:

# kubectl get csr
NAME                                                   AGE     REQUESTOR           CONDITION
node-csr-73s9OQf5HNcjoD7RPIJsrwXsrx4VPC89lKZdhK5i_Mk   11m     kubelet-bootstrap   Pending
node-csr-U5TpUTOu5nR_L-8Ooccbv5hj4LKEeLsMzpFkbvC8UII   5m41s   kubelet-bootstrap   Pending

# Approve the CSR with the corresponding NAME
# kubectl certificate approve node-csr-73s9OQf5HNcjoD7RPIJsrwXsrx4VPC89lKZdhK5i_Mk
certificatesigningrequest.certificates.k8s.io/node-csr-73s9OQf5HNcjoD7RPIJsrwXsrx4VPC89lKZdhK5i_Mk approved
# kubectl certificate approve node-csr-U5TpUTOu5nR_L-8Ooccbv5hj4LKEeLsMzpFkbvC8UII
certificatesigningrequest.certificates.k8s.io/node-csr-U5TpUTOu5nR_L-8Ooccbv5hj4LKEeLsMzpFkbvC8UII approved

# Check again: the CSRs have now been approved
# kubectl get csr
NAME                                                   AGE     REQUESTOR           CONDITION
node-csr-73s9OQf5HNcjoD7RPIJsrwXsrx4VPC89lKZdhK5i_Mk   13m     kubelet-bootstrap   Approved,Issued
node-csr-U5TpUTOu5nR_L-8Ooccbv5hj4LKEeLsMzpFkbvC8UII   7m31s   kubelet-bootstrap   Approved,Issued
Looking at the cluster nodes, there is no way to tell which one is the master and which are workers; ROLES is <none> for all of them (this is a side effect of manual installation; kubeadm would add the master role label).

ROLES is in fact just a label, albeit a special one. Here is how to add it:

# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
172.16.10.20   Ready    <none>   58m   v1.15.3
172.16.10.25   Ready    <none>   62m   v1.15.3
172.16.10.26   Ready    <none>   62m   v1.15.3

# Try changing worker-01's role to worker
# kubectl label node 172.16.10.25 node-role.kubernetes.io/worker=worker
node/172.16.10.25 labeled

# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
172.16.10.20   Ready    <none>   63m   v1.15.3
172.16.10.25   Ready    worker   67m   v1.15.3
172.16.10.26   Ready    <none>   67m   v1.15.3

# Change the other nodes too
# kubectl label node 172.16.10.26 node-role.kubernetes.io/worker=worker
# kubectl label node 172.16.10.20 node-role.kubernetes.io/master=master

# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
172.16.10.20   Ready    master   65m   v1.15.3
172.16.10.25   Ready    worker   69m   v1.15.3
172.16.10.26   Ready    worker   69m   v1.15.3

# If a label was applied by mistake, e.g. worker-02 (.26) was accidentally marked as master,
# remove it as follows and then relabel
# kubectl label node 172.16.10.26 node-role.kubernetes.io/master-
4. Configure kube-proxy

Installed on the worker nodes.

Create the configuration file /etc/kubernetes/kube-proxy:
KUBE_PROXY_OPTS="--logtostderr=true \
  --v=4 \
  --hostname-override=172.16.10.25 \
  --cluster-cidr=10.66.0.0/24 \
  --kubeconfig=/etc/kubernetes/ssl/kube-proxy.kubeconfig"
Create the systemd unit file /usr/lib/systemd/system/kube-proxy.service:
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
EnvironmentFile=-/etc/kubernetes/kube-proxy
ExecStart=/usr/local/kubernetes/bin/kube-proxy $KUBE_PROXY_OPTS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload
systemctl enable kube-proxy
systemctl start kube-proxy
systemctl status kube-proxy
VII. Testing the Cluster

kubectl on the master node is normally configured to reach the cluster; unless noted otherwise, the kubectl commands below are run on the master node.

1. Create the kubectl kubeconfig File

This writes the admin certificate into the default ~/.kube/config, so kubectl can talk to the apiserver over the secure port 6443 instead of the local insecure port 8080.

export KUBE_APISERVER="https://172.16.10.20:6443"

# Set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER}

# Set client authentication parameters
kubectl config set-credentials admin \
  --client-certificate=/etc/kubernetes/ssl/admin.pem \
  --embed-certs=true \
  --client-key=/etc/kubernetes/ssl/admin-key.pem

# Set context parameters
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin

# Set the default context
kubectl config use-context kubernetes
2. Create a Resource

One way to test Kubernetes is to create a resource and watch what happens. On master-01, create a deployment that runs nginx; the worker nodes will pull the nginx image and run it (a quick service-level check follows the output).

# kubectl run nginx --replicas=2 --labels="run=load-balancer-example" --image=nginx  --port=80
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created

# kubectl get deployments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/2     2            0           2m14s

# kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
nginx-5c47ff5dd6-98qbn   0/1     ContainerCreating   0          2m56s
nginx-5c47ff5dd6-lgjjv   0/1     ContainerCreating   0          2m56s

# Now running in the cluster
# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
nginx-5c47ff5dd6-98qbn   1/1     Running   0          5m37s   10.99.41.2   172.16.10.20   <none>           <none>
nginx-5c47ff5dd6-lgjjv   1/1     Running   0          5m37s   10.99.76.2   172.16.10.26   <none>           <none>
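To also exercise the service layer and kube-proxy (a small addition, not part of the original steps), expose the deployment as a NodePort service and fetch it from any node; the node port in the curl command is only an example, use whatever port kubectl prints:

kubectl expose deployment nginx --type=NodePort --port=80
kubectl get svc nginx   # note the NodePort in the PORT(S) column, e.g. 80:3xxxx/TCP

# Fetch the nginx welcome page through any node IP and that port (30080 is an example value)
curl -I http://172.16.10.25:30080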
3. Install the Dashboard

Dashboard project address: https://github.com/kubernetes/dashboard

Install it from master-01.

The Kubernetes addons include a dashboard, a web UI that provides basic resource visualization and status views for the cluster. Now install it into the cluster.
3.1 Create the Resources

The project documents the installation: it provides a yaml file, and applying it with kubectl apply is all it takes. Note that the resource file references a dashboard image hosted on gcr; if reaching Google's registry is a problem, change the image address in the file to a domestic mirror such as Aliyun, which keeps a copy.

First, the official way to install it:

# If the network is fine, follow the official instructions and do it in one step
# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
If connectivity to Google's registry is poor, switch the image to a domestic registry and then create the resources:

mkdir -p ~/k8s/dashboard
cd ~/k8s/dashboard

# Replace k8s.gcr.io with the Aliyun mirror of the Google container registry
curl https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml > dashboard.yaml
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" ./dashboard.yaml

kubectl apply -f dashboard.yaml
After creating it, take a look. The resources declare a namespace, so check them in the kube-system namespace they use:
kubectl get deployments -n kube-system
kubectl get service -n kube-system
kubectl get pods -n kube-system
3.2 Create an Account and Cluster Role Binding

Create a service account for the dashboard and bind it to a cluster role; the account's token is needed to log into the dashboard page.

# Create the service account
# kubectl create sa dashboard-admin -n kube-system
# kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin

# Look up the token: first find the name of the secret generated for the account
# kubectl get secrets -n kube-system |grep "dashboard-admin" |awk '{print $1}'
dashboard-admin-token-686fq   # the name in this environment

# Use the name to get the token (the long string at the end)
# kubectl describe secrets -n kube-system dashboard-admin-token-686fq
Name:         dashboard-admin-token-686fq
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: dashboard-admin
              kubernetes.io/service-account.uid: a86541c1-5f1c-4b07-b84b-f9504727c203

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1371 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tNjg2ZnEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYTg2NTQxYzEtNWYxYy00YjA3LWI4NGItZjk1MDQ3MjdjMjAzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.JtNwe-9Gfqo9-gS1vvf8AMLzXq8y6EPuxPSIO1uGZulJMQ3soFCwCji-HILhJ8L8hwbx4_sCoThcCDIMuiWgRxcFh_4zlcxnnjEfquYinKnVVCw_jovth2EIt9CXhmV_DLjOJcNaLXzCRDvi3usLA_QjT3uTLhoyTpLKpgxNL1XsMeE12ZJIe4iOpvvS-IQ_w89fqH6zhnfsVYQS1lYabNGkpKxMLyGFY9c76NUhbZxjEYP_jan2yawLXdJnJvOrS-HCRQaU01kikZ9wk38FRzrDU4Ya1O0Vw-tMhF91_v-uJI-XC-VVRw4yG6iZWokmVi28vtcEo-srBDmOp0NPGw
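The two lookup steps can also be combined into a single command, a small convenience not in the original:

kubectl -n kube-system describe secret \
  $(kubectl -n kube-system get secret | grep dashboard-admin | awk '{print $1}') | grep '^token'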
3.3 Create and Install the Browser Certificate

Create a certificate so the browser can log in. When generating the p12 file at the end you will be asked to enter and confirm a password; remember it. Afterwards, download the p12 certificate to your own machine and install it, using the password just set.
grep 'client-certificate-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d >> kubecfg.crt
grep 'client-key-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d >> kubecfg.key
openssl pkcs12 -export -clcerts -inkey kubecfg.key -in kubecfg.crt -out kubecfg.p12 -name "kubernetes-web-client"
Finally, open https://172.16.10.20:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ in the browser (replace 172.16.10.20 with your own master IP).

The browser first asks whether to use the certificate; after confirming, the dashboard asks whether to log in with a token or a kubeconfig. Choose token and paste the token string from the kubectl describe secrets -n kube-system dashboard-admin-token-686fq output above to log in.

The page shows resources in all namespaces and lets you execute commands inside containers.
VIII. Problem Notes

Running exec produced the error below. According to the official documentation, the apiserver and kubelet need to be started with the authentication certificate and key specified; the configuration files above already include that fix, and there is a further reference linked in the original post.
unable to upgrade connection: forbidden (user-system:anonymous, verb=create, re...
Running commands produced the error below; an RBAC rule needs to be created (this error was hit when executing commands in a pod from the dashboard):

# Error
unable to upgrade connection: forbidden (user=kubernetes, verb=create, r...

# Fix: create the rule file
cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kubernetes-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kubernetes
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubernetes-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF

# Create the rule
kubectl create -f apiserver-to-kubelet.yaml