Notes on k8s test environment issues


Notes on problems I ran into while setting up a test environment at the company.

Dashboard authentication

So far the most obvious benefit I have found in the dashboard is that it makes running commands inside containers convenient : )

To install the dashboard, first fetch kubernetes-dashboard.yaml from GitHub.

Then add a NodePort mapping to the Service section at the end of the file.

# ... earlier content omitted ...
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 9000
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard

Open port 9000 on a kube-proxy node and the dashboard asks for token authentication; get the token with the commands below. Copy the entire long token string into the page to log in (a one-liner to extract just the token is sketched after the output).

# kubectl -n kube-system get secret
NAME                                TYPE                                  DATA      AGE
an00-token-brpds                    kubernetes.io/service-account-token   3         14d
default-token-jct5w                 kubernetes.io/service-account-token   3         22d
elasticsearch-logging-token-zwr2b   kubernetes.io/service-account-token   3         14d
fluentd-es-token-rxrxc              kubernetes.io/service-account-token   3         44m
heapster-token-qk4m9                kubernetes.io/service-account-token   3         14d
kube-dns-autoscaler-token-nmgjs     kubernetes.io/service-account-token   3         16d
kube-dns-token-s5hkb                kubernetes.io/service-account-token   3         16d
kubernetes-dashboard-certs          Opaque                                0         14d
kubernetes-dashboard-key-holder     Opaque                                2         21d
kubernetes-dashboard-token-7nt98    kubernetes.io/service-account-token   3         14d

# Find the secret whose name contains "token", then copy its token field (at the end) and paste it into the login page.
# kubectl -n kube-system describe secret kubernetes-dashboard-token-7nt98
Name:         kubernetes-dashboard-token-7nt98
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=kubernetes-dashboard
              kubernetes.io/service-account.uid=54c71522-6f80-11e8-bc0b-525400eac085

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1107 bytes
namespace:  11 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi03bnQ5OCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjU0YzcxNTIyLTZmODAtMTFlOC1iYzBiLTUyNTQwMGVhYzA4NSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.leD2gC5FkkN1_0mt5_AwveStC6vh5H8-UL1LqwF7N07xQ2ZKSh1matYyWyv-buMflrks1-my88MKwaYNmMaNRk2-WrlybNLJKrf-QLpmGLdCB3IHBuSViuHHQwPS4g7CD5GNAsuPZF3GAszuBamBD3HJT1okrrH8J3KlstqMpYsEbwullLfgQaznfd02YjrR6izC3sneJpj0vTKSrY8LxweI2xcYVNshZHRacEgdNzwBTe48dU_9pCqyUWOSS2J2Y4EimAMyPQlwDbazgGuHn027neIosxO0ooSbEeiqaEnu9-ATpyJCCWW4ukOxt_PG8VJsNzmZuG18LIA_KImd6A
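If you only want the token itself, a one-liner like the following sketch also works (the secret name is the one from the listing above; the jsonpath output is base64-encoded, hence the decode):

# kubectl -n kube-system get secret kubernetes-dashboard-token-7nt98 -o jsonpath='{.data.token}' | base64 -d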

kube-dns

kube-dns adds Service name resolution and automatic discovery for pods in k8s; it watches Service changes through the API and resolves Service names to the corresponding VIPs.

kube-dns is made up of three images (hosted on Google's gcr registry; if the servers have no Internet access, prepare them in advance):

  • kube-dns: watches Service and IP changes and their mappings through the k8s API; once kept in memory, these are effectively the DNS records.
  • dnsmasq-nanny: pulls DNS rules from the kube-dns container and acts as the DNS server for the cluster, offloading kube-dns and improving stability and query performance.
  • sidecar: runs health checks against kube-dns and dnsmasq and exposes DNS metrics.

Create it from the official YAML files, found under the "kubernetes/cluster/addons/" directory of the source tree. Two files matter for kube-dns.

The DNS service definition file, which contains the Service, ServiceAccount, Deployment, and so on. Replace __PILLAR__DNS__SERVER__ in the file with a cluster IP, and __PILLAR__DNS__DOMAIN__ with cluster.local (note: keep the trailing dot); a sed sketch follows the file name below.

  • dns/kube-dns.yaml.base
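A minimal sketch of the substitution, reusing the same cluster IP and domain that appear in the kubelet config later in this post:

# cp kube-dns.yaml.base kube-dns.yaml
# sed -i 's/__PILLAR__DNS__SERVER__/10.66.77.2/g; s/__PILLAR__DNS__DOMAIN__/cluster.local/g' kube-dns.yaml
# kubectl create -f kube-dns.yaml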

DNS plays a big part in service communication and discovery for the orchestrated workloads, so it must not fail. The project also ships a configuration that automatically scales kube-dns, and the file can be created directly:

  • dns-horizontal-autoscaler/dns-horizontal-autoscaler.yaml

After creating the kube-dns service, the default DNS server for pods has to be changed. Pods are created by the kubelet, so find the kubelet's config file and add the DNS settings; note that the new option names also have to be referenced in the kubelet.service unit file (see the sketch after the config below). Reload the configuration and restart the kubelet.

# cat /etc/kubernetes/kubelet.conf 
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
NODE_ADDRESS="--address=0.0.0.0"
NODE_HOSTNAME="--hostname-override=debian-70"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_POD_INFRA_CONTAINER_IMAGE="--pod-infra-container-image=k8s.gcr.io/pause-amd64:3.1"
KUBE_RUNTIME_CGROUPS="--runtime-cgroups=/systemd/system.slice"
KUBE_CGROUPS="--kubelet-cgroups=/systemd/system.slice"
KUBE_FAIL_SWAP_ON="--fail-swap-on=false"
KUBE_CONFIG="--kubeconfig=/etc/kubernetes/kubeconfig.yml"
KUBELET_DNS_IP="--cluster-dns=10.66.77.2" # set to the cluster IP that replaced __PILLAR__DNS__SERVER__
KUBELET_DNS_DOMAIN="--cluster-domain=cluster.local" # set to the value that replaced __PILLAR__DNS__DOMAIN__
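For reference, a sketch of what the kubelet.service unit looks like after referencing the two new variables (the unit path and kubelet binary path here are assumptions from my setup; adjust to yours):

# /etc/systemd/system/kubelet.service (excerpt)
EnvironmentFile=-/etc/kubernetes/kubelet.conf
ExecStart=/usr/local/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $NODE_ADDRESS \
          $NODE_HOSTNAME $KUBE_ALLOW_PRIV $KUBE_POD_INFRA_CONTAINER_IMAGE \
          $KUBE_RUNTIME_CGROUPS $KUBE_CGROUPS $KUBE_FAIL_SWAP_ON $KUBE_CONFIG \
          $KUBELET_DNS_IP $KUBELET_DNS_DOMAIN

# systemctl daemon-reload
# systemctl restart kubelet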

You can create a simple busybox pod and look at its /etc/resolv.conf; by default there is a kubernetes.default entry pointing at the apiserver. What you resolve is the Service's clusterIP, which cannot be pinged.
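For example, a minimal pod definition like this sketch is enough for the test below (the name and image tag are arbitrary; busybox:1.28 is a common choice for DNS tests):

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]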

# kubectl exec -it busybox sh
/ # nslookup kubernetes.default
Server: 10.66.77.2
Address 1: 10.66.77.2 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.66.77.1 kubernetes.default.svc.cluster.local
/ #
/ #
/ # nslookup nginx-service
Server: 10.66.77.2
Address 1: 10.66.77.2 kube-dns.kube-system.svc.cluster.local

Name: nginx-service
Address 1: 10.66.77.225 nginx-service.default.svc.cluster.local
/ # exit

# The current Services; the IPs match what was resolved above
# kubectl get svc
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   10.66.77.1     <none>        443/TCP   6d
nginx-service   ClusterIP   10.66.77.225   <none>        80/TCP    5h

elasticsearch fluentd kibana

The project provides a logging solution; the resource definition files are under kubernetes/cluster/addons/fluentd-elasticsearch in the source tree. Grab the YAML files to create and run EFK.

Required images (the elasticsearch and kibana images are very large…):

  • k8s.gcr.io/elasticsearch:v5.6.4
  • alpine:3.6
  • k8s.gcr.io/fluentd-elasticsearch:v2.0.4
  • docker.elastic.co/kibana/kibana:5.6.4

elasticsearch is the search engine and the database where the logs are stored.
fluentd collects the log streams that other pods write on each node and ships them to elasticsearch.
kibana displays the logs and provides a friendly search UI, charts, and so on; to reach it from outside, change kibana-service.yaml to a NodePort mapping (see the sketch below).
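A sketch of the kibana-service.yaml change, in the same spirit as the dashboard Service above (the nodePort value is just an example):

# kibana-service.yaml (excerpt)
spec:
  type: NodePort
  ports:
  - port: 5601
    targetPort: 5601
    nodePort: 30601
  selector:
    k8s-app: kibana-logging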

After fetching the resource YAML files from the source tree, comment out the SERVER_BASEPATH env variable (name and value) in kibana-deployment.yaml and save it before creating, roughly as in the excerpt below.
kibana takes a while to start; as long as the pod logs show nothing abnormal, be patient and wait a few minutes.
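Roughly what the edit looks like (the surrounding values are reproduced from memory, so check them against your copy of the file):

# kibana-deployment.yaml (excerpt)
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch-logging:9200
          # - name: SERVER_BASEPATH
          #   value: /api/v1/namespaces/kube-system/services/kibana-logging/proxy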

# ls -l
total 36
-rw-r--r-- 1 root root   382 Jun 13 11:34 es-service.yaml
-rw-r--r-- 1 root root  2820 Jun 13 11:34 es-statefulset.yaml
-rw-r--r-- 1 root root 15648 Jun 13 11:34 fluentd-es-configmap.yaml
-rw-r--r-- 1 root root  2774 Jun 13 11:34 fluentd-es-ds.yaml
-rw-r--r-- 1 root root  1186 Jun 13 11:34 kibana-deployment.yaml
-rw-r--r-- 1 root root   354 Jun 13 11:34 kibana-service.yaml

# kubectl create -f .

# Check some of the resources
# kubectl get statefulset,pod,daemonset -n kube-system
NAME                                     DESIRED   CURRENT   AGE
statefulset.apps/elasticsearch-logging   2         2         11m

NAME                                        READY     STATUS              RESTARTS   AGE
pod/elasticsearch-logging-0                 1/1       Running             0          6m
pod/elasticsearch-logging-1                 1/1       Unknown             0          6m
pod/kibana-logging-bc776986-7vtf7           1/1       Unknown             0          11m
pod/kibana-logging-bc776986-cm69s           0/1       ContainerCreating   0          34s
pod/kube-dns-659bc9899c-ghj2n               0/3       ContainerCreating   0          20s
pod/kube-dns-659bc9899c-lm4pd               3/3       Running             0          1d
pod/kube-dns-659bc9899c-r655f               3/3       Unknown             0          1d
pod/kube-dns-autoscaler-79b4b844b9-6v856    1/1       Running             0          1d
pod/kubernetes-dashboard-5c469b58b8-pf7cg   0/1       ContainerCreating   0          16s
pod/kubernetes-dashboard-5c469b58b8-pkttx   1/1       Unknown             2          5d

NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
daemonset.extensions/fluentd-es-v2.0.4    0         0         0         0            0           beta.kubernetes.io/fluentd-ds-ready=true   11m

1. Possible errors when starting and deleting the elasticsearch statefulset

If elasticsearch-logging hits the following error on startup, add the --allow-privileged startup flag to the apiserver and every kubelet (the default is false), reload the config files, and restart them (a config sketch follows the output below).

# kubectl describe statefulset -n kube-system 
...
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 1m (x24 over 6m) statefulset-controller create Pod elasticsearch-logging-0 in StatefulSet elasticsearch-logging failed error: Pod "elasticsearch-logging-0" is invalid: spec.initContainers[0].securityContext.privileged: Forbidden: disallowed by cluster policy

# After the restart, creation succeeds
# kubectl describe statefulset -n kube-system
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 1m (x24 over 6m) statefulset-controller create Pod elasticsearch-logging-0 in StatefulSet elasticsearch-logging failed error: Pod "elasticsearch-logging-0" is invalid: spec.initContainers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
Normal SuccessfulCreate 49s statefulset-controller create Pod elasticsearch-logging-0 in StatefulSet elasticsearch-logging successful
Normal SuccessfulCreate 42s statefulset-controller create Pod elasticsearch-logging-1 in StatefulSet elasticsearch-logging successful
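A sketch of the change on my nodes, reusing the env-file layout shown in the kube-dns section (the apiserver has the same flag in its own config file):

# /etc/kubernetes/kubelet.conf -- flip the flag, then reload and restart
KUBE_ALLOW_PRIV="--allow-privileged=true"

# systemctl daemon-reload
# systemctl restart kubelet        # do the same for kube-apiserver after editing its config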

There was another problem when deleting the resources: everything else was removed, but the elasticsearch-logging statefulset would never delete successfully, even with --force; it just told me it timed out.


# kubectl delete -f .

# The elasticsearch-logging statefulset still exists
# kubectl get -f .
NAME                    DESIRED   CURRENT   AGE
elasticsearch-logging   0         2         2h
Error from server (NotFound): services "elasticsearch-logging" not found
Error from server (NotFound): serviceaccounts "elasticsearch-logging" not found
Error from server (NotFound): clusterroles.rbac.authorization.k8s.io "elasticsearch-logging" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "elasticsearch-logging" not found
Error from server (NotFound): configmaps "fluentd-es-config-v0.1.4" not found
Error from server (NotFound): serviceaccounts "fluentd-es" not found
Error from server (NotFound): clusterroles.rbac.authorization.k8s.io "fluentd-es" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "fluentd-es" not found
Error from server (NotFound): daemonsets.apps "fluentd-es-v2.0.4" not found
Error from server (NotFound): deployments.apps "kibana-logging" not found
Error from server (NotFound): services "kibana-logging" not found

# kubectl delete statefulset elasticsearch-logging -n kube-system --force
timed out waiting for "elasticsearch-logging" to be synced

Later I found a fix on Google: delete the stuck resource with --cascade=false, which solved the problem. If the pods defined by the deleted resource are still visible afterwards, delete them with --grace-period=0.

# kubectl delete -f es-statefulset.yaml --cascade=false

# If the statefulset defining elastic is deleted but get pod still shows its pods, force-delete them with the commands below
# kubectl --namespace=kube-system delete pods elasticsearch-logging-0 --grace-period=0 --force
# kubectl --namespace=kube-system delete pods elasticsearch-logging-1 --grace-period=0 --force

2. fluentd daemonset startup problems

Another fluentd issue: get showed 0 pods running for the fluentd-es-v2.0.4 daemonset. A label has to be added to every node, as follows:

# kubectl get daemonset fluentd-es-v2.0.4 -n kube-system
NAME                DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
fluentd-es-v2.0.4   0         0         0         0            0           beta.kubernetes.io/fluentd-ds-ready=true   32s

# Add the label to every node; only nodes with this label run fluentd
# kubectl label node debian-70 beta.kubernetes.io/fluentd-ds-ready=true
# kubectl label node sl-80 beta.kubernetes.io/fluentd-ds-ready=true

# get again: pods are running now
# kubectl get daemonset fluentd-es-v2.0.4 -n kube-system
NAME                DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
fluentd-es-v2.0.4   2         2         2         2            2           beta.kubernetes.io/fluentd-ds-ready=true   4m

Then, by chance, the test server room at work lost power (I suspect this is related), and when the cluster came back up, fluentd on one node would not start. It looked like this:

# The pod stays in CrashLoopBackOff; describe gives limited information and restarting does not fix it
# kubectl get -n kube-system pods
NAME                                    READY     STATUS             RESTARTS   AGE
elasticsearch-logging-0                 1/1       Running            3          14d
elasticsearch-logging-1                 1/1       Running            3          14d
fluentd-es-v2.0.4-4lthw                 1/1       Running            0          16h
fluentd-es-v2.0.4-lvmj7                 0/1       CrashLoopBackOff   4          16h
heapster-69b5d4974d-4dzm8               1/1       Running            3          14d
kibana-logging-799d8b46db-rn6fq         1/1       Running            3          14d
kube-dns-659bc9899c-ghj2n               3/3       Running            9          15d
kube-dns-659bc9899c-lm4pd               3/3       Running            12         16d
kube-dns-autoscaler-79b4b844b9-6v856    1/1       Running            4          16d
kubernetes-dashboard-7d5dcdb6d9-f4j6n   1/1       Running            3          14d
monitoring-grafana-69df66f668-gg9kl     1/1       Running            3          14d
monitoring-influxdb-78d4c6f5b6-2phht    1/1       Running            3          14d

# Also take a look at the pod logs -- this helps a lot
# kubectl logs -n kube-system fluentd-es-v2.0.4-lvmj7
2018-06-29 01:21:19 +0000 [warn]: parameter 'time_format' in <source>
@id fluentd-containers.log
@type tail
path "/var/log/containers/*.log"
pos_file "/var/log/es-containers.log.pos"
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag "raw.kubernetes.*"
read_from_head true
<parse>
@type "multi_format"
<pattern>
format json
time_key "time"
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
time_type string
</pattern>
<pattern>
format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
expression "^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$"
ignorecase false
multiline false
</pattern>
</parse>
</source> is not used. # the errors start below, while on the healthy pod this point is followed by info logs about shipping to elastic
2018-06-29 01:21:19 +0000 [error]: unexpected error error_class=TypeError error="no implicit conversion of Symbol into Integer"
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buffer/file_chunk.rb:219:in `[]'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buffer/file_chunk.rb:219:in `restore_metadata'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buffer/file_chunk.rb:322:in `load_existing_staged_chunk'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buffer/file_chunk.rb:51:in `initialize'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buf_file.rb:144:in `new'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buf_file.rb:144:in `block in resume'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buf_file.rb:133:in `glob'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buf_file.rb:133:in `resume'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buffer.rb:171:in `start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/buf_file.rb:120:in `start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/plugin/output.rb:415:in `start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:165:in `block in start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:154:in `block (2 levels) in lifecycle'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:153:in `each'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:153:in `block in lifecycle'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:140:in `each'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:140:in `lifecycle'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/root_agent.rb:164:in `start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/engine.rb:274:in `start'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/engine.rb:219:in `run'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/supervisor.rb:774:in `run_engine'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/supervisor.rb:523:in `block in run_worker'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/supervisor.rb:699:in `main_process'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/supervisor.rb:518:in `run_worker'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/lib/fluent/command/fluentd.rb:316:in `<top (required)>'
2018-06-29 01:21:19 +0000 [error]: /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-29 01:21:19 +0000 [error]: /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-29 01:21:19 +0000 [error]: /var/lib/gems/2.3.0/gems/fluentd-1.1.0/bin/fluentd:8:in `<top (required)>'
2018-06-29 01:21:19 +0000 [error]: /usr/local/bin/fluentd:22:in `load'
2018-06-29 01:21:19 +0000 [error]: /usr/local/bin/fluentd:22:in `<main>'
2018-06-29 01:21:19 +0000 [error]: unexpected error error_class=TypeError error="no implicit conversion of Symbol into Integer"
2018-06-29 01:21:19 +0000 [error]: suppressed same stacktrace

Google led me to fluentd/issues/#1760 on GitHub, which traces this to corrupted buffer metadata. The difference here is that the 2.0.4 image stores the buffer in a new location, mapped to /var/log/fluentd-buffers/kubernetes.system.buffer on the host. After I deleted the *.meta files, fluentd on that node started normally (you can delete the pod so it is recreated, or wait for it to restart; see the commands below).

# cd /var/log/fluentd-buffers/kubernetes.system.buffer/

# ls
buffer.b56efd1a03768b2f7eabddf200cf50b79.log buffer.b56efd1a03ad20d927856cabfb9e0b1d7.log.meta
buffer.b56efd1a03768b2f7eabddf200cf50b79.log.meta buffer.b56efd1a1f72114c77ad018dec6591873.log
buffer.b56efd1a03ad20d927856cabfb9e0b1d7.log buffer.b56efd1a1f72114c77ad018dec6591873.log.meta

# rm -rf *.meta
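To recreate the pod right away instead of waiting for the next restart, delete it (the pod name here is the failing one from the listing above):

# kubectl -n kube-system delete pod fluentd-es-v2.0.4-lvmj7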


Monitoring: heapster, influxdb, grafana

The project provides heapster as the monitoring solution.

The YAML files live under deploy/kube-config/influxdb in the heapster project and can be created as-is. Pull the images first, though: it is safer to grep the image fields out of the files and pull those before creating, as in the sketch below.
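A sketch of pulling the referenced images and then creating everything (assumes docker is the container runtime and you are inside the influxdb directory):

# grep -h 'image:' *.yaml
# for img in $(grep -h 'image:' *.yaml | awk '{print $2}'); do docker pull "$img"; done
# kubectl create -f .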

Once these resources are running and the heapster logs show nothing abnormal, the dashboard will start showing CPU, memory, and other monitoring data for pods after a short while. You can also change the grafana Service to a NodePort mapping and configure the graphs (admin/admin).

Horizontal Pod Autoscaling (HPA)

See the official walkthrough: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
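The walkthrough covers the details; the basic usage is a single command, e.g. this sketch against a hypothetical nginx Deployment (it relies on the heapster metrics set up above):

# kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=50
# kubectl get hpa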
