Summary of Kubernetes Issues Encountered at Work
2019-04-15 02:35:26
william
1. A network failure on a node: Pod health checks on that node still pass, but the Pods on it cannot be reached through the edge node. Possible cause: the systemd restart policy is currently `on-failure`, so systemd does not restart flannel when flannel exits cleanly. It needs to be set to `always`.

2. Load balancing misbehaves during a rollout and requests are lost: the Pod and its endpoints entry are deleted at the same time, so traffic can still reach a Pod that is already terminating, and those requests fail.
   https://github.com/kubernetes/kubernetes/issues/47597
   https://github.com/kubernetes/kubernetes/issues/43576

3. Before 1.9, the `kubernetes` endpoints are not updated after an apiserver goes down, so some requests fail. From 1.9 onward, set `--endpoint-reconciler-type=lease`. The flag's help text reads: `--endpoint-reconciler-type string Default: "lease"`, "Use an endpoint reconciler (master-count, lease, none)".

4. The kubelet status is `NotReady` with the error `PLEG is not healthy`. This is usually caused by Docker not responding.
   https://github.com/kubernetes/kubernetes/issues/45419

5. `no space left on device`

```bash
kubelet.ns-k8s-node001.root.log.ERROR.20180214-113740.15702:1593018:E0320 04:59:09.572336 15702 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "osp-xxx-com-ljqm19-54bf7678b8-bvz9s": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/podf1bd9e87-1ef2-11e8-afd3-fa163ecf2dce/8710c146b3c8b52f5da62e222273703b1e3d54a6a6270a0ea7ce1b194f1b5053: no space left on device\""
```

Reference: [http://www.linuxfly.com/kubernetes-19-conflict-with-centos7/#entrymore](http://www.linuxfly.com/kubernetes-19-conflict-with-centos7/#entrymore)

Fix: rebuild `kubelet` after modifying `vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go`:

```golang
func (s *MemoryGroup) Apply(d *cgroupData) (err error) {
	...
-	if d.config.KernelMemory != 0 {  // this check was removed, so 1.9 enables the cgroup kernel memory feature by default
+	// Only enable kernel memory accouting when this cgroup
+	// is created by libcontainer, otherwise we might get
+	// error when people use `cgroupsPath` to join an existed
+	// cgroup whose kernel memory is not initialized.
	if err := EnableKernelMemoryAccounting(path); err != nil {
		return err
	}
```

Put the `if d.config.KernelMemory != 0` check back and rebuild `kubelet` as a temporary workaround. Alternatively, upgrade the kernel.

Pitfalls of ulimit in Docker on CentOS 7: http://www.dockone.io/article/522

Zombie containers, orphan processes in Docker: https://yq.aliyun.com/articles/61894

Security: exposing port 2376 on the Docker daemon creates a container security problem. Fix: enable iptables, disable nf_conntrack connection tracking, and add IP and port filtering rules.

Consul network flapping: Consul broadcasts across datacenters over UDP. The `-advertise` IP was set to an address outside the reachable network segment, which took the whole Consul cluster down. Fix: set Consul's `-advertise` IP to the IP of the NIC actually used for communication.

Ceph vs. GlusterFS

Split the cluster into smaller zones and isolate them to reduce risk.

facebook/dvara

Kernel crash caused by the discard feature of thin-provisioning in devicemapper.

Commands cannot be run on the physical host: "bash: fork: Cannot allocate memory". If a container creates a large number of processes and never reaps them, the kernel's pid_max limit is reached. The kernel's pid_max (`/proc/sys/kernel/pid_max`) is shared globally. Process Number Controller: supported only in the latest 4.3-rc1 kernel, it provides a pid-max per container.

The memory value computed inside a container is inaccurate, an order of magnitude lower than the real value: `/cgroup/memory/docker/id/memory.usage_in_bytes`
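To make the pid_max note above more concrete, here is a minimal sketch in Go that estimates how much of the global PID space a host is using. It assumes a Linux host with `/proc` mounted; the function names `parsePidMax`, `countTasks`, and `usagePercent` are hypothetical helpers written for this illustration, not part of any Kubernetes or Docker API. It counts threads as well as processes, because every thread consumes an ID from the same pool.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parsePidMax parses the contents of /proc/sys/kernel/pid_max.
func parsePidMax(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// countTasks counts the entries under procRoot/<pid>/task for every
// numeric directory in procRoot. Each entry is one thread ID.
func countTasks(procRoot string) (int, error) {
	entries, err := os.ReadDir(procRoot)
	if err != nil {
		return 0, err
	}
	total := 0
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a process directory
		}
		tasks, err := os.ReadDir(filepath.Join(procRoot, e.Name(), "task"))
		if err != nil {
			continue // the process may have exited during the scan
		}
		total += len(tasks)
	}
	return total, nil
}

// usagePercent returns the share of the PID space in use, in percent.
func usagePercent(used, max int) float64 {
	if max <= 0 {
		return 0
	}
	return float64(used) / float64(max) * 100
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/pid_max")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read pid_max:", err)
		return
	}
	max, err := parsePidMax(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse pid_max:", err)
		return
	}
	used, err := countTasks("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot scan /proc:", err)
		return
	}
	fmt.Printf("tasks=%d pid_max=%d usage=%.1f%%\n", used, max, usagePercent(used, max))
}
```

Run on the physical host rather than inside a container, a value close to 100% would be consistent with the fork failure described above. The scan is a point-in-time estimate, since processes start and exit while `/proc` is being read.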