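1. Deploying a three-node highly available k8s cluster with kubeadm (every node doubles as control plane and worker)
1.1 Environment planning
1.1.1 Lab architecture diagram
1.1.2 System versions
OS version: CentOS Linux release 7.9.2009 (Core)
Initial kernel version: 3.10.0-1160.71.1.el7.x86_64
Specs: 2 vCPU, 2 GB RAM, 150 GB disk
File system: xfs
Network: Internet access
k8s version: 1.25.9
1.1.3 Basic environment information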
K8s cluster role | IP address | Hostname | Components |
---|---|---|---|
Control node 1 (worker node 1) | 192.168.204.10 | k8s-001 | apiserver, controller-manager, scheduler, etcd, kube-proxy, container runtime, keepalived, nginx, calico, coredns, kubelet |
Control node 2 (worker node 2) | 192.168.204.11 | k8s-002 | apiserver, controller-manager, scheduler, etcd, kube-proxy, container runtime, keepalived, nginx, calico, coredns, kubelet |
Control node 3 (worker node 3) | 192.168.204.12 | k8s-003 | apiserver, controller-manager, scheduler, etcd, kube-proxy, container runtime, calico, coredns, kubelet |
VIP address | 192.168.204.13 (k8s-vip) |
1.1.4 k8s subnet plan
- service subnet: 10.165.0.0/16
- pod subnet: 10.166.0.0/16
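Both ranges need to stay clear of the node network (192.168.204.0/24 in this lab). As a rough sanity check, the sketch below tests whether an address falls inside a CIDR block using plain shell arithmetic. The helper names `ip_to_int` and `in_cidr` are made up for this illustration and are not part of any k8s tooling.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if address $1 lies inside CIDR block $2 (for example 10.165.0.0/16).
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, `in_cidr 192.168.204.10 10.165.0.0/16 || echo "no overlap"` confirms that a node address is outside the service subnet.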
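1.2 Base installation and tuning
Unless noted otherwise, run every step on all three nodes.
1.2.1 System information check
Check the OS release and kernel version:
cat /etc/redhat-release ; uname -r
1.2.2 Static IP configuration
Each server must have a static IP address that never changes.
grep -E 'BOOTPROTO|IPADDR' /etc/sysconfig/network-scripts/ifcfg-ens32
1.2.3 Set the hostname
Set each host's name according to the plan, one command per node:
hostnamectl set-hostname k8s-001
hostnamectl set-hostname k8s-002
hostnamectl set-hostname k8s-003
1.2.4 Configure /etc/hosts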
cat >> /etc/hosts <<EOF
192.168.204.10 k8s-001
192.168.204.11 k8s-002
192.168.204.12 k8s-003
192.168.204.13 k8s-vip
EOF
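Because the heredoc above appends unconditionally, running it twice leaves duplicate entries. A minimal sketch of an idempotent alternative follows; `add_host_entry` is a hypothetical helper name, and the file path is passed in so it can be tried on a scratch file first.

```shell
# Append "IP NAME" to a hosts file only if NAME is not already mapped there.
add_host_entry() {
  file=$1
  ip=$2
  name=$3
  # Look for NAME as a whole field on any non-comment line.
  if awk -v n="$name" '
       !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Usage would look like `add_host_entry /etc/hosts 192.168.204.13 k8s-vip`, repeated for each planned host.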
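1.2.5 Disable SELinux
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
1.2.6 Set up SSH trust between hosts
### Only k8s-001 needs passwordless SSH access to the three nodes
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
ssh-copy-id k8s-001
ssh-copy-id k8s-002
ssh-copy-id k8s-003
1.2.7 Disable swap
Swap must be off; by default kubelet refuses to start while swap is enabled.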
swapoff -a
sed --in-place=.bak 's/.*swap.*/#&/g' /etc/fstab
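To confirm that the sed edit really commented out every swap mount, a small check along these lines can help. It is only a sketch: `active_swap_entries` is a hypothetical helper that assumes the usual six-column fstab layout, where the third field is the file system type.

```shell
# Print fstab lines that still define an active (uncommented) swap mount.
active_swap_entries() {
  awk '!/^[[:space:]]*#/ && $3 == "swap"' "$1"
}
```

`active_swap_entries /etc/fstab` should print nothing, and `cat /proc/swaps` should list no devices once `swapoff -a` has run.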
1.2.8 Disable firewalld
systemctl stop firewalld ; systemctl disable firewalld
1.2.9 Disable NetworkManager
systemctl stop NetworkManager; systemctl disable NetworkManager
1.2.10 Set resource limits
The stock /etc/security/limits.conf contains no active settings, so the entries can simply be appended.
cat >> /etc/security/limits.conf << EOF
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
1.2.11 Configure time synchronization
### Write chrony.conf
cat > /etc/chrony.conf << EOF
server ntp.aliyun.com iburst
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
keyfile /etc/chrony.keys
commandkey 1
generatecommandkey
logchange 0.5
logdir /var/log/chrony
EOF
### Restart the service
systemctl enable chronyd ; systemctl restart chronyd
1.2.12 Configure mirrors in China
## CentOS 7 yum and EPEL repos
mkdir /etc/yum.repos.d.bak && mv /etc/yum.repos.d/*.repo /etc/yum.repos.d.bak
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
curl -o /etc/yum.repos.d/epel.repo https://mirrors.aliyun.com/repo/epel-7.repo
yum clean all && yum makecache
## Configure the Docker repo
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum makecache fast
## Configure the k8s repo
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
1.2.13 Upgrade the kernel
### Update the installed packages first
yum update --exclude=kernel* -y
### Download the kernel (4.19 or later is recommended; the stock kernel also works)
curl -o kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm
curl -o kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm
### Install the kernel (the current directory must contain only these two rpm packages)
yum localinstall -y *.rpm
### Change the kernel boot order
grub2-set-default 0
grub2-mkconfig -o /etc/grub2.cfg
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
### Check that the new kernel is the default boot entry
grubby --default-kernel
### Reboot the server
reboot
1.2.14 Install basic tools
Install a set of commonly needed tools:
yum install -y device-mapper-persistent-data net-tools nfs-utils jq psmisc git lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip sudo libaio-devel wget vim ncurses-devel autoconf automake zlib-devel python-devel epel-release openssh-server socat ipvsadm conntrack telnet ipset sysstat libseccomp
1.2.15 Configure kernel modules and parameters
### Modules to load
cat > /etc/modules-load.d/k8s.conf << EOF
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
overlay
br_netfilter
EOF
## Load the modules at boot
systemctl enable systemd-modules-load.service --now
## Kernel parameter tuning
cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
net.ipv4.conf.all.route_localnet = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
## Apply the settings
sysctl --system
### Reboot the server, then check that the modules are loaded
reboot
lsmod | grep --color=auto -e ip_vs -e nf_conntrack
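The `lsmod | grep` check only shows what is loaded; it does not flag what is missing. The sketch below compares the module list file against `/proc/modules` and prints any name that is absent. `missing_modules` is a hypothetical helper. Modules compiled into the kernel never appear in `/proc/modules`, so a reported name is a hint to investigate rather than proof of a problem.

```shell
# Print each module named in list file $1 that is absent from the
# /proc/modules-style file $2 (defaults to the live /proc/modules).
missing_modules() {
  list=$1
  loaded=${2:-/proc/modules}
  while read -r mod; do
    [ -z "$mod" ] && continue
    # The module name is the first field of every /proc/modules line.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$loaded" \
      || echo "$mod"
  done < "$list"
}
```

After the reboot, `missing_modules /etc/modules-load.d/k8s.conf` lists the entries that did not load.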
1.2.16 Install the container runtime
Install both Docker Engine and containerd.
Install containerd
### Install
yum install -y containerd.io-1.6.6
### Generate the config file
containerd config default > /etc/containerd/config.toml
### Edit the config file
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sed -i 's#sandbox_image = "k8s.gcr.io/pause:3.6"#sandbox_image="registry.aliyuncs.com/google_containers/pause:3.7"#g' /etc/containerd/config.toml
### Configure the registry mirror
sed -i 's#config_path = ""#config_path = "/etc/containerd/certs.d"#g' /etc/containerd/config.toml
mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
server = "https://registry-1.docker.io"
[host."https://xpd691zc.mirror.aliyuncs.com"]
capabilities = ["pull", "resolve", "push"]
EOF
## Start and enable the service
systemctl daemon-reload ; systemctl enable containerd --now
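The sed edits above do nothing, without any error, when the text they search for differs from what `containerd config default` produced (for example a different default pause tag). A quick check of the resulting file is therefore worthwhile. The following is a sketch only; `check_containerd_config` is a hypothetical helper that looks for the three values this walkthrough sets.

```shell
# Report which of the three expected settings are missing from config.toml.
# Returns non-zero if any of them is absent.
check_containerd_config() {
  cfg=${1:-/etc/containerd/config.toml}
  rc=0
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg" \
    || { echo "SystemdCgroup is not set to true"; rc=1; }
  grep -Eq '^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"registry\.aliyuncs\.com/google_containers/pause:' "$cfg" \
    || { echo "sandbox_image does not point at the mirror"; rc=1; }
  grep -Eq '^[[:space:]]*config_path[[:space:]]*=[[:space:]]*"/etc/containerd/certs\.d"' "$cfg" \
    || { echo "registry config_path is not set"; rc=1; }
  return $rc
}
```

Running `check_containerd_config` with no argument inspects the live file; silence and exit status 0 mean all three edits took effect.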
**Install crictl**
### Download the binary package
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.25.0/crictl-v1.25.0-linux-amd64.tar.gz
### Extract
tar -xf crictl-v1.25.0-linux-amd64.tar.gz
### Move into place
mv crictl /usr/local/bin/
### Configure
cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
### Restart to apply
systemctl restart containerd
Install docker
yum install docker-ce -y
# Configure the registry mirror
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://xpd691zc.mirror.aliyuncs.com"]
}
EOF
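# Start the service
systemctl enable docker --now
Take a VM snapshot at this point.
1.3 Cluster installation and configuration
1.3.1 Install kubeadm, kubelet and kubectl
### List the available versions
yum list kubeadm.x86_64 --showduplicates | sort -r
### Install version 1.25.9
yum install kubeadm-1.25.9 kubelet-1.25.9 kubectl-1.25.9 -y
### Enable kubelet
### kubelet keeps restarting at this point; that is expected until kubeadm init or join writes its configuration
systemctl enable kubelet --now
1.3.2 Install the high-availability components: keepalived and nginx
Install the packages
yum install nginx keepalived nginx-mod-stream -y
Configuration files
### /etc/nginx/nginx.conf, identical on all three nodes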
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;
events {
    worker_connections 1024;
}
stream {
    log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
    access_log /var/log/nginx/k8s-access.log main;
    upstream kube-apiserver {
        server 192.168.204.10:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.204.11:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.204.12:6443 weight=5 max_fails=3 fail_timeout=30s;
    }
    server {
        listen 16443;
        proxy_pass kube-apiserver;
    }
}
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    server {
        listen 80 default_server;
        server_name _;
        location / {
        }
    }
}
### /etc/keepalived/keepalived.conf
global_defs {
    notification_email {
        [email protected]
        [email protected]
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id NGINX_MASTER
}
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
}
vrrp_instance VI_1 {
    state MASTER
    interface ens32 ## adjust to the actual NIC name
    virtual_router_id 51
    priority 100 # 100 on the primary; 90 and 80 on the other two nodes
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.204.13/24
    }
    track_script {
        check_nginx
    }
}
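### Contents of /etc/keepalived/check_nginx.sh
#!/bin/bash
# Count running nginx processes. If there are none, try one restart;
# if nginx is still down, stop keepalived so the VIP moves to another node.
counter=$(ps -ef |grep nginx | grep sbin | egrep -cv "grep|$$" )
if [ "$counter" -eq 0 ]; then
    service nginx start
    sleep 2
    counter=$(ps -ef |grep nginx | grep sbin | egrep -cv "grep|$$" )
    if [ "$counter" -eq 0 ]; then
        service keepalived stop
    fi
fi
## Make it executable
chmod +x /etc/keepalived/check_nginx.sh
Start and verify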
nginx -t
systemctl daemon-reload
systemctl enable nginx keepalived --now
ip -4 a
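## Stop nginx on the primary node and watch whether the VIP fails over and recovers
systemctl stop nginx
## Result: the VIP fails over as expected and returns to the first node once it recovers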
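During the failover test it helps to ask each node directly whether it currently holds the VIP. The sketch below parses the one-line-per-address output of `ip -4 -o addr show`; `has_vip` is a hypothetical helper, not a keepalived command.

```shell
# Succeed if the `ip -4 -o addr show` style text on stdin contains address $1.
has_vip() {
  awk -v vip="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # Addresses appear as ADDR/PREFIX; compare only the ADDR part.
        if ($i ~ /\//) { split($i, p, "/"); if (p[1] == vip) found = 1 }
      }
    }
    END { exit !found }'
}
```

On any node, `ip -4 -o addr show | has_vip 192.168.204.13 && echo "this node holds the VIP"` reports the current owner.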
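1.3.3 Initialize the first master node
# Set the container runtime endpoint on all three nodes
crictl config runtime-endpoint unix:///run/containerd/containerd.sock
# On k8s-001, generate the default config for initializing the first master
kubeadm config print init-defaults > k8s-config.yaml
# Edit the generated file as follows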
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
#localAPIEndpoint:                    ## commented out
#  advertiseAddress: 192.168.204.10   ## commented out
#  bindPort: 6443                     ## commented out
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
#  name: node                         ## commented out
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers  # switched to a mirror in China
kind: ClusterConfiguration
kubernetesVersion: 1.25.9  # set to the version being installed
# Added: the VIP address and the nginx listen port
controlPlaneEndpoint: 192.168.204.13:16443
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.165.0.0/16  # set to the planned service subnet
  podSubnet: 10.166.0.0/16      # added line: the planned pod subnet
scheduler: {}
# Added below
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
Initialize
### Run on k8s-001
kubeadm init --config=k8s-config.yaml --ignore-preflight-errors=SystemVerification
On success, kubeadm prints follow-up instructions.
Configure kubectl access as they describe:
### Run on k8s-001
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
1.3.4 Join the other nodes as masters
Copy the certificates
### Run on the other two nodes
cd /root && mkdir -p /etc/kubernetes/pki/etcd && mkdir -p ~/.kube/
### On k8s-001, copy the certificates to k8s-002
scp /etc/kubernetes/pki/ca.crt k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ca.key k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.key k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.pub k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.key k8s-002:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt k8s-002:/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/pki/etcd/ca.key k8s-002:/etc/kubernetes/pki/etcd/
### On k8s-001, copy the certificates to k8s-003
scp /etc/kubernetes/pki/ca.crt k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/ca.key k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.key k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.pub k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.key k8s-003:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt k8s-003:/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/pki/etcd/ca.key k8s-003:/etc/kubernetes/pki/etcd/
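The sixteen scp commands above follow one pattern, so they can be expressed as a loop. The sketch below is one way to do that, assuming the same file list and target paths as above; `copy_control_plane_certs` and the `DRY_RUN` switch are names invented for this example.

```shell
# Copy the shared CA material from this node to every host given as an
# argument. With DRY_RUN=1 the scp commands are printed instead of run.
copy_control_plane_certs() {
  pki=/etc/kubernetes/pki
  for host in "$@"; do
    for f in ca.crt ca.key sa.key sa.pub \
             front-proxy-ca.crt front-proxy-ca.key \
             etcd/ca.crt etcd/ca.key; do
      # Files under etcd/ keep their subdirectory on the target.
      case $f in
        etcd/*) dest="$host:$pki/etcd/" ;;
        *)      dest="$host:$pki/" ;;
      esac
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo scp "$pki/$f" "$dest"
      else
        scp "$pki/$f" "$dest"
      fi
    done
  done
}
```

Setting `DRY_RUN=1` and calling `copy_control_plane_certs k8s-002 k8s-003` prints the commands for review before anything is copied.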
Join the cluster
### Based on the output from initializing the first master, the command for joining as a master is
kubeadm join 192.168.204.13:16443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:02e4ac4dac8089fd8b15f164fa079b450050e1ec238f58a11338411789100bf0 \
--control-plane
### Run it on both of the other nodes, then configure kubectl there as well
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
What if the join token has expired?
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
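The two commands relate as follows: the first prints a fresh worker-style join command, and the second re-uploads the control-plane certificates and prints a certificate key. Since this walkthrough copies the certificates by hand, adding `--control-plane` to the new join command is enough. If the manual copy is skipped, the key can be passed with `--certificate-key` instead. The sketch below only assembles that command line; `build_control_plane_join` is a hypothetical helper, and taking the last output line of `upload-certs` as the key is an assumption about its output layout.

```shell
# Combine a printed join command and a certificate key into the command
# for joining an additional control-plane node.
build_control_plane_join() {
  join_cmd=$1   # output of: kubeadm token create --print-join-command
  cert_key=$2   # key printed by: kubeadm init phase upload-certs --upload-certs
  printf '%s --control-plane --certificate-key %s\n' "$join_cmd" "$cert_key"
}

# Possible usage on k8s-001 (not executed here):
#   join_cmd=$(kubeadm token create --print-join-command)
#   cert_key=$(kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   build_control_plane_join "$join_cmd" "$cert_key"
```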
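Check the cluster state
### Run on any node
kubectl get nodes
1.3.5 Install the calico network plugin
### Run on k8s-001 only
mkdir /opt/k8s-yaml
cd /opt/k8s-yaml
### The manifest and image archive were prepared in advance: calico.tar.gz and calico.yaml
### Copy the image archive to the other nodes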
scp calico.tar.gz k8s-002:/root
scp calico.tar.gz k8s-003:/root
### On every node, import the images from wherever the archive was placed
ctr -n=k8s.io images import calico.tar.gz
### Apply on k8s-001
kubectl apply -f calico.yaml
Check the cluster state again
kubectl get nodes
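1.3.6 Remove the taint
By default, master nodes do not schedule non-system pods. Removing the taint allows regular workloads to run on them.
View the taints
kubectl describe node | grep -B 3 Taints
Remove the taint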
kubectl taint node -l node-role.kubernetes.io/control-plane= node-role.kubernetes.io/control-plane:NoSchedule-
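Verify
Before the taint was removed, the affected pod stayed unhealthy.
After the taint was removed, it recovered.
Inspecting the resource shows the effect.
1.3.7 Configure etcd high availability
Edit /etc/kubernetes/manifests/etcd.yaml on every node
## Keep --initial-cluster set to all three members in every node's file. kubelet watches the manifest directory, so the etcd static pod restarts on its own after the change
--initial-cluster=k8s-003=https://192.168.204.12:2380,k8s-002=https://192.168.204.11:2380,k8s-001=https://192.168.204.10:2380
## Then restart kubelet
systemctl restart kubelet
Verify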
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.4-0 etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt member list
A member list showing all three nodes means the configuration is correct.
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.4-0 etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints=https://192.168.204.10:2379,https://192.168.204.11:2379,https://192.168.204.12:2379 endpoint health --cluster
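The health output can also be checked mechanically. The sketch below counts the endpoints that report healthy and compares the count with the expected cluster size. It assumes that each healthy endpoint is reported on its own line containing the text ` is healthy`; `etcd_all_healthy` is a hypothetical helper, not an etcdctl feature.

```shell
# Succeed only if exactly $1 lines on stdin report a healthy endpoint.
etcd_all_healthy() {
  expected=$1
  healthy=$(grep -c ' is healthy' || true)
  [ "$healthy" -eq "$expected" ]
}
```

To use it, drop `-it` from the docker command above, append `2>&1`, and pipe the result into `etcd_all_healthy 3`.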
1.4 Add-on installation and tuning
1.4.1 kubectl command completion
### Run on any node
yum install bash-completion
source /usr/share/bash-completion/bash_completion
bash
type _init_completion
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null
source ~/.bashrc
bash
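1.4.2 Install metrics-server
metrics-server project page: https://github.com/kubernetes-sigs/metrics-server#readme
Metrics Server is a scalable, efficient source of container resource metrics for the autoscaling pipelines built into k8s.
Metrics Server collects resource metrics from kubelets and exposes them through the Metrics API in the Kubernetes apiserver, for use by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler. kubectl top also reads the Metrics API, which makes autoscaling pipelines easier to debug.
Prepare the YAML file
##### metrics-server.yaml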
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --kubelet-insecure-tls
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/metrics-server:v0.4.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
Deploy
### Run on k8s-001 only
kubectl apply -f metrics-server.yaml
Check the status
kubectl get pod -n kube-system
Verify
kubectl top nodes
kubectl top pods
1.4.3 Install the Dashboard UI
Prepare the YAML file
It can also be downloaded directly: https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
- The images for this service are hard to pull, so I mirrored them to a public Alibaba Cloud registry.
- These two images are required:
  - kubernetesui/metrics-scraper:v1.0.8
  - kubernetesui/dashboard:v2.7.0
- I replaced them with the Alibaba Cloud copies:
  - registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/dashboard:v2.7.0
  - registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/metrics-scraper:v1.0.8
# The Service also has to be exposed differently so the dashboard can be reached from outside the cluster
# Set the Service type in the manifest to NodePort
# Around line 32 of the file
type: NodePort
# Then switch the images
sed -i 's#kubernetesui/metrics-scraper:v1.0.8#registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/metrics-scraper:v1.0.8#g' recommended.yaml
sed -i 's#kubernetesui/dashboard:v2.7.0#registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/dashboard:v2.7.0#g' recommended.yaml
Apply
kubectl apply -f recommended.yaml
Check the status
kubectl get pod -A
Configure access
### dashboard-admin.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
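Apply it. A binding's roleRef cannot be changed in place, so delete the ClusterRoleBinding of the same name created by recommended.yaml and create it again: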
kubectl delete -f dashboard-admin.yaml
kubectl create -f dashboard-admin.yaml
Generate a login token
Since Kubernetes 1.24, a ServiceAccount no longer gets a long-lived token Secret automatically, so the login token has to be requested explicitly:
- Create the service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
- Bind the cluster role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
- Generate the token
kubectl -n kubernetes-dashboard create token admin-user
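Log in and verify
## Look up the access port
kubectl get svc -n kubernetes-dashboard
## Here it is port 32190 on the host
Open https://192.168.204.13:32190/ in a browser (a NodePort answers on any node IP as well as on the VIP)
Enter the token generated above to log in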
1.5 Cluster verification
1.5.1 Node verification
kubectl get nodes
- Every node is in the Ready state
- The version matches what was installed
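The same check can be scripted. The sketch below reads `kubectl get nodes --no-headers` output on stdin and prints every node whose STATUS column is not exactly `Ready`; `not_ready_nodes` is a hypothetical helper. A cordoned node shows `Ready,SchedulingDisabled` and is therefore reported as well.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

`kubectl get nodes --no-headers | not_ready_nodes` prints nothing when all three nodes are healthy.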
1.5.2 Pod verification
kubectl get pods -A
- STATUS must be Running
- READY shows all containers up, for example 1/1 (both numbers match)
- Do not worry about the restart count; rebooting the VMs increases it
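These two rules can be applied to the whole listing at once. The sketch below reads `kubectl get pods -A --no-headers` output on stdin and prints the pods that break either rule; `unhealthy_pods` is a hypothetical helper. Skipping `Completed` pods is a choice made for this example, since finished Jobs are normally not a problem.

```shell
# Print "namespace/name" for pods that are not Running with all containers
# ready. Pods in the Completed state are skipped.
# Expects `kubectl get pods -A --no-headers` output on stdin.
unhealthy_pods() {
  awk '{
    if ($4 == "Completed") next
    split($3, ready, "/")            # READY column, e.g. 1/1
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2
  }'
}
```

`kubectl get pods -A --no-headers | unhealthy_pods` should print nothing on a healthy cluster.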
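1.5.3 k8s subnet verification
kubectl get svc
kubectl get pod -A -owide
- Check that the addresses fall inside the planned subnets and do not conflict with anything
- Pods that show a host IP address run with host networking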
1.5.4 Resource creation verification
### A debug-tools image hosted in China is used for this and the later checks; prepare a YAML file like this (demo.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: debug-tools
  labels:
    app: debug-tools
spec:
  replicas: 3
  selector:
    matchLabels:
      app: debug-tools
  template:
    metadata:
      labels:
        app: debug-tools
    spec:
      containers:
      - name: debug-tools
        image: registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/debug-tools:latest
        command: ["/bin/sh","-c","sleep 3600"]
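Apply
kubectl apply -f demo.yaml
Check that it deployed correctly
- replica count
- pod status
- inside the containers
Check that the resources delete cleanly
kubectl delete -f demo.yaml
- The VMs are small, so this is somewhat slow; all that matters is that resources can be created and deleted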
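1.5.5 Pod DNS resolution verification
Check that Services resolve correctly
# Run from the pod created above
# Same namespace (inside the container)
nslookup kubernetes
# Across namespaces
nslookup kube-dns.kube-system
- Any returned IP is a pass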
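1.5.6 Node access verification
# Every host must be able to reach the kubernetes Service on port 443 and kube-dns on port 53
# On every node
curl -k https://10.165.0.1:443
curl 10.165.0.10:53
- Any reply from the service, rather than a timeout or a refused connection, shows that traffic gets through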
1.5.7 Pod-to-pod communication
Verify both pods on the same machine and pods on different machines
Prepare two YAML files
The first
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
The second
apiVersion: apps/v1
kind: Deployment
metadata:
  name: debug-tools
  namespace: kube-system
  labels:
    app: debug-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debug-tools
  template:
    metadata:
      labels:
        app: debug-tools
    spec:
      containers:
      - name: debug-tools
        image: registry.cn-hangzhou.aliyuncs.com/k8s_whale_images/debug-tools:latest
        command: ["/bin/sh","-c","sleep 3600"]
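Start them
- Two pods are now running, in different namespaces
Verify pod-to-pod communication
Look up each pod's address
Then ping to verify:
# Different namespaces
kubectl exec -it debug-tools-7b466cf8dc-xmhl7 -n kube-system -- ping 10.166.6.10
# Same namespace
At this point the whole procedure is essentially complete.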