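1. Background
To keep the application service (edge box) highly available and its data safe, this document defines the "Master/Backup Application (Edge Box) Deployment and Failover Plan". When the master box (server) goes down or fails, the service switches automatically to the backup box (server), preserving business continuity.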
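2. Overall Deployment
The application services run as two mutually independent k3s clusters. MySQL is deployed on the hosts as a dual-master hot standby, and Redis is deployed as master-replica with automatic role switching.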
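3. Main Design and Failover Approach
3.1 Design approach
- Business workloads run on the master server. The backup server runs no workloads but keeps its data in sync with the master, so it can take over quickly when the master fails.
- MySQL runs as a dual-master hot standby; Redis uses master-replica replication, with the role switch performed by a script.
- A keepalived cluster floats a virtual IP (VIP) between the master and backup boxes. Applications and business systems only talk to one fixed intranet VIP and do not need to care about IP changes caused by a server becoming unavailable.
- MySQL and Redis are accessed through the VIP. When the master server fails, the VIP floats to the backup server, which switches MySQL and Redis over with it.
- To avoid data conflicts, application replicas on the backup server are normally set to 0. When the VIP moves to the backup server, a script automatically scales the replicas to 1.
- Model and image data backup: lsyncd is deployed on both machines. On the backup server it is stopped by default and is started only after the backup server has been promoted to master, at which point lsyncd on the other (demoted) server is stopped (storage path: /data/application/data/paas-datatransfer-server).
- The three IPs involved are all fixed intranet IPs allocated by the project, so plan for this address requirement in advance.
- By default keepalived places the VIP on the master box. IP switching is triggered by a pre-check script, which can check indicators such as server availability, k8s availability, and application failure rate.
- MySQL and Redis require dual-master and master-replica setups, so they are deployed directly on the hosts (VM deployment) instead of in k8s. Their status monitoring and reliability are handled by companion scripts.
- The backup box starts only the VM-deployed components by default, which keeps its data synchronized with the master box in real time. Application services and middleware inside k8s stay stopped by default and are started automatically once a master failure is detected and the VIP has switched, so that the data cached inside the services is up to date.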
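The pre-check script mentioned in the design is not listed in this document. As a minimal sketch of what such a check could look like: the function names, the `app` and `paas` namespace list, and the "unready deployments" threshold are assumptions for illustration, not the project's actual script. It relies on the `READY` column (`ready/desired`) printed by `kubectl get deployments`.

```shell
#!/bin/bash
# Hypothetical health-check sketch; names and thresholds are assumptions.

# Succeeds when the Kubernetes API answers.
k8s_available() {
    kubectl cluster-info >/dev/null 2>&1
}

# Prints how many deployments in a namespace have fewer ready replicas
# than desired (READY column is "ready/desired").
unready_deployments() {
    local namespace=$1
    kubectl get deployments -n "${namespace}" --no-headers 2>/dev/null |
        awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) n++ } END { print n + 0 }'
}

# Returns 0 (healthy) when the API is reachable and the number of
# unready deployments does not exceed the allowed maximum.
check_health() {
    local max_unready=${1:-0}
    k8s_available || return 1
    local ns unready total=0
    for ns in app paas; do
        unready=$(unready_deployments "${ns}")
        total=$(( total + unready ))
    done
    [ "${total}" -le "${max_unready}" ]
}
```

A script like this could be called from a keepalived `vrrp_script` block, where a non-zero exit status marks the node as failed; whether the project wires it up that way is not stated here.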
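4. Deployment Information
4.1 Resource list
IP | Server | Role |
192.168.101.77 | Master server | Runs the business workloads at first deployment |
192.168.101.78 | Backup server | Normally runs no applications; only MySQL and Redis replication |
192.168.101.90 | | Virtual IP; floats according to server state, currently bound to .77 |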
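4.2 Deployment list
No. | Component | Deployment type | Notes |
1 | keepalived | Installed on the server | Monitors the service and floats the VIP |
2 | mysql | Accessed via the VIP | |
3 | redis | Accessed via the VIP | |
4 | k3s | Accessed via the VIP | |
5 | PaaS services | Containers, orchestrated by k3s | nacos, xxlj, exmq |
6 | Application services | Deploy the application services required | |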
5. Deployment Steps
5.1 Deploy keepalived
5.1.1 Install keepalived
Install command:
yum install keepalived -y
5.1.1.1 Master server configuration
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id master-jifang77
}
vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eno1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.101.90
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
}
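Save and quit with :wq!
Note: if the access IP and the device network are on different subnets (that is, several IPs are needed), virtual_ipaddress can list multiple IPs.
5.1.1.2 Backup server configuration
Configure the backup box in the same way: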
! Configuration File for keepalived
global_defs {
    router_id back-jifang78
}
vrrp_instance VI_1 {
    state BACKUP
    interface eno1
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.101.90
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
}
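Once keepalived is running with the two configurations above, it helps to confirm which node currently holds the VIP. The helper below is an illustrative sketch (the function name `has_address` is not part of the project): it reads `ip -4 addr show` output on standard input and compares the address exactly, ignoring the prefix length.

```shell
#!/bin/bash
# Reads `ip -4 addr show <interface>` output on stdin and succeeds when
# the given IPv4 address is configured on that interface.
has_address() {
    local address=$1
    awk -v addr="${address}" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == addr) found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

Example, using the interface and VIP from the configuration above: `ip -4 addr show eno1 | has_address 192.168.101.90 && echo "this node holds the VIP"`.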
5.1.2 Failover script
Run on both master and backup:
vim /etc/keepalived/notify.sh
#!/bin/bash
# Script executed on failover.
# Stops or starts services when the master/backup role changes.
function update_app_replicas() {
    local status=$1
    case ${status} in
        "master")
            replica=1
            sleep_time=30
            ;;
        "backup")
            replica=0
            sleep_time=15
            ;;
        *)
            echo "invalid parameter..."
            exit 1
            ;;
    esac
    # Wait until the cluster API is reachable
    while true;do
        kubectl cluster-info
        if [ $? -ne 0 ];then
            sleep 10
        else
            break
        fi
    done
    sleep ${sleep_time}
    # Scale every deployment in the app and paas namespaces;
    # xargs -r skips the call when a namespace has no deployments
    for namespace in $(kubectl get namespaces app paas | awk 'NR>1{print $1}');do
        kubectl get deployments.apps -n ${namespace} |awk 'NR>1{print $1}'| (
            xargs -r kubectl scale --replicas=${replica} deployment -n ${namespace})
    done
}
function switch_redis(){
    # systemctl stop redis
    if systemctl is-active --quiet keepalived; then
        echo "The service keepalived is running"
        systemctl restart redis
    else
        echo "The service keepalived is not running."
    fi
}
function send_dingding() {
    local status=$1
    current_time=$(date +'%Y-%m-%d %H:%M:%S.%2N %z')
    host_name=$(hostname)
    curl 'https://oapi.dingtalk.com/robot/send?access_token=40806454891df2ff057cc8885a99e2d403cc1ae7f561464064c6ecf5506ec28f' \
        -H 'Content-Type: application/json' \
        -d "{
            \"msgtype\": \"markdown\",
            \"markdown\": {
                \"title\": \"盒子主从切换告警\",
                \"text\": \"## <font color=LightSlateBlue>盒子主从切换</font> \n > - 时间:${current_time}\n > - 主机名:${host_name} \n > - 切换状态: 边缘测盒子切换为${status} \n \n**边缘测盒子发生了主从切换注意观察服务状态**\"
            },
            \"at\": {
                \"atMobiles\": [13259490898],
                \"isAtAll\": false
            }
        }"
}
main(){
    switch_redis
    update_app_replicas "$@"
    send_dingding "$@"
}
main "$@"
# Make the script executable: chmod a+x /etc/keepalived/notify.sh
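keepalived calls the script with `master` or `backup`, as set by `notify_master` and `notify_backup` in keepalived.conf. To check the decision logic without touching the cluster, the case statement can be mirrored in a small pure function. This is only an illustrative sketch, and `plan_for_state` is a hypothetical name that does not exist in notify.sh.

```shell
#!/bin/bash
# Prints "<replicas> <sleep_seconds>" for a keepalived state, mirroring
# the case statement in notify.sh; returns 1 for any other argument.
plan_for_state() {
    case "$1" in
        master) echo "1 30" ;;
        backup) echo "0 15" ;;
        *) echo "invalid parameter: $1" >&2; return 1 ;;
    esac
}
```

For example, `plan_for_state backup` prints `0 15`: scale the deployments to 0 replicas after a 15-second delay.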
5.1.3 Modify the service unit
vim /usr/lib/systemd/system/keepalived.service # add a restart policy
[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network-online.target
After=k3s.service
[Service]
Type=forking
PIDFile=/var/run/keepalived.pid
KillMode=process
EnvironmentFile=-/etc/sysconfig/keepalived
ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
[Install]
WantedBy=multi-user.target
5.1.4 Start and enable at boot
systemctl daemon-reload && systemctl start keepalived && systemctl enable keepalived
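5.2 Deploy Redis
5.2.1 Install Redis
wget http://download.redis.io/releases/redis-6.2.6.tar.gz
# After uploading the package to the server, change to the directory that contains it
# Extract
tar -zxvf redis-6.2.6.tar.gz
# Install the build toolchain
yum -y install gcc automake autoconf libtool make
# Change to the source root
cd redis-6.2.6
# Build
make
# Install
make PREFIX=/usr/local/redis install
# Copy the default configuration file to the install directory
cp redis.conf /usr/local/redis/
# If the build fails, remove the leftover files before building again
# CentOS
# make clean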
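Edit the configuration (run on both master and replica)
cd /usr/local/redis/
# Edit the file
vi redis.conf
# Change the following settings:
# Allow remote connections from other hosts
bind 0.0.0.0
# Run in the background
daemonize yes
# Log file location; use logfile "/dev/null" if logs are not needed
logfile "/var/log/redis_6379.log"
# Data persistence directory
dir /data/redis-host
# Set a password, for example:
requirepass Redis@12345678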
Copy redis.conf to master.conf and slave.conf, then configure replication
cd /usr/local/redis/
cp redis.conf master.conf
cp redis.conf slave.conf
# Edit slave.conf on each of the two servers so that it points at the other server.
# slave.conf on 192.168.101.77: use 192.168.101.78 as the master
replicaof 192.168.101.78 6379
# Connection password, that is, the master's password
masterauth Redis@12345678
# slave.conf on 192.168.101.78: use 192.168.101.77 as the master
replicaof 192.168.101.77 6379
# Connection password, that is, the master's password
masterauth Redis@12345678
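After Redis is started on both servers, the role each instance has taken can be read from `redis-cli info replication`. The helper below is a small sketch for extracting one field from that output; `replication_field` is an illustrative name. Redis terminates the lines with CRLF, so the carriage returns are stripped first.

```shell
#!/bin/bash
# Reads `redis-cli info replication` output on stdin and prints the value
# of one field, for example role or master_link_status.
replication_field() {
    local field=$1
    tr -d '\r' | awk -F: -v key="${field}" '$1 == key { print $2; exit }'
}
```

Example: `/usr/local/redis/bin/redis-cli -a 'Redis@12345678' info replication | replication_field role` prints `master` or `slave`; on a replica, `master_link_status` should read `up`.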
5.2.2 Redis role-switch script
This script switches the Redis master/replica role whenever the servers switch between master and backup.
vim /usr/local/redis/getstatu.sh
#!/bin/bash
# Pick the Redis master/replica configuration for the current node
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
keeplived_conf="/etc/keepalived/keepalived.conf"
redis_conf="${SCRIPT_DIR}/redis.conf"
function get_interface_info(){
    if [ -f ${keeplived_conf} ];then
        interface=$(cat ${keeplived_conf} | grep interface | sed 's/.* \([^ ]*\)$/\1/')
        virip=$(awk '/virtual_ipaddress/ {getline; print $1}' ${keeplived_conf})
    else
        echo "/etc/keepalived/keepalived.conf does not exist"
    fi
}
function get_interface_ip(){
    # The node that holds the VIP acts as the Redis master
    if ip addr show ${interface} | grep -q "inet ${virip}"; then
        redis_status=master
    else
        redis_status=slave
    fi
}
function make_redis_conf(){
    if [ ${redis_status} == "master" ];then
        cp -f ${SCRIPT_DIR}/master.conf ${redis_conf}
    elif [ ${redis_status} == "slave" ];then
        cp -f ${SCRIPT_DIR}/slave.conf ${redis_conf}
    else
        echo "Redis status value error!"
    fi
}
function main(){
    # Short delay to give keepalived time to settle
    sleep 2
    get_interface_info
    get_interface_ip
    make_redis_conf
}
main
# Make the script executable: chmod a+x /usr/local/redis/getstatu.sh
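The script derives the interface and the VIP by parsing keepalived.conf. To see what that parsing yields for a given file, the two lookups can be run on their own. The sketch below uses awk for both and matches the `interface` keyword as a whole first field, a slightly stricter variant than the `grep interface` in the script; the function names are illustrative.

```shell
#!/bin/bash
# Prints the interface name from a keepalived.conf-style file.
conf_interface() {
    awk '$1 == "interface" { print $2; exit }' "$1"
}

# Prints the first address listed in the virtual_ipaddress block.
conf_vip() {
    awk '/virtual_ipaddress/ { getline; print $1; exit }' "$1"
}
```

Running `conf_interface /etc/keepalived/keepalived.conf` and `conf_vip /etc/keepalived/keepalived.conf` against the configuration in 5.1.1 should yield `eno1` and `192.168.101.90`.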
5.2.3 Modify the Redis service unit
vim /etc/systemd/system/redis.service
[Unit]
Description=redis
Documentation=https://redis.io/documentation/
After=network.target
After=keepalived.service
[Service]
Type=forking
#PIDFile=/var/run/redis_6379.pid
ExecStartPre=/usr/local/redis/getstatu.sh
ExecStart=/usr/local/redis/bin/redis-server /usr/local/redis/redis.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/local/redis/bin/redis-cli -a Redis@12345678 shutdown
PrivateTmp=true
[Install]
WantedBy=multi-user.target
systemctl daemon-reload && systemctl start redis && systemctl enable redis
5.3 Deploy MySQL
5.3.1 Check
Check whether MySQL is already installed:
mysql --version
rpm -qa | grep mysql
If it is, uninstall it first:
systemctl stop mysqld
yum remove mysql84-community-release-el9.noarch
yum remove mysql-community-server
yum remove mysql-community-client
yum remove mysql-community-libs
yum remove mysql
rm -rf /var/lib/mysql
5.3.2 Install MySQL (CentOS)
wget https://dev.mysql.com/get/mysql84-community-release-el7-1.noarch.rpm
yum install mysql84-community-release-el7-1.noarch.rpm
Check that the MySQL Yum repository was added successfully (on DNF-enabled systems, replace yum with dnf in the commands):
yum repolist enabled | grep "mysql.*-community.*"
yum repolist all | grep mysql
Disable 8.4 and enable 8.0:
yum install -y yum-utils
yum-config-manager --disable mysql-8.4-lts-community
yum-config-manager --disable mysql-tools-8.4-lts-community
yum-config-manager --enable mysql80-community
yum-config-manager --enable mysql-tools-community
yum install mysql-community-server
For details, see the official documentation:
MySQL 8.0 Reference Manual, Section 2.5.1, "Installing MySQL on Linux Using the MySQL Yum Repository"
To switch to a mirror:
sed -e 's|^mirrorlist=|#mirrorlist=|g' \
    -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
    -i.bak \
    /etc/yum.repos.d/rocky-*.repo
dnf makecache
5.3.3 Install MySQL (Rocky Linux)
mysql --version
rpm -qa | grep mysql
Remove the existing MySQL packages:
yum remove mysql-common*
Install:
yum install https://dev.mysql.com/get/mysql84-community-release-el9-1.noarch.rpm
yum repolist enabled | grep "mysql.*-community.*"
yum repolist all | grep mysql
dnf config-manager --disable mysql-8.4-lts-community
dnf config-manager --disable mysql-tools-8.4-lts-community
dnf config-manager --enable mysql80-community
dnf config-manager --enable mysql-tools-community
yum repolist enabled | grep mysql
yum module disable mysql
yum install mysql-community-server
5.3.4 Edit the configuration file
vim /etc/my.cnf
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove the leading "# " to disable binary logging
# Binary logging captures changes between backups and is enabled by
# default. It's default setting is log_bin=binlog
# disable_log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
#
# Remove leading # to revert to previous value for default_authentication_plugin,
# this will increase compatibility with older clients. For background, see:
# https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
# default-authentication-plugin=mysql_native_password
datadir=/data/mysql-host
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
secure-file-priv= NULL
symbolic-links=0
max_allowed_packet = 1073741824
# Maximum number of connections
max_connections = 1000
### Sort buffer size, used for non-indexed sorts. Default 256K
#sort_buffer_size = 134217728
### Number of failed connection attempts allowed, to guard against attacks from a host
#max_connect_errors=10
### Server character set
character-set-server=utf8mb4
### Use the mysql_native_password plugin for authentication by default
#default_authentication_plugin=mysql_native_password
### Default time zone set to UTC+8
default-time-zone='+8:00'
#
# binlog_format=MIXED
## Keep binlogs for 3 days
binlog_expire_logs_seconds=259200
### MyISAM sort buffer size, used for non-indexed sorts. Default 8M
myisam_sort_buffer_size=8388608
### InnoDB buffer pool size, set to 75% of the memory limit
innodb_buffer_pool_size=12073741824 # about 12G by default; adjust to the on-site hardware
### Tell InnoDB the real disk capability so that innodb_max_dirty_pages_pct stays below 75%
innodb_io_capacity=10000
innodb_io_capacity_max=10000
query_alloc_block_size = 14336
query_prealloc_size = 815104
# ------ Master settings -------
# Unique ID of this node within the cluster
server-id=78
# Enable the binary log and give it a prefix (optional, there is a default)
log-bin=mysql-bin-log
# Databases excluded from replication (list the ones that need no backup or sync)
# Alternatively, use binlog-do-db=xx1,xx2... to name the databases to replicate
binlog-ignore-db=mysql
# Maximum size of a single binlog file
max_binlog_size=1024M
# ------ GTID settings -------
# Enable GTID replication
gtid_mode=on
# Allow only statements that are safe for GTID-based replication
enforce-gtid-consistency=on
# ------ Replica settings -------
# Enable the relay log (a prefix can be set here as well)
relay_log=mysql-relay-log
# Allow replication of stored procedures, functions, triggers, and similar objects
log_bin_trust_function_creators=true
# Skip these error codes during replication so that a failed write does not stop it
slave_skip_errors=1062,1032,1053,1236,1050
# ------ Auto-increment settings -------
# Auto-increment start value: 1 on the first node, 2 on the second node
auto_increment_offset=2
# Auto-increment step of 2: offset 1 yields {1, 3, 5, 7, 9, ...}, offset 2 yields {2, 4, 6, 8, 10, ...}
auto_increment_increment=2
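The configuration sizes innodb_buffer_pool_size at 75% of the memory limit and asks for it to be adjusted to the on-site hardware. A rough way to derive the value is sketched below, assuming the limit is the host's total memory in KiB as reported by /proc/meminfo; `pool_size_bytes` is an illustrative helper, not part of the deployment.

```shell
#!/bin/bash
# Prints a buffer pool size in bytes as a percentage (default 75) of a
# memory amount given in KiB.
pool_size_bytes() {
    local mem_kib=$1 percent=${2:-75}
    echo $(( mem_kib * 1024 * percent / 100 ))
}
```

Example: `pool_size_bytes "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`. For a machine with 16 GiB (16777216 KiB) this gives 12884901888 bytes, that is, 12 GiB.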
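The second node's configuration is identical to the first node's except for two settings:
• server-id=2: the node's unique identifier within the cluster; it must not repeat across nodes.
• auto_increment_offset=2: the auto-increment start value changes from 1 to 2, so with a step of 2 the sequence becomes {2, 4, 6, 8, 10, ...}.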
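The effect of the two auto-increment settings can be worked out by hand: each node generates offset, offset + increment, offset + 2 × increment, and so on. The small sketch below prints the first few values for a node; `auto_increment_ids` is an illustrative helper that only does the arithmetic and does not query MySQL.

```shell
#!/bin/bash
# Prints the first <count> auto-increment values for a node with the given
# auto_increment_offset and auto_increment_increment.
auto_increment_ids() {
    local offset=$1 increment=$2 count=$3
    local i
    for (( i = 0; i < count; i++ )); do
        echo $(( offset + i * increment ))
    done
}
```

`auto_increment_ids 1 2 5` prints 1, 3, 5, 7, 9 and `auto_increment_ids 2 2 5` prints 2, 4, 6, 8, 10, so rows inserted on the two masters never receive the same generated key.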
5.3.5 Configure the account and password
systemctl start mysqld
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Admin@123';
SHOW VARIABLES LIKE 'validate_password%';
SET GLOBAL validate_password.policy=LOW;
UPDATE mysql.user SET Host='%' WHERE User='root' AND Host='localhost';
FLUSH PRIVILEGES;
ALTER USER 'root'@'%' IDENTIFIED BY 'Admin@123';
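5.3.6 Configure replication
On each node, master_host must be the address of the peer node; adjust the addresses below to the actual environment.
# Log in to MySQL
mysql -uroot -p
# On 78, set 79 as its master
change master to master_host='192.168.202.79', master_user='root', master_password='Admin@123', master_port=3306, master_auto_position=1;
# Start replication
start slave;
# On 79, set 78 as its master
change master to master_host='192.168.101.78', master_user='root', master_password='Admin@123', master_port=3306, master_auto_position=1;
# Start replication
start slave;
# Check the status
SHOW SLAVE STATUS\G
Replication is healthy when both Slave_IO_Running and Slave_SQL_Running show Yes.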
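The status check can also be scripted. The sketch below reads `SHOW SLAVE STATUS\G` output on standard input and succeeds only when both replication threads report Yes; `replication_healthy` is an illustrative name, and how the output is obtained (for example with `mysql -e`) is left to the caller.

```shell
#!/bin/bash
# Reads `SHOW SLAVE STATUS\G` output on stdin; succeeds only when both
# Slave_IO_Running and Slave_SQL_Running are Yes.
replication_healthy() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }
        $1 == "Slave_IO_Running"  { io = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit ((io == "Yes" && sql == "Yes") ? 0 : 1) }'
}
```

Example: `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | replication_healthy && echo "replication OK"`.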
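5.4 Deploy K3s and services
Reference: K3s, lightweight Kubernetes for IoT and edge computing (Rancher)
5.5 Deploy your own application services
5.6 Verification
Verify the failover by restarting keepalived.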
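When verifying, it is convenient to poll until the expected state appears instead of checking by hand. A minimal retry helper is sketched below; `wait_until` is an illustrative name, and the one-second interval is an arbitrary choice.

```shell
#!/bin/bash
# Runs the given command once per second until it succeeds or the
# attempt budget is used up. Returns 0 on success, 1 on timeout.
wait_until() {
    local attempts=$1; shift
    local i
    for (( i = 0; i < attempts; i++ )); do
        "$@" && return 0
        sleep 1
    done
    return 1
}
```

For example, after restarting keepalived on the master, run on the backup: `wait_until 30 sh -c 'ip -4 addr show eno1 | grep -q "inet 192.168.101.90/"' && echo "VIP moved to this node"`.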