
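Primary/Backup Application Deployment and Failover Plan

1. Background

To keep the application services (edge boxes) highly available and their data safe, this plan describes how to deploy primary and backup edge boxes and how to switch between them. When the primary box (server) crashes or fails, service switches automatically to the backup box (server), preserving business continuity.

2. Overall Deployment

The application services run in two mutually independent k3s clusters. MySQL is deployed on the hosts as a dual-master hot standby, and Redis is deployed in master-replica mode with automatic master/replica switchover.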

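3. Main Design and Failover Approach

3.1 Design approach

  • Business workloads run on the primary server. The backup server runs no workloads but keeps its data synchronized with the primary, so it can take over quickly when the primary fails.
  • MySQL runs as a dual-master hot standby; Redis uses master-replica replication, with the role switch handled by a script.
  • A keepalived cluster floats a virtual IP (VIP) between the primary and backup boxes. Applications and services only need to target one fixed internal VIP and do not have to track IP changes caused by a server becoming unavailable.
  • MySQL and Redis are accessed through the VIP. When the primary server fails, the VIP floats to the backup server, which switches MySQL and Redis over as well.
  • To avoid data conflicts, application replica counts on the backup server are normally 0. When the VIP moves to the backup server, a script automatically scales the replicas to 1.
  • Model and image data backup: lsyncd is deployed on both machines. It is stopped by default on the backup server and is started only after the backup server is promoted to primary, at which point lsyncd is stopped on the server that has become the backup (storage path: /data/application/data/paas-datatransfer-server).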

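3.2 Resource requirements and deployment
  • All three IPs involved are fixed internal IPs allocated by the project, so include them in the resource request.
  • keepalived places the VIP on the primary box by default. VIP switching is triggered by a pre-check script that can monitor server availability, k8s availability, application failure rate, and similar indicators.
  • MySQL and Redis require dual-master and master-replica setups, so they are deployed directly on the VM instead of in k8s. Service monitoring and reliability are handled by companion scripts.
  • The backup box starts only the VM-deployed components by default, keeping its data synchronized with the primary box in real time. Application services and middleware inside k8s stay stopped by default; once a primary failure is detected and the VIP has switched, they are started automatically so that the data cached in the services is current.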
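
The pre-check script mentioned in the list above is not included in this plan. A minimal sketch of what such a check could look like, assuming it is hooked into keepalived through a vrrp_script/track_script block. The function names, the 20% threshold in the usage line, and the way failure counts are collected are all hypothetical:

```shell
#!/bin/bash
# Hypothetical keepalived health check (not the project's actual script).

# failure_rate_ok FAILED TOTAL MAX_PERCENT
# Succeeds when FAILED/TOTAL, as an integer percentage, is at most MAX_PERCENT.
failure_rate_ok() {
    local failed=$1 total=$2 max_percent=$3
    # No samples at all is treated as unhealthy
    [ "$total" -gt 0 ] || return 1
    [ $(( failed * 100 / total )) -le "$max_percent" ]
}

# node_healthy FAILED TOTAL MAX_PERCENT PROBE...
# Runs PROBE (for example: kubectl cluster-info), then checks the failure rate.
node_healthy() {
    local failed=$1 total=$2 max_percent=$3
    shift 3
    "$@" >/dev/null 2>&1 || return 1
    failure_rate_ok "$failed" "$total" "$max_percent"
}
```

A check script could end with something like `node_healthy 1 50 20 kubectl cluster-info || exit 1`; a nonzero exit status is what tells keepalived that the check failed.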

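4. Deployment Information

4.1 Resource list

IP               Server           Role            Notes
192.168.101.77   192.168.101.77   Primary server  Runs the workloads on initial deployment
192.168.101.78   192.168.101.78   Backup server   Normally runs no applications; only MySQL and Redis replication
192.168.101.90   -                -               Virtual IP; floats according to server state, currently on .77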

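4.2 Deployment list

No.  Component             Deployment type                             Notes
1    keepalived            Server deployment                           Monitors the services and floats the VIP
2    mysql                 -                                           Accessed through the VIP
3    redis                 -                                           Accessed through the VIP
4    k3s                   -                                           Accessed through the VIP
5    PaaS services         Container deployment, orchestrated by k3s   nacos、xxlj、exmq
6    Application services  -                                           The application services required by the deployment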

5. Deployment Steps

5.1 Deploy keepalived

5.1.1 Install keepalived

Install command:

yum install keepalived -y

5.1.1.1 Master (primary server) configuration

vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived


global_defs {
   router_id master-jifang77
}


vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eno1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.101.90
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
}

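Save and exit with :wq!

Note: if the access IP and the device network are on different subnets (that is, the box has several IPs), virtual_ipaddress can list multiple IPs. Both nodes declare state BACKUP; nopreempt on the higher-priority node keeps it from reclaiming the VIP automatically after it recovers.

5.1.1.2 Backup (backup server) configuration

Configure the backup box in the same way: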

! Configuration File for keepalived


global_defs {
   router_id back-jifang78
}


vrrp_instance VI_1 {
    state BACKUP
    interface eno1
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.101.90
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"


}

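5.1.2 Failover script

Run on both the primary and the backup:

vim /etc/keepalived/notify.sh
#!/bin/bash
# Script executed on a keepalived state change

# Starts or stops the application services on a primary/backup switch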
function update_app_replicas() {
    local status=$1


    case ${status} in
        "master")
            replica=1
            sleep_time=30
            ;;
        "backup")
            replica=0
            sleep_time=15
            ;;
        *)
            echo "invalid parameter..."
            exit 1
            ;;
    esac
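    # Wait until the k3s API server answers before touching deployments
    while true;do
        kubectl cluster-info
        if [ $? -ne 0 ];then
            sleep 10
        else
            break
        fi
    done
    # Grace period before scaling (30s when becoming master, 15s when becoming backup)
    sleep ${sleep_time}
    # Scale every deployment in the app and paas namespaces
    for namespace in $(kubectl get namespaces app paas | awk 'NR>1{print $1}');do
        # -r: skip kubectl scale when the namespace has no deployments
        kubectl get deployments.apps -n ${namespace} |awk 'NR>1{print $1}'| (
            xargs -r kubectl scale --replicas=${replica} deployment -n ${namespace})
    done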
}


function switch_redis(){
#    systemctl stop redis
    if systemctl is-active --quiet keepalived; then
        echo "The service keepalived is running"
        systemctl restart redis
    else
        echo "The service keepalived is not running."
    fi
}


function send_dingding() {
    local status=$1
    current_time=$(date +'%Y-%m-%d %H:%M:%S.%2N %z')
    host_name=$(hostname)
    curl 'https://oapi.dingtalk.com/robot/send?access_token=40806454891df2ff057cc8885a99e2d403cc1ae7f561464064c6ecf5506ec28f' \
        -H 'Content-Type: application/json' \
        -d "{
            \"msgtype\": \"markdown\",
            \"markdown\": {
                \"title\": \"Box primary/backup failover alert\",
                \"text\": \"## <font color=LightSlateBlue>Box primary/backup failover</font> \n > - Time: ${current_time}\n > - Hostname: ${host_name} \n > - New state: edge box switched to ${status} \n \n**The edge box performed a primary/backup switch; check the service status**\"
                 },
            \"at\": {
                \"atMobiles\": [13259490898],
                \"isAtAll\": false
                 }
            }"
}




main(){
    switch_redis
    update_app_replicas "$@"
    send_dingding "$@"
}


main "$@"

# Make the script executable (run this in the shell, not inside the script)
chmod a+x /etc/keepalived/notify.sh
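
The state-to-replica mapping inside update_app_replicas can be pulled into a small helper so that it can be checked without a running cluster. A sketch under that assumption: replicas_for_state is a hypothetical name, and mapping keepalived's fault state to 0 replicas is an assumption that the script above does not make (it handles only master and backup):

```shell
#!/bin/bash
# Hypothetical helper: print the replica count for a keepalived state.
replicas_for_state() {
    case "$1" in
        master) echo 1 ;;
        backup|fault) echo 0 ;;   # handling "fault" is an assumption
        *) echo "invalid state: $1" >&2; return 1 ;;
    esac
}
```

Inside the notify script it could be used as `replica=$(replicas_for_state "$1") || exit 1`, which keeps the argument validation in one place.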
 

5.1.3 Modify the systemd service

vim /usr/lib/systemd/system/keepalived.service
# Add a restart policy (Restart=always) and start after k3s
[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network-online.target
After=k3s.service


[Service]
Type=forking
PIDFile=/var/run/keepalived.pid
KillMode=process
EnvironmentFile=-/etc/sysconfig/keepalived
ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
Restart=always


[Install]
WantedBy=multi-user.target  

5.1.4 Start and enable at boot

systemctl daemon-reload && systemctl start keepalived && systemctl enable keepalived

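5.2 Deploy Redis

5.2.1 Install Redis

This guide uses Redis 6.2.6; choose the version that fits your environment.
Run on both the master and the replica server:
wget http://download.redis.io/releases/redis-6.2.6.tar.gz

# Once the package is on the server, change to the directory that contains it
# Extract
tar -zxvf redis-6.2.6.tar.gz
# Install the build toolchain
yum -y install gcc automake autoconf libtool make
# Change to the source root
cd redis-6.2.6
# Build
make
# Install
make PREFIX=/usr/local/redis install
# Copy the default configuration file to the install directory
cp redis.conf /usr/local/redis/
# If the build fails, remove the leftover files before building again
# CentOS
# make clean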

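Modify the configuration (run on both master and replica)

cd /usr/local/redis/
# Edit the file
vi redis.conf
# Change the following settings:
# Allow other hosts to connect to Redis remotely
bind 0.0.0.0
# Run in the background
daemonize yes
# Log file location; use logfile "/dev/null" if no log is needed
logfile "/var/log/redis_6379.log"
# Data persistence directory (it must exist before Redis starts)
dir /data/redis-host
# Set a password, for example:
requirepass Redis@12345678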

Create the master and replica configuration files

# Make two copies of redis.conf: master.conf and slave.conf
cd /usr/local/redis/
cp redis.conf master.conf
cp redis.conf slave.conf
# Then edit slave.conf on each of the two servers as follows

Edit slave.conf on server A (192.168.101.77)

# Use server B as the master
replicaof 192.168.101.78 6379
# Password for connecting to the master
masterauth Redis@12345678

Edit slave.conf on server B (192.168.101.78)

# Use server A as the master
replicaof 192.168.101.77 6379
# Password for connecting to the master
masterauth Redis@12345678
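
Once both instances are running, the role each one actually took can be read from the output of redis-cli info replication. A small sketch for extracting it; redis_role is a hypothetical helper name, and the carriage-return stripping assumes the CRLF line endings that INFO output normally carries:

```shell
#!/bin/bash
# Print the role reported by "redis-cli info replication" read from stdin.
# Carriage returns are stripped so the value compares cleanly.
redis_role() {
    tr -d '\r' | awk -F: '$1 == "role" { print $2 }'
}
```

For example, `/usr/local/redis/bin/redis-cli -a 'Redis@12345678' info replication | redis_role` should print master on the server holding the VIP and slave on the other one.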

 

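5.2.2 Redis switchover script

This script switches the Redis role when the servers fail over. It runs before Redis starts and installs master.conf as redis.conf if this host holds the VIP, otherwise slave.conf.

vim /usr/local/redis/getstatu.sh

#!/bin/bash


# Determine the current Redis role and select the matching configuration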


SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
keeplived_conf="/etc/keepalived/keepalived.conf"
redis_conf="${SCRIPT_DIR}/redis.conf"


function get_interface_info(){
    if [ -f ${keeplived_conf} ];then
        interface=$(cat ${keeplived_conf} | grep interface | sed  's/.* \([^ ]*\)$/\1/')
        virip=$(awk '/virtual_ipaddress/ {getline; print $1}' ${keeplived_conf})
    else
       echo "/etc/keepalived/keepalived.conf does not exist"
    fi   
} 


function get_interface_ip(){
   if ip addr show ${interface} | grep -q "inet ${virip}"; then
      redis_status=master
   else
      redis_status=slave
   fi
}


function make_redis_conf(){
    if [ ${redis_status} == "master" ];then
        cp -f ${SCRIPT_DIR}/master.conf ${redis_conf}
    elif [ ${redis_status} == "slave" ];then
        cp -f ${SCRIPT_DIR}/slave.conf ${redis_conf} 
    else
        echo "Redis status value error!"
    fi
}


function main(){
    # Short delay to give keepalived time to settle
    sleep 2
    get_interface_info
    get_interface_ip
    make_redis_conf
}


main

# Make the script executable (run this in the shell, not inside the script)
chmod a+x /usr/local/redis/getstatu.sh
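
The awk call in get_interface_info reads only the first address under virtual_ipaddress. If several VIPs are configured, as the note in 5.1.1.1 allows, a variant that collects every address could look like the sketch below. list_vips is a hypothetical helper and is not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: print every address inside the virtual_ipaddress { ... }
# block of a keepalived configuration file, one per line.
list_vips() {
    awk '
        /virtual_ipaddress[ \t]*[{]/ { inside = 1; next }
        inside && /[}]/              { inside = 0 }
        inside && NF                 { print $1 }
    ' "$1"
}
```

`list_vips /etc/keepalived/keepalived.conf` prints the first field of each entry, so an address written with a prefix length keeps that suffix.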

5.2.3 Modify the Redis systemd service

vim /etc/systemd/system/redis.service


[Unit]
Description=redis
Documentation=https://redis.io/documentation/
After=network.target
After=keepalived.service
 
[Service]
Type=forking
#PIDFile=/var/run/redis_6379.pid
ExecStartPre=/usr/local/redis/getstatu.sh
ExecStart=/usr/local/redis/bin/redis-server /usr/local/redis/redis.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/local/redis/bin/redis-cli -a Redis@12345678 shutdown
PrivateTmp=true
 
[Install]
WantedBy=multi-user.target

Reload systemd, then start Redis and enable it at boot:

systemctl daemon-reload && systemctl start redis && systemctl enable redis

5.3 Deploy MySQL

5.3.1 Pre-checks

Check whether a bundled MySQL is already installed:

mysql --version
rpm -qa | grep mysql

If so, uninstall it first:

systemctl stop mysqld


yum remove mysql84-community-release-el9.noarch
yum remove mysql-community-server
yum remove mysql-community-client
yum remove mysql-community-libs
yum remove mysql


rm -rf /var/lib/mysql

5.3.2 Install MySQL (CentOS)

wget https://dev.mysql.com/get/mysql84-community-release-el7-1.noarch.rpm
yum install mysql84-community-release-el7-1.noarch.rpm


Check that the MySQL Yum repository was added successfully with the following commands (on dnf-enabled systems, replace yum with dnf):
 yum repolist enabled | grep "mysql.*-community.*"


 yum repolist all | grep mysql

Disable the 8.4 repositories and enable 8.0:

 

yum install -y yum-utils


yum-config-manager --disable mysql-8.4-lts-community
yum-config-manager --disable mysql-tools-8.4-lts-community


yum-config-manager --enable mysql80-community
yum-config-manager --enable mysql-tools-community
yum install mysql-community-server

For details, see the official documentation:

MySQL 8.0 Reference Manual, Section 2.5.1, "Installing MySQL on Linux Using the MySQL Yum Repository"

 

 

To switch to a mirror (the command below targets the Rocky Linux repository files):

 

sed -e 's|^mirrorlist=|#mirrorlist=|g' \
    -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
    -i.bak \
    /etc/yum.repos.d/rocky-*.repo


dnf makecache

 

5.3.3 Install MySQL (Rocky Linux)

mysql --version
rpm -qa | grep mysql

Remove the existing MySQL packages:

yum remove  mysql-common*

Install:

yum install https://dev.mysql.com/get/mysql84-community-release-el9-1.noarch.rpm


yum repolist enabled | grep "mysql.*-community.*"
yum repolist all | grep mysql
dnf config-manager --disable mysql-8.4-lts-community
dnf config-manager --disable mysql-tools-8.4-lts-community
dnf config-manager --enable mysql80-community
dnf config-manager --enable mysql-tools-community
yum repolist enabled | grep mysql
yum module disable mysql
yum install mysql-community-server
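
After the install it is worth confirming that the 8.0 series was pulled in and not 8.4. A minimal sketch that extracts the series from a mysql --version line; mysql_series is a hypothetical helper name:

```shell
#!/bin/bash
# Print the major.minor series from a "mysql --version" line read from stdin.
mysql_series() {
    sed -n 's/.*Ver \([0-9][0-9]*\.[0-9][0-9]*\)\..*/\1/p'
}
```

Running `mysql --version | mysql_series` should print 8.0 when the repository switch above took effect.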

5.3.4 Edit the configuration file

vim /etc/my.cnf

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove the leading "# " to disable binary logging
# Binary logging captures changes between backups and is enabled by
# default. It's default setting is log_bin=binlog
# disable_log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
#
# Remove leading # to revert to previous value for default_authentication_plugin,
# this will increase compatibility with older clients. For background, see:
# https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
# default-authentication-plugin=mysql_native_password


datadir=/data/mysql-host
socket=/var/lib/mysql/mysql.sock


log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'
secure-file-priv= NULL
symbolic-links=0
max_allowed_packet = 1073741824
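# Maximum number of connections
max_connections = 1000
### Sort buffer size, used for sorts that cannot use an index. Default 256K
#sort_buffer_size =  134217728
### Number of failed connection attempts allowed per host; guards against attacks on the database from that host
#max_connect_errors=10
### Server character set
character-set-server=utf8mb4
### Use the mysql_native_password plugin for authentication by default
#default_authentication_plugin=mysql_native_password
### Set the default time zone to UTC+8
default-time-zone='+8:00'
#
#
binlog_format=MIXED
## Keep binlogs for 3 days
binlog_expire_logs_seconds=259200
### MyISAM sort buffer size, used for sorts that cannot use an index. Default 8M
myisam_sort_buffer_size=8388608
### InnoDB buffer pool size; set to 75% of the memory limit
innodb_buffer_pool_size=12073741824   # about 12 GB by default; adjust to the on-site hardware
### Tell InnoDB the real disk capability so that innodb_max_dirty_pages_pct stays below 75%
innodb_io_capacity=10000
innodb_io_capacity_max=10000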
query_alloc_block_size = 14336
query_prealloc_size = 815104






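# ------ Source (master) node settings -------
# # # Unique ID of this node in the cluster
server-id=78
# # # Enable the binary log and set a file-name prefix (optional; a default exists)
log-bin=mysql-bin-log
# # # Databases excluded from replication (list the ones that need no backup/sync)
# # # Alternatively, name the databases to replicate with binlog-do-db (one line per database)
binlog-ignore-db=mysql
# # # Maximum size of a single binlog file
max_binlog_size=1024M
# #  
# #  # ------ GTID settings -------
# #  # Enable GTID-based replication
gtid_mode=on
# #  # Allow only statements that are safe for GTID-based replication
enforce-gtid-consistency=on
# #   
# #   # ------ Replica (slave) node settings -------
# #   # Enable the relay log (a file-name prefix can be set here too)
relay_log=mysql-relay-log
# #   # Allow stored functions and triggers to be created while binary logging is enabled
log_bin_trust_function_creators=true
# #   # Error codes skipped during replication (keeps replication from stopping on these errors)
slave_skip_errors=1062,1032,1053,1236,1050
# #    
# #    # ------ Auto-increment settings -------
# #    # Auto-increment start value (2 on this node, 1 on the other node)
auto_increment_offset=2
# #    # Auto-increment step of 2; this node generates {2, 4, 6, 8, ...}
auto_increment_increment=2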
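
The 75% rule noted next to innodb_buffer_pool_size in the configuration above can be computed instead of typed in by hand. A small sketch, assuming the value is derived from total memory in bytes; buffer_pool_bytes is a hypothetical helper name:

```shell
#!/bin/bash
# Print PERCENT of TOTAL_BYTES, rounded down, for use as innodb_buffer_pool_size.
buffer_pool_bytes() {
    local total_bytes=$1 percent=$2
    echo $(( total_bytes * percent / 100 ))
}
```

On a Linux host the input can come from /proc/meminfo, for example `mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo); buffer_pool_bytes $(( mem_kb * 1024 )) 75`. For a 16 GiB machine (17179869184 bytes) the result is 12884901888.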

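The second node uses the same configuration as the first, with only two differences:

• server-id: the node's unique ID in the cluster; it must not repeat. The sample above uses 78, so the other node needs a different value (for example 77).
• auto_increment_offset: one node uses 1 and the other uses 2. With auto_increment_increment=2, the first node then generates {1, 3, 5, 7, ...} and the second {2, 4, 6, 8, ...}, so the two masters never hand out the same ID.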
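
To make the effect of the two auto-increment settings concrete, the sketch below prints the first few IDs a node would generate for a given offset and increment (value = offset + n × increment). auto_inc_seq is a hypothetical helper used only for illustration:

```shell
#!/bin/bash
# Print the first COUNT auto-increment values for OFFSET and INCREMENT,
# separated by spaces.
auto_inc_seq() {
    local offset=$1 increment=$2 count=$3
    local i=0 out=""
    while [ "$i" -lt "$count" ]; do
        out="${out}${out:+ }$(( offset + i * increment ))"
        i=$(( i + 1 ))
    done
    echo "$out"
}

auto_inc_seq 1 2 5   # node with auto_increment_offset=1
auto_inc_seq 2 2 5   # node with auto_increment_offset=2
```

The two calls print 1 3 5 7 9 and 2 4 6 8 10, which shows why rows inserted on either master cannot collide on an auto-increment key.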

5.3.5 Set the account password

systemctl start mysqld

# Look up the temporary root password generated on first start
grep 'temporary password' /var/log/mysqld.log
 
mysql -uroot -p

-- Replace the temporary password
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Admin@123';

SHOW VARIABLES LIKE 'validate_password%';

SET GLOBAL validate_password.policy=LOW;

-- Allow root to log in from remote hosts
UPDATE mysql.user SET Host='%' WHERE User='root' AND Host='localhost';
FLUSH PRIVILEGES;

ALTER USER 'root'@'%' IDENTIFIED BY 'Admin@123';

5.3.6 Configure replication

# Log in to MySQL
mysql -uroot -p

# Run on 78: make 77 its source (master)
change master to master_host='192.168.101.77',
       master_user='root',
       master_password='Admin@123',
       master_port=3306,
       master_auto_position=1;

# Start replication

start slave;

# Run on 77: make 78 its source (master)


change master to master_host='192.168.101.78',
       master_user='root',
       master_password='Admin@123',
       master_port=3306,
       master_auto_position=1;

# Start replication

start slave;

# Check the status

SHOW SLAVE STATUS\G
 

Replication is healthy when both Slave_IO_Running and Slave_SQL_Running show Yes.
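
The same check can be scripted for monitoring. A minimal sketch that reads SHOW SLAVE STATUS\G output from stdin and succeeds only when both threads report Yes; replication_ok is a hypothetical helper name:

```shell
#!/bin/bash
# Succeeds when the "SHOW SLAVE STATUS\G" output on stdin reports
# Slave_IO_Running: Yes and Slave_SQL_Running: Yes.
replication_ok() {
    awk -F': ' '
        $1 ~ /^ *Slave_IO_Running$/  && $2 == "Yes" { io = 1 }
        $1 ~ /^ *Slave_SQL_Running$/ && $2 == "Yes" { sql = 1 }
        END { exit !(io && sql) }
    '
}
```

It could be called as `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"` on each of the two nodes.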

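5.4 Deploy K3s and services

See the K3s official website:

K3s | Lightweight Kubernetes | Kubernetes for IoT and edge computing | Rancher

5.5 Deploy your own application services

5.6 Verification

Verify the failover by restarting keepalived on the server that currently holds the VIP and confirming that the VIP and the services move to the other server.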
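
To see which server holds the VIP before and after the restart, the address list of each host can be inspected. A small sketch; has_vip is a hypothetical helper that reads the output of ip -o -4 addr show from stdin:

```shell
#!/bin/bash
# Succeeds when the "ip -o -4 addr show" output on stdin contains the given VIP.
# -F matches the address literally, and the trailing "/" avoids prefix matches.
has_vip() {
    grep -qF "inet $1/"
}
```

On each server, `ip -o -4 addr show | has_vip 192.168.101.90 && echo "this host holds the VIP"` reports the current holder; after `systemctl restart keepalived` on that host, the message should appear on the other server instead.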

 

 

 
