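What Sentinel does and how it works
Redis provides the Sentinel mechanism to give a master-slave cluster automatic failure recovery. Sentinel has three roles:
- Monitoring: Sentinel continuously checks whether your master and slaves are working as expected.
- Automatic failover: if the master fails, Sentinel promotes one of the slaves to master. When the failed instance recovers, it rejoins as a slave of the new master.
- Notification: Sentinel acts as a service-discovery source for Redis clients. When a failover happens, it pushes the latest information to the clients.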
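Monitoring service state
Sentinel detects service state with a heartbeat: every second it sends a PING command to each instance in the cluster.
- Subjective down: if a Sentinel finds that an instance has not responded within the configured time, that Sentinel considers the instance subjectively down.
- Objective down: if at least the configured number (quorum) of Sentinels consider the instance subjectively down, the instance is objectively down. The quorum should preferably be more than half the number of Sentinel instances.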
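The two states above can be illustrated with a small sketch. This is not Sentinel's actual implementation; the `SentinelView` class, its field names, and the sample timings are assumptions made up for illustration. It only shows how one Sentinel's opinion (subjective down) differs from an agreement among at least `quorum` Sentinels (objective down).

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """What one (hypothetical) Sentinel currently knows about an instance."""
    name: str
    ms_since_last_valid_reply: int  # time since the last valid PING reply


def is_subjectively_down(view: SentinelView, down_after_ms: int) -> bool:
    # One Sentinel's own opinion: no valid reply within the configured window.
    return view.ms_since_last_valid_reply > down_after_ms


def is_objectively_down(views: list[SentinelView], down_after_ms: int, quorum: int) -> bool:
    # Objective down: at least `quorum` Sentinels hold the subjective-down opinion.
    votes = sum(is_subjectively_down(v, down_after_ms) for v in views)
    return votes >= quorum


# Illustrative values only: two of three Sentinels have not heard from the master
# for longer than down-after-milliseconds (5000).
views = [
    SentinelView("sentinel-157", 6200),
    SentinelView("sentinel-156", 5400),
    SentinelView("sentinel-155", 800),
]
print(is_objectively_down(views, down_after_ms=5000, quorum=2))
```

With a quorum of 2, as in the configuration used later in this post, two agreeing Sentinels are enough; with a quorum of 3 the same views would not mark the master objectively down.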
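Electing a new master
Once the master is found to have failed, Sentinel has to pick one of the slaves as the new master:
- First, it checks how long each slave has been disconnected from the master. A slave disconnected for longer than the limit (down-after-milliseconds * 10) is excluded.
- Next, it compares the slaves' slave-priority values. A smaller value means higher priority; a slave with priority 0 never takes part in the election.
- If the slave-priority values are equal, it compares the slaves' offset values. A larger offset means newer data and therefore higher priority.
- Finally, it compares the slaves' run IDs. A smaller run ID means higher priority.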
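The selection rules above amount to a filter followed by a multi-key comparison. The sketch below is a simplified model, not Sentinel's source code: the `Replica` class, the shortened run IDs, and the sample numbers are hypothetical, and the disconnect limit is reduced to the `down-after-milliseconds * 10` rule as stated in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    run_id: str           # run ID, compared as a string
    priority: int         # slave-priority; 0 means "never promote"
    repl_offset: int      # replication offset; larger means fresher data
    ms_disconnected: int  # how long the link to the old master has been down


def pick_new_master(replicas: list[Replica], down_after_ms: int) -> Optional[Replica]:
    max_disconnect = down_after_ms * 10
    # Step 1: drop slaves that must never be promoted or were disconnected too long.
    candidates = [
        r for r in replicas
        if r.priority != 0 and r.ms_disconnected <= max_disconnect
    ]
    if not candidates:
        return None
    # Steps 2-4: lower priority value first, then larger offset, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# Hypothetical candidates for illustration.
replicas = [
    Replica("b9b9", priority=100, repl_offset=19347, ms_disconnected=1000),
    Replica("02a2", priority=100, repl_offset=19487, ms_disconnected=1200),
    Replica("ffff", priority=0, repl_offset=99999, ms_disconnected=0),
    Replica("aaaa", priority=50, repl_offset=100, ms_disconnected=60000),
]
chosen = pick_new_master(replicas, down_after_ms=5000)
print(chosen.run_id)  # prints "02a2"
```

Here `ffff` is skipped because its priority is 0, `aaaa` because it was disconnected for more than 50000 ms, and `02a2` wins over `b9b9` on the larger offset.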
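How failover is carried out
Once one of the slaves has been chosen as the new master, failover proceeds as follows:
- Sentinel sends slaveof no one to the chosen slave, which turns that node into a master.
- Sentinel sends every other slave a command such as slaveof 192.168.142.152 6379, naming the new master's address, so that these slaves become slaves of the new master and start syncing data from it.
- Finally, Sentinel marks the failed node as a slave. When the failed node recovers, it automatically becomes a slave of the new master.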
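As a rough illustration of the order of these steps, the following sketch builds the list of commands a failover would issue. The `plan_failover` function is hypothetical and does not talk to Redis; in particular, modelling the last step as a command queued for the failed master is an assumption made here for simplicity. The addresses follow the test later in this post, where slave02 (192.168.142.155) is promoted.

```python
def plan_failover(new_master: str, other_slaves: list[str], failed_master: str,
                  port: int = 6379) -> list[tuple[str, str]]:
    """Return (target, command) pairs in the order described above."""
    # Step 1: the chosen slave stops replicating and becomes a master.
    steps = [(new_master, "SLAVEOF NO ONE")]
    # Step 2: every other slave is pointed at the new master.
    follow = f"SLAVEOF {new_master} {port}"
    steps += [(slave, follow) for slave in other_slaves]
    # Step 3 (assumption): the failed master is treated as a slave and would
    # receive the same command once it is reachable again.
    steps.append((failed_master, follow))
    return steps


steps = plan_failover(
    new_master="192.168.142.155",
    other_slaves=["192.168.142.156"],
    failed_master="192.168.142.157",
)
for target, command in steps:
    print(target, "<-", command)
```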
Setting up a Sentinel cluster
IP | PORT | ROLE |
---|---|---|
192.168.142.157 | 6379 | master |
192.168.142.156 | 6379 | slave |
192.168.142.155 | 6379 | slave02 |
192.168.142.157 | 26379 | sentinel |
192.168.142.156 | 26379 | sentinel |
192.168.142.155 | 26379 | sentinel |
To keep things simple, I use only three servers here, each running one Redis instance and one Sentinel.
master, sentinel
docker-compose.yml
services:
  redis-master:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-master
    privileged: true
    ports:
      - '6379:6379'
    volumes:
      # Bitnami-style data path; the official redis image writes to /data,
      # so adjust this mount if the data must persist in the named volume.
      - redis-data:/opt/bitnami/redis/data
      - /root/redis.conf:/etc/redis.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/redis.conf
  redis-sentinel:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-sentinel
    privileged: true
    ports:
      - '26379:26379'
    volumes:
      - /root/sentinel.conf:/etc/sentinel.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/sentinel.conf --sentinel
volumes:
  redis-data:
redis.conf
daemonize no
port 6379
protected-mode no
bind 0.0.0.0
requirepass 123456
sentinel.conf
port 26379
protected-mode no
sentinel monitor mymaster 192.168.142.157 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 3000
slave, sentinel
docker-compose.yml
services:
  redis-slave:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-slave
    privileged: true
    ports:
      - '6379:6379'
    volumes:
      # Bitnami-style data path; the official redis image writes to /data,
      # so adjust this mount if the data must persist in the named volume.
      - redis-data:/opt/bitnami/redis/data
      - /root/redis.conf:/etc/redis.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/redis.conf
  redis-sentinel:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-sentinel
    privileged: true
    ports:
      - '26379:26379'
    volumes:
      - /root/sentinel.conf:/etc/sentinel.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/sentinel.conf --sentinel
volumes:
  redis-data:
redis.conf
daemonize no
port 6379
protected-mode no
masterauth 123456
requirepass 123456
slave-read-only yes
bind 0.0.0.0
slaveof 192.168.142.157 6379
sentinel.conf
port 26379
protected-mode no
sentinel monitor mymaster 192.168.142.157 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 3000
slave02, sentinel
docker-compose.yml
services:
  redis-slave02:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-slave02
    privileged: true
    ports:
      - '6379:6379'
    volumes:
      # Bitnami-style data path; the official redis image writes to /data,
      # so adjust this mount if the data must persist in the named volume.
      - redis-data:/opt/bitnami/redis/data
      - /root/redis.conf:/etc/redis.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/redis.conf
  redis-sentinel:
    image: hub.atomgit.com/amd64/redis:7.0.13
    restart: always
    container_name: redis-sentinel
    privileged: true
    ports:
      - '26379:26379'
    volumes:
      - /root/sentinel.conf:/etc/sentinel.conf
      - /etc/localtime:/etc/localtime:ro
    command:
      - /bin/sh
      - -c
      - redis-server /etc/sentinel.conf --sentinel
volumes:
  redis-data:
redis.conf
daemonize no
port 6379
protected-mode no
masterauth 123456
requirepass 123456
slave-read-only yes
bind 0.0.0.0
slaveof 192.168.142.157 6379
sentinel.conf
port 26379
protected-mode no
sentinel monitor mymaster 192.168.142.157 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 3000
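The Sentinels keep watching the master and raise the alarm as soon as it goes down. With the configuration files in place, the containers can be started.
Starting Docker
Run the following on each of the three hosts:
docker compose up -d
Check the status; Up means the container started successfully.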
root@master:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
aa6466e6fa15 hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" 27 minutes ago Up 21 minutes 6379/tcp, 0.0.0.0:26379->26379/tcp, :::26379->26379/tcp redis-sentinel
27a2f19d8040 hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" 27 minutes ago Up 21 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis-master
root@slave:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d3df2c507f5 hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" 25 minutes ago Up 21 minutes 6379/tcp, 0.0.0.0:26379->26379/tcp, :::26379->26379/tcp redis-sentinel
b9b981917f2d hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" About an hour ago Up 21 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis-slave
root@slave02:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
774fb813bbf2 hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" 23 minutes ago Up 20 minutes 6379/tcp, 0.0.0.0:26379->26379/tcp, :::26379->26379/tcp redis-sentinel
02a276c8edc8 hub.atomgit.com/amd64/redis:7.0.13 "docker-entrypoint.s…" About an hour ago Up 20 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis-slave02
Once the containers are up, check the state of each Sentinel:
root@master:~# docker exec redis-sentinel redis-cli -p 26379 -c info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.142.157:6379,slaves=2,sentinels=1
root@slave:~# docker exec -it redis-sentinel redis-cli -p 26379 -c info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.142.157:6379,slaves=0,sentinels=1
root@slave02:~# docker exec -it redis-sentinel redis-cli -p 26379 -c info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.142.157:6379,slaves=0,sentinels=1
If the master0 line shows the master's address (192.168.142.157:6379) and status=ok, the Sentinel has started successfully.
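To check this from a script rather than by eye, the master0 line can be split into fields. The helper below is a minimal sketch; `parse_master_line` is a hypothetical name, and it assumes the line has exactly the `key=value` layout shown in the output above.

```python
def parse_master_line(line: str) -> dict:
    """Parse a line such as 'master0:name=mymaster,status=ok,...' into a dict."""
    # Split off the 'master0' label at the first colon; the address keeps its own colon.
    _, _, fields = line.partition(":")
    info = dict(pair.split("=", 1) for pair in fields.split(","))
    # Counters are easier to compare as integers.
    for key in ("slaves", "sentinels"):
        info[key] = int(info[key])
    return info


line = "master0:name=mymaster,status=ok,address=192.168.142.157:6379,slaves=2,sentinels=1"
info = parse_master_line(line)
print(info["status"], info["address"], info["slaves"], info["sentinels"])
# prints: ok 192.168.142.157:6379 2 1
```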
Testing
Simulate a master failure by stopping the master container:
root@master:~# docker stop redis-master
Check slave:
root@slave:~# docker exec redis-slave redis-cli -a 123456 -c role
slave
192.168.142.157
6379
connect
-1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Check slave02:
root@slave02:~# docker exec -it redis-slave02 redis-cli -a 123456 -c role
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1) "master"
2) (integer) 19487
3) 1) 1) "192.168.142.156"
2) "6379"
3) "19347"
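slave02 now reports the master role, with 192.168.142.156 listed as its slave. At this point, the Sentinel setup is complete.
Addendum
Notes on sentinel.conf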
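- sentinel monitor mymaster 192.168.142.157 6379 2: specifies the master to monitor
  - mymaster: the master's name, chosen freely
  - 192.168.142.157 6379: the master's IP and port
  - 2: the quorum, that is, the number of Sentinels that must agree the master is unreachable before it is marked objectively down
- sentinel down-after-milliseconds mymaster 5000: how long, in milliseconds, an instance must go without a valid reply before this Sentinel considers it subjectively down
- sentinel failover-timeout mymaster 60000: the failover timeout in milliseconds (note that the configuration files above set it to 3000)
- sentinel auth-pass mymaster 123456: the master's password
- sentinel parallel-syncs mymaster 1: tells Sentinel that, for the master named mymaster, only one slave at a time may resync with the new master during a failover; the other slaves wait their turn. This keeps the load on the new master low, so its performance does not suffer during the failover. The trade-off is that the failover may take longer, because the slaves resync one after another.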
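The effect of parallel-syncs on failover duration can be pictured with a simplified model. The sketch below groups slaves into strict rounds of at most `parallel_syncs` members; this batching is an assumption made for illustration and is not claimed to match how Sentinel schedules resyncs internally. The `sync_batches` function and the slave names are hypothetical.

```python
def sync_batches(slaves: list[str], parallel_syncs: int) -> list[list[str]]:
    """Group slaves into rounds of at most `parallel_syncs` concurrent resyncs."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    return [
        slaves[i:i + parallel_syncs]
        for i in range(0, len(slaves), parallel_syncs)
    ]


slaves = ["slave-a", "slave-b", "slave-c"]
print(sync_batches(slaves, 1))  # three rounds, one slave each
print(sync_batches(slaves, 2))  # two rounds: two slaves, then one
```

A smaller value means more rounds and a longer failover, but fewer slaves loading the new master at the same time.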