1. Configuration files
Six configuration files were prepared: redis-6381.conf, redis-6382.conf, redis-6383.conf, redis-6384.conf, redis-6385.conf, and redis-6386.conf. Their contents are as follows:
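# This configuration has been trimmed down; compare it with the full conf file shipped with Redis for the complete set of options. Change the port number in each file accordingly.
# Run in the background (as a daemon)
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public network is not recommended, so only the server's private IP and the loopback address are bound here.
bind 172.17.0.13 127.0.0.1
# Directory where Redis data files are stored
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: holds the state of the other nodes, persistent variables, and so on. It is generated automatically in the dir configured above.
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a node may be unreachable. If a master is unreachable for longer than this, a failover is triggered.
cluster-node-timeout 5000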
Note: the Redis version used here is 6.0.4.
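The six files differ only in the port number, so they could be generated instead of edited by hand. The following is a minimal sketch under that assumption: it renders the trimmed settings shown above for a given port. The class name `ClusterConfRenderer` and its methods are hypothetical helpers, not part of Redis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that renders the trimmed per-port config shown above. */
final class ClusterConfRenderer {

    // Private IP taken from the sample config; adjust to your own server.
    static final String PRIVATE_IP = "172.17.0.13";

    /** Returns the text of redis-<port>.conf for one cluster node. */
    static String render(int port) {
        List<String> lines = new ArrayList<>();
        lines.add("daemonize yes");
        lines.add("port " + port);
        lines.add("bind " + PRIVATE_IP + " 127.0.0.1");
        lines.add("dir /redis/workingDir");
        lines.add("logfile \"/redis/logs/cluster-node-" + port + ".log\"");
        lines.add("appendonly yes");
        lines.add("cluster-enabled yes");
        lines.add("cluster-config-file cluster-node-" + port + ".conf");
        lines.add("cluster-node-timeout 5000");
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) {
        // Print all six configs; redirect each one to its own file as needed.
        for (int port = 6381; port <= 6386; port++) {
            System.out.println("# ---- redis-" + port + ".conf ----");
            System.out.print(render(port));
        }
    }
}
```

Only the port, log file name, and cluster state file name change between nodes; everything else is shared.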
2. Start the services and create the cluster
(1) Start the six Redis services
redis-server redis-6381.conf
redis-server redis-6382.conf
redis-server redis-6383.conf
redis-server redis-6384.conf
redis-server redis-6385.conf
redis-server redis-6386.conf
(2) Create the cluster with the client command
Create the cluster, assigning one replica to each master node:
redis-cli --cluster create \
172.17.0.13:6381 172.17.0.13:6382 172.17.0.13:6383 \
172.17.0.13:6384 172.17.0.13:6385 172.17.0.13:6386 \
--cluster-replicas 1
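With six nodes and one replica per master, the cluster ends up with three masters that share the 16384 hash slots. The node files shown later in this post list the ranges 0-5460, 5461-10922, and 10923-16383. The sketch below reproduces those ranges with a simple even split; it is an illustration, not a claim about the exact algorithm inside redis-cli, and `SlotRanges` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an even split of the 16384 hash slots. */
final class SlotRanges {

    static final int TOTAL_SLOTS = 16384;

    /** Splits slots 0..16383 into contiguous "first-last" ranges, one per master. */
    static List<String> split(int masters) {
        if (masters < 1) {
            throw new IllegalArgumentException("masters must be >= 1");
        }
        List<String> ranges = new ArrayList<>();
        double perMaster = (double) TOTAL_SLOTS / masters;
        double cursor = 0.0;
        int first = 0;
        for (int i = 0; i < masters; i++) {
            // Round the fractional boundary; the last master takes whatever is left.
            int last = (int) Math.round(cursor + perMaster - 1);
            if (i == masters - 1) {
                last = TOTAL_SLOTS - 1;
            }
            ranges.add(first + "-" + last);
            first = last + 1;
            cursor += perMaster;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints [0-5460, 5461-10922, 10923-16383]
        System.out.println(split(3));
    }
}
```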
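3. Client connection
(1) Client configuration
import java.util.Arrays;

import io.lettuce.core.ReadFrom;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        // Read/write splitting on the client: prefer replicas for reads
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .readFrom(ReadFrom.REPLICA_PREFERRED)
                .build();
        // Seed nodes, addressed by the server's public IP
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
                "122.51.151.130:6381",
                "122.51.151.130:6382",
                "122.51.151.130:6383",
                "122.51.151.130:6384",
                "122.51.151.130:6385",
                "122.51.151.130:6386"));
        return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
    }
}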
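(2) Test case
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    // Writes go to a master; with REPLICA_PREFERRED the read is served by a replica when possible
    @Test
    public void readFromReplicaWriteToMasterTest() {
        System.out.println("Setting value...");
        stringRedisTemplate.opsForValue().set("username", "Nick");
        System.out.println("Got value: " + stringRedisTemplate.opsForValue().get("username"));
    }
}
(3) Error log analysis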
2020-08-14 14:57:49.180 WARN 22012 --- [ioEventLoop-6-4] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6384]: connection timed out: /172.17.0.13:6384
2020-08-14 14:57:49.180 WARN 22012 --- [ioEventLoop-6-3] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6383]: connection timed out: /172.17.0.13:6383
2020-08-14 14:57:49.182 WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6382]: connection timed out: /172.17.0.13:6382
2020-08-14 14:57:49.182 WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6381]: connection timed out: /172.17.0.13:6381
2020-08-14 14:57:49.190 WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6385]: connection timed out: /172.17.0.13:6385
2020-08-14 14:57:49.191 WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh : Unable to connect to [172.17.0.13:6386]: connection timed out: /172.17.0.13:6386
2020-08-14 14:57:59.389 WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6382
2020-08-14 14:58:09.391 WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6381
2020-08-14 14:58:19.393 WARN 22012 --- [ioEventLoop-6-1] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6383
2020-08-14 14:58:29.396 WARN 22012 --- [ioEventLoop-6-2] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6384
2020-08-14 14:58:39.399 WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6386
2020-08-14 14:58:49.402 WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient : connection timed out: /172.17.0.13:6385
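The client we use is Lettuce. The log shows that the public IP we specified has been replaced by the private IP. The reason is that the client obtains the node addresses from the Redis cluster itself, so we have to make the cluster return the public IP.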
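A quick way to recognise this situation is to check whether an address reported in the log is a private one. The sketch below uses only the JDK's `InetAddress` to classify the two addresses that appear in this post; `AdvertisedAddressCheck` is a hypothetical helper name, and the method expects IP literals (passing a host name would trigger a DNS lookup).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: tells private/loopback addresses apart from public ones. */
final class AdvertisedAddressCheck {

    /** Returns true if the IP literal is site-local (RFC 1918) or loopback. */
    static boolean isPrivate(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format; no DNS lookup happens.
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr.isSiteLocalAddress() || addr.isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // The address the cluster advertised, and the address the client was configured with.
        String[] addresses = {"172.17.0.13", "122.51.151.130"};
        for (String ip : addresses) {
            System.out.println(ip + (isPrivate(ip)
                    ? " is private: not reachable from outside the server's network"
                    : " is public"));
        }
    }
}
```

If the addresses in the `ClusterTopologyRefresh` warnings are private while the seed nodes were public, the cluster is advertising addresses the client cannot reach.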
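4. Solving the problem
(1) Check the redis.conf configuration file
The settings that make Redis advertise a public IP can be found in redis.conf. The section below mainly targets special deployments such as Docker, but it can equally be used to set a node's public IP, port, and cluster bus port (by default the service port plus 10000) by hand.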
########################## CLUSTER DOCKER/NAT support ########################
# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port
#
# Each instruct the node about its address, client port, and cluster message
# bus port. The information is then published in the header of the bus packets
# so that other nodes will be able to correctly map the address of the node
# publishing the information.
#
# If the above options are not used, the normal Redis Cluster auto-detection
# will be used instead.
#
# Note that when remapped, the bus port may not be at the fixed offset of
# clients port + 10000, so you can specify any port and bus-port depending
# on how they get remapped. If the bus-port is not set, a fixed offset of
# 10000 will be used as usually.
#
# Example:
#
# cluster-announce-ip 10.1.1.5
# cluster-announce-port 6379
# cluster-announce-bus-port 6380
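(2) Modify the configuration file
Once a public IP is specified, the nodes of the Redis cluster talk to each other through that public IP, that is, over the external network. The cluster bus ports (16381 and so on, as configured below) must therefore be opened in the cloud server's security group; otherwise the cluster stays in the fail state.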
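# This configuration has been trimmed down; compare it with the full conf file shipped with Redis for the complete set of options. Change the port number in each file accordingly.
# Run in the background (as a daemon)
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public network is not recommended, so only the server's private IP and the loopback address are bound here.
bind 172.17.0.13 127.0.0.1
# Directory where Redis data files are stored
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: holds the state of the other nodes, persistent variables, and so on. It is generated automatically in the dir configured above.
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a node may be unreachable. If a master is unreachable for longer than this, a failover is triggered.
cluster-node-timeout 5000
# The public IP must be specified when deploying on a cloud server
cluster-announce-ip 122.51.151.130
# Redis cluster bus port, used for communication with the other nodes
cluster-announce-bus-port 16381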
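Each node needs its own pair of announce settings, and each node contributes two ports to the security group rules. The sketch below derives both from the service port, assuming the default bus port offset of 10000 mentioned above; `AnnounceSettings` is a hypothetical helper name.

```java
/** Hypothetical helper: derives announce settings and firewall ports per node. */
final class AnnounceSettings {

    // Default offset between the service port and the cluster bus port.
    static final int BUS_PORT_OFFSET = 10000;

    static int busPort(int port) {
        return port + BUS_PORT_OFFSET;
    }

    /** Returns the two announce lines used in this post for one node. */
    static String announceLines(String publicIp, int port) {
        return "cluster-announce-ip " + publicIp + "\n"
                + "cluster-announce-bus-port " + busPort(port) + "\n";
    }

    public static void main(String[] args) {
        String publicIp = "122.51.151.130"; // public IP from the sample config
        for (int port = 6381; port <= 6386; port++) {
            System.out.println("# redis-" + port + ".conf");
            System.out.print(announceLines(publicIp, port));
            System.out.println("# open in the security group: " + port + ", " + busPort(port));
        }
    }
}
```

For the six nodes here this means opening 6381 to 6386 for clients and 16381 to 16386 for the cluster bus.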
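(3) Restart the Redis services and re-create the cluster
At this point we can compare the node state file cluster-node-6381.conf before and after the change.
Before the public IP was specified: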
[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf
34287d78c1e9c4ff49880bb976707a0c17676f82 172.17.0.13:6384@16384 slave 1a206270f835a79e43e281df5f6f8215ab49d713 0 1597390563209 4 connected
e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 172.17.0.13:6383@16383 master - 0 1597390565212 3 connected 10923-16383
0932cc203a19f37a3f5ebca8278962f5b325c67e 172.17.0.13:6385@16385 slave 2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 0 1597390564711 5 connected
2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 172.17.0.13:6382@16382 master - 0 1597390565000 2 connected 5461-10922
1a206270f835a79e43e281df5f6f8215ab49d713 172.17.0.13:6381@16381 myself,master - 0 1597390564000 1 connected 0-5460
0f63accb455594d0625cffa8d09aacc580d7e428 172.17.0.13:6386@16386 slave e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 0 1597390564210 6 connected
After the public IP was specified:
[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf
e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 122.51.151.130:6384@16384 master - 0 1597389992286 7 connected 10923-16383
511668874d39a7b1f701cc3df6f21d00510bfeae 122.51.151.130:6383@16383 slave e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 0 1597389991283 7 connected
e77e540ef4115abe920fb191f354b81f42e7b4ed 122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460
2a3ea359311b34cd59e10da7d2f1bba48403f0ee 122.51.151.130:6385@16385 slave e77e540ef4115abe920fb191f354b81f42e7b4ed 0 1597389990583 5 connected
2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 122.51.151.130:6382@16382 master - 0 1597389992789 2 connected 5461-10922
2b7671e002143b329c9c6c969bfb825a86fb41b2 122.51.151.130:6386@16386 slave 2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 0 1597389991784 6 connected
vars currentEpoch 7 lastVoteEpoch 7
As we can see, every node now advertises the public IP. Running the test case again, everything works.
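The comparison above can also be done programmatically. Each node line has the form `<id> <ip:port@bus-port> <flags> <master-id> <ping-sent> <pong-recv> <config-epoch> <link-state> <slots...>`. The sketch below parses one such line as it appears in the Redis 6.0.4 files shown here; newer Redis versions may append more data to the address field, which this sketch does not handle. `ClusterNodeLine` is a hypothetical class name, and the trailing `vars ...` line must be skipped by the caller.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for one node line of a cluster state file. */
final class ClusterNodeLine {
    final String id;
    final String host;
    final int port;
    final int busPort;
    final boolean master;
    final List<String> slots;

    private ClusterNodeLine(String id, String host, int port, int busPort,
                            boolean master, List<String> slots) {
        this.id = id;
        this.host = host;
        this.port = port;
        this.busPort = busPort;
        this.master = master;
        this.slots = slots;
    }

    static ClusterNodeLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 8) {
            throw new IllegalArgumentException("not a node line: " + line);
        }
        String addr = f[1]; // ip:port@bus-port
        int at = addr.indexOf('@');
        int colon = addr.lastIndexOf(':', at);
        if (at < 0 || colon < 0) {
            throw new IllegalArgumentException("unexpected address field: " + addr);
        }
        String host = addr.substring(0, colon);
        int port = Integer.parseInt(addr.substring(colon + 1, at));
        int busPort = Integer.parseInt(addr.substring(at + 1));
        boolean master = Arrays.asList(f[2].split(",")).contains("master");
        // Slot ranges, if any, start at the ninth field.
        List<String> slots = Arrays.asList(Arrays.copyOfRange(f, 8, f.length));
        return new ClusterNodeLine(f[0], host, port, busPort, master, slots);
    }

    public static void main(String[] args) {
        ClusterNodeLine node = parse("e77e540ef4115abe920fb191f354b81f42e7b4ed "
                + "122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460");
        System.out.println(node.host + ":" + node.port + " bus=" + node.busPort
                + " master=" + node.master + " slots=" + node.slots);
    }
}
```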
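5. Lettuce client connection problems during failover
(1) Test case
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {

    @Autowired
    private StringRedisTemplate stringRedisTemplate;

    // Runs until stopped manually: writes and reads a counter every 2 seconds
    @Test
    public void automaticFailoverTest() throws InterruptedException {
        int count = 0;
        while (true) {
            try {
                stringRedisTemplate.opsForValue().set("count", String.valueOf(++count));
                System.out.println("Set count to: " + count);
                System.out.println("Read count: " + stringRedisTemplate.opsForValue().get("count"));
                Thread.sleep(2000);
            } catch (Exception e) {
                // Any Redis error is treated as a possible failover; pause, then try again
                System.out.println("A failover may have occurred, retrying...");
                Thread.sleep(3000);
            }
        }
    }
}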
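The test above retries forever, which is fine for watching a failover but not something application code would usually want. As a side note, the same retry idea can be bounded. The sketch below is a generic, hypothetical helper (`BoundedRetry` is not part of Spring or Lettuce) that retries an operation a fixed number of times with a fixed pause between attempts.

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries an operation a limited number of times. */
final class BoundedRetry {

    /**
     * Calls op up to maxAttempts times, sleeping delayMillis after each failed
     * attempt except the last. Rethrows the last failure if every attempt fails.
     */
    static <T> T call(Supplier<T> op, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("node unavailable");
            }
            return "ok";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the test case, the body of the loop (the `set` followed by the `get`) would be the operation passed to `call`.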
(2) Stop one of the master nodes to simulate a crash
The log is as follows:
2020-08-20 19:33:25.118 INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was /122.51.151.130:6384
2020-08-20 19:33:26.213 WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:31.015 INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:32.107 WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:36.616 INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:37.709 WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:42.016 INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:43.110 WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:47.216 INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:48.317 WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:56.515 INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:57.605 WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:34:14.016 INFO 13696 --- [xecutorLoop-1-3] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:15.113 WARN 13696 --- [ioEventLoop-6-3] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
A failover may have occurred, retrying...
2020-08-20 19:34:45.116 INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:46.212 WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:35:16.216 INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:35:17.310 WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
A failover may have occurred, retrying...
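Even after a long wait, the client was still stuck trying to reconnect to the stopped node. Something is clearly off with the Lettuce client.
(3) Solutions
1) Switch to a different Redis client
After switching the client to Jedis and simulating the master failure again, the client connection recovered after a while.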
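import java.util.Arrays;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;

@Configuration
public class RedisClusterConfig {

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        // Same seed nodes as before; only the connection factory changes
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
                "122.51.151.130:6381",
                "122.51.151.130:6382",
                "122.51.151.130:6383",
                "122.51.151.130:6384",
                "122.51.151.130:6385",
                "122.51.151.130:6386"));
        return new JedisConnectionFactory(redisClusterConfiguration);
    }
}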
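2) Configure Redis cluster topology refresh in the Lettuce client
Does the Lettuce client really not support reconnecting after a master/replica switch? That cannot be the case. The Lettuce wiki on GitHub covers Redis Cluster at the following addresses: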
https://github.com/lettuce-io/lettuce-core/wiki/Redis-Cluster
https://github.com/lettuce-io/lettuce-core/wiki/Client-options#cluster-specific-options
Next, modify the client configuration as the documentation suggests:
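import java.time.Duration;
import java.util.Arrays;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        // Enable both adaptive and periodic cluster topology refresh. Without them, when the master
        // of a slot range goes down the service stays unavailable until that node comes back.
        ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
                // Adaptive refresh; if it is off, changes in the Redis cluster lead to connection errors
                .enableAllAdaptiveRefreshTriggers()
                // Timeout between adaptive refreshes (the default is 30 seconds)
                .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30))
                // Periodic refresh is off by default; enabled without an argument it uses
                // ClusterTopologyRefreshOptions.DEFAULT_REFRESH_PERIOD (60 seconds).
                // enablePeriodicRefresh(d) is equivalent to enablePeriodicRefresh().refreshPeriod(d).
                .enablePeriodicRefresh(Duration.ofSeconds(20))
                .build();
        ClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(clusterTopologyRefreshOptions)
                .build();
        // Client configuration carrying the cluster client options
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientOptions(clientOptions)
                .build();
        RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
                "122.51.151.130:6381",
                "122.51.151.130:6382",
                "122.51.151.130:6383",
                "122.51.151.130:6384",
                "122.51.151.130:6385",
                "122.51.151.130:6386"));
        return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
    }
}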
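After changing the configuration, run the test case again and simulate the master failure: this time the client reconnects successfully.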