Deploying a Redis Cluster on a Cloud Server and Connecting Clients via the Public IP

1. Configuration files

Six configuration files were prepared: redis-6381.conf, redis-6382.conf, redis-6383.conf, redis-6384.conf, redis-6385.conf, and redis-6386.conf. Their content is as follows:

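# This configuration is trimmed down; compare it with the full conf file shipped with Redis for the complete set of options. Change the port number in each file accordingly.
# Run in the background (as a daemon)
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public network is not recommended, so bind the server's private IP and the loopback address.
bind 172.17.0.13 127.0.0.1
# Directory where Redis stores its data files
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: holds the state of the other nodes, persistent variables, and so on. Redis generates it automatically in the dir configured above.
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a node may be unreachable. If a master is unreachable for longer than this, a failover is performed.
cluster-node-timeout 5000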

Note: the Redis version is 6.0.4.
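
The six files differ only in the port number, which also appears in the log file name and the cluster state file name, so they can be generated from one template instead of being edited by hand. The following is a minimal sketch assuming exactly the settings shown above; the class name `RedisConfGenerator` and the output directory argument are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: renders the per-port redis-<port>.conf files described above.
public class RedisConfGenerator {

	// Builds the configuration text for a single node listening on the given port.
	public static String render(int port) {
		return String.join("\n",
				"daemonize yes",
				"port " + port,
				"bind 172.17.0.13 127.0.0.1",
				"dir /redis/workingDir",
				"logfile \"/redis/logs/cluster-node-" + port + ".log\"",
				"appendonly yes",
				"cluster-enabled yes",
				"cluster-config-file cluster-node-" + port + ".conf",
				"cluster-node-timeout 5000",
				"");
	}

	public static void main(String[] args) throws IOException {
		// The output directory is an assumption; it defaults to the current directory.
		Path outDir = Paths.get(args.length > 0 ? args[0] : ".");
		for (int port = 6381; port <= 6386; port++) {
			Path file = outDir.resolve("redis-" + port + ".conf");
			Files.write(file, render(port).getBytes(StandardCharsets.UTF_8));
			System.out.println("wrote " + file);
		}
	}
}
```

Generating the files keeps the port, log file, and cluster state file names in sync, which is the easiest thing to get wrong when copying a file six times.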

2. Start the services and create the cluster

(1) Start the six Redis instances

redis-server redis-6381.conf
redis-server redis-6382.conf
redis-server redis-6383.conf
redis-server redis-6384.conf
redis-server redis-6385.conf
redis-server redis-6386.conf

(2) Create the cluster with the client command

Create the cluster, assigning one replica to each master node:

redis-cli --cluster create \
172.17.0.13:6381 172.17.0.13:6382 172.17.0.13:6383 \
172.17.0.13:6384 172.17.0.13:6385 172.17.0.13:6386 \
--cluster-replicas 1

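3. Client connection

(1) Client configuration

import java.util.Arrays;

import io.lettuce.core.ReadFrom;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

	@Bean
	public RedisConnectionFactory redisConnectionFactory() {
		// Read/write splitting: prefer replicas for reads, fall back to the master
		LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
						.readFrom(ReadFrom.REPLICA_PREFERRED)
						.build();
		// Seed nodes, addressed by the server's public IP
		RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
						"122.51.151.130:6381",
						"122.51.151.130:6382",
						"122.51.151.130:6383",
						"122.51.151.130:6384",
						"122.51.151.130:6385",
						"122.51.151.130:6386"));
		return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
	}
}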

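(2) Test case

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.test.context.junit4.SpringRunner;

// Application is the project's Spring Boot main class
@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {

	@Autowired
	private StringRedisTemplate stringRedisTemplate;

	@Test
	public void readFromReplicaWriteToMasterTest() {
		System.out.println("Setting value...");
		stringRedisTemplate.opsForValue().set("username", "Nick");
		System.out.println("Value read: " + stringRedisTemplate.opsForValue().get("username"));
	}
}

(3) Error log analysis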

2020-08-14 14:57:49.180  WARN 22012 --- [ioEventLoop-6-4] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6384]: connection timed out: /172.17.0.13:6384
2020-08-14 14:57:49.180  WARN 22012 --- [ioEventLoop-6-3] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6383]: connection timed out: /172.17.0.13:6383
2020-08-14 14:57:49.182  WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6382]: connection timed out: /172.17.0.13:6382
2020-08-14 14:57:49.182  WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6381]: connection timed out: /172.17.0.13:6381
2020-08-14 14:57:49.190  WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6385]: connection timed out: /172.17.0.13:6385
2020-08-14 14:57:49.191  WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6386]: connection timed out: /172.17.0.13:6386
2020-08-14 14:57:59.389  WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6382
2020-08-14 14:58:09.391  WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6381
2020-08-14 14:58:19.393  WARN 22012 --- [ioEventLoop-6-1] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6383
2020-08-14 14:58:29.396  WARN 22012 --- [ioEventLoop-6-2] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6384
2020-08-14 14:58:39.399  WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6386
2020-08-14 14:58:49.402  WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6385

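The client we use is Lettuce. Notice that the public IP we configured has turned into the private IP: the client takes the node addresses from the Redis cluster itself, not from our seed list, so we need the cluster to hand back its public IP.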
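
Every timeout in the log targets `172.17.0.13`, which lies in the private range `172.16.0.0/12` and is not routable from outside the cloud network. The following is a minimal sketch, using only the JDK, of how such an address can be recognised; the class and method names are hypothetical and this check is not part of Lettuce.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: tells whether an "ip:port" address is reachable only from inside a private network.
public class AnnouncedAddressCheck {

	// Returns true when the host part is a private (site-local), loopback or link-local address.
	public static boolean isPrivate(String hostAndPort) {
		String host = hostAndPort.substring(0, hostAndPort.lastIndexOf(':'));
		try {
			// For an IP literal, getByName only parses the text; no DNS lookup is performed.
			InetAddress address = InetAddress.getByName(host);
			return address.isSiteLocalAddress()
					|| address.isLoopbackAddress()
					|| address.isLinkLocalAddress();
		} catch (UnknownHostException e) {
			throw new IllegalArgumentException("Not a valid address: " + hostAndPort, e);
		}
	}

	public static void main(String[] args) {
		// The first address is the one from the log, the second is the configured seed address.
		for (String node : new String[] {"172.17.0.13:6384", "122.51.151.130:6384"}) {
			System.out.println(node + " private=" + isPrivate(node));
		}
	}
}
```

Running a check like this against the addresses a cluster reports makes it clear at a glance whether an external client has any chance of reaching them.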

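4. Solving the problem

(1) Check the redis.conf configuration file

Redis can in fact be told to expose a public IP, and the relevant settings are already in redis.conf. The section below mainly targets special deployments such as Docker, but we can use it to manually specify each node's public IP, port, and bus port (by default the service port plus 10000).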

########################## CLUSTER DOCKER/NAT support  ########################

# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port
#
# Each instruct the node about its address, client port, and cluster message
# bus port. The information is then published in the header of the bus packets
# so that other nodes will be able to correctly map the address of the node
# publishing the information.
#
# If the above options are not used, the normal Redis Cluster auto-detection
# will be used instead.
#
# Note that when remapped, the bus port may not be at the fixed offset of
# clients port + 10000, so you can specify any port and bus-port depending
# on how they get remapped. If the bus-port is not set, a fixed offset of
# 10000 will be used as usually.
#
# Example:
#
# cluster-announce-ip 10.1.1.5
# cluster-announce-port 6379
# cluster-announce-bus-port 6380

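(2) Modify the configuration file

Once the public IP is specified manually, the nodes in the Redis cluster talk to each other through the public IP, that is, over the external network. The cluster bus ports (16381 and so on below) must therefore be opened in the cloud server's security group; otherwise the cluster stays in the fail state.

# This configuration is trimmed down; compare it with the full conf file shipped with Redis for the complete set of options. Change the port number in each file accordingly.
# Run in the background (as a daemon)
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public network is not recommended, so bind the server's private IP and the loopback address.
bind 172.17.0.13 127.0.0.1
# Directory where Redis stores its data files
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: holds the state of the other nodes, persistent variables, and so on. Redis generates it automatically in the dir configured above.
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a node may be unreachable. If a master is unreachable for longer than this, a failover is performed.
cluster-node-timeout 5000

# On a cloud server the public IP must be announced explicitly
cluster-announce-ip 122.51.151.130
# Redis cluster bus port, used for communication with the other nodes
cluster-announce-bus-port 16381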
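
The `cluster-announce-ip` value is the same in all six files, while the bus port differs per node and here follows the default offset of service port plus 10000. The following is a minimal sketch that derives the two extra lines for each node and lists the ports that have to be opened in the security group; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives the announce settings and the firewall ports for each node.
public class ClusterAnnounceSettings {

	// Default distance between a node's service port and its cluster bus port.
	static final int BUS_PORT_OFFSET = 10000;

	public static int busPort(int clientPort) {
		return clientPort + BUS_PORT_OFFSET;
	}

	// The two lines appended to redis-<port>.conf for a node reachable via publicIp.
	public static String announceLines(String publicIp, int clientPort) {
		return "cluster-announce-ip " + publicIp + "\n"
				+ "cluster-announce-bus-port " + busPort(clientPort) + "\n";
	}

	// Service ports (for clients) followed by bus ports (for node-to-node traffic).
	public static List<Integer> portsToOpen(int firstPort, int lastPort) {
		List<Integer> ports = new ArrayList<>();
		for (int port = firstPort; port <= lastPort; port++) {
			ports.add(port);
		}
		for (int port = firstPort; port <= lastPort; port++) {
			ports.add(busPort(port));
		}
		return ports;
	}

	public static void main(String[] args) {
		for (int port = 6381; port <= 6386; port++) {
			System.out.print("# redis-" + port + ".conf\n" + announceLines("122.51.151.130", port));
		}
		System.out.println("Ports to open: " + portsToOpen(6381, 6386));
	}
}
```

For this deployment the list covers both 6381 to 6386 and 16381 to 16386: external clients need the service ports, and the nodes need the bus ports once they address each other by the public IP.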

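(3) Restart the Redis services and recreate the cluster

At this point we can compare the contents of the node state file cluster-node-6381.conf before and after the change.

Before specifying the public IP: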

[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf 
34287d78c1e9c4ff49880bb976707a0c17676f82 172.17.0.13:6384@16384 slave 1a206270f835a79e43e281df5f6f8215ab49d713 0 1597390563209 4 connected
e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 172.17.0.13:6383@16383 master - 0 1597390565212 3 connected 10923-16383
0932cc203a19f37a3f5ebca8278962f5b325c67e 172.17.0.13:6385@16385 slave 2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 0 1597390564711 5 connected
2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 172.17.0.13:6382@16382 master - 0 1597390565000 2 connected 5461-10922
1a206270f835a79e43e281df5f6f8215ab49d713 172.17.0.13:6381@16381 myself,master - 0 1597390564000 1 connected 0-5460
0f63accb455594d0625cffa8d09aacc580d7e428 172.17.0.13:6386@16386 slave e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 0 1597390564210 6 connected

After specifying the public IP:

[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf 
e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 122.51.151.130:6384@16384 master - 0 1597389992286 7 connected 10923-16383
511668874d39a7b1f701cc3df6f21d00510bfeae 122.51.151.130:6383@16383 slave e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 0 1597389991283 7 connected
e77e540ef4115abe920fb191f354b81f42e7b4ed 122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460
2a3ea359311b34cd59e10da7d2f1bba48403f0ee 122.51.151.130:6385@16385 slave e77e540ef4115abe920fb191f354b81f42e7b4ed 0 1597389990583 5 connected
2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 122.51.151.130:6382@16382 master - 0 1597389992789 2 connected 5461-10922
2b7671e002143b329c9c6c969bfb825a86fb41b2 122.51.151.130:6386@16386 slave 2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 0 1597389991784 6 connected
vars currentEpoch 7 lastVoteEpoch 7

As we can see, every node now exposes the public IP. Running the test case again, everything works.
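
Each line of the state file follows the `CLUSTER NODES` layout: node ID, `ip:port@bus-port`, flags, master ID (or `-`), ping and pong timestamps, config epoch, link state, and, for masters, the slot ranges. The following is a minimal sketch that extracts the announced address and role from such lines. It assumes IPv4 addresses in the format shown above, and the class name `ClusterNodesParser` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the node lines of a cluster state file (IPv4 only).
public class ClusterNodesParser {

	public static final class Node {
		public final String id;
		public final String host;
		public final int port;
		public final int busPort;
		public final boolean master;
		public final String slots;

		Node(String id, String host, int port, int busPort, boolean master, String slots) {
			this.id = id;
			this.host = host;
			this.port = port;
			this.busPort = busPort;
			this.master = master;
			this.slots = slots;
		}
	}

	// Parses one line such as "<id> 122.51.151.130:6381@16381 myself,master - 0 ... connected 0-5460".
	public static Node parseLine(String line) {
		String[] fields = line.trim().split("\\s+");
		String[] address = fields[1].split("[:@]"); // host, service port, bus port
		boolean master = Arrays.asList(fields[2].split(",")).contains("master");
		// Slot ranges, if any, start at the ninth field.
		String slots = String.join(" ", Arrays.copyOfRange(fields, Math.min(8, fields.length), fields.length));
		return new Node(fields[0], address[0], Integer.parseInt(address[1]),
				Integer.parseInt(address[2]), master, slots);
	}

	// Parses a whole file, skipping blank lines and the trailing "vars ..." line.
	public static List<Node> parse(String text) {
		List<Node> nodes = new ArrayList<>();
		for (String line : text.split("\\R")) {
			if (line.trim().isEmpty() || line.startsWith("vars ")) {
				continue;
			}
			nodes.add(parseLine(line));
		}
		return nodes;
	}

	public static void main(String[] args) {
		String sample = "e77e540ef4115abe920fb191f354b81f42e7b4ed 122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460\n"
				+ "vars currentEpoch 7 lastVoteEpoch 7\n";
		for (Node node : parse(sample)) {
			System.out.println(node.host + ":" + node.port + " master=" + node.master + " slots=" + node.slots);
		}
	}
}
```

Comparing the `host` field across the two listings is a quick way to confirm that the announce settings took effect on every node.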

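5. Lettuce client connection problems during failover

(1) Test case

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.test.context.junit4.SpringRunner;

// Application is the project's Spring Boot main class
@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {

	@Autowired
	private StringRedisTemplate stringRedisTemplate;

	@Test
	public void automaticFailoverTest() throws InterruptedException {
		int count = 0;
		// Runs until stopped manually: write and read back a counter every two seconds
		while (true) {
			try {
				stringRedisTemplate.opsForValue().set("count", String.valueOf(++count));
				System.out.println("Set count to: " + count);
				System.out.println("Read count: " + stringRedisTemplate.opsForValue().get("count"));
				Thread.sleep(2000);
			} catch (Exception e) {
				// Any Redis error is treated as a possible failover; wait and retry
				System.out.println("A master failover may be in progress, retrying...");
				Thread.sleep(3000);
			}
		}
	}
}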
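
The test writes a single key, `count`, so only a failover of the master that owns this key's hash slot affects it. Redis Cluster maps a key to a slot as CRC16(key) mod 16384, hashing only the text between the first `{` and the following `}` when the key contains a non-empty hash tag. The following is a minimal sketch of that calculation; the class name `KeySlot` is hypothetical, and the slot ranges in `main` are the ones from the node listings above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes the Redis Cluster hash slot of a key.
public class KeySlot {

	// CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0.
	public static int crc16(byte[] data) {
		int crc = 0;
		for (byte b : data) {
			crc ^= (b & 0xFF) << 8;
			for (int i = 0; i < 8; i++) {
				crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
				crc &= 0xFFFF;
			}
		}
		return crc;
	}

	public static int slot(String key) {
		// Hash tag: if "{...}" encloses at least one character, only that part is hashed.
		int open = key.indexOf('{');
		if (open >= 0) {
			int close = key.indexOf('}', open + 1);
			if (close > open + 1) {
				key = key.substring(open + 1, close);
			}
		}
		return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
	}

	public static void main(String[] args) {
		int slot = slot("count");
		// Slot ranges as assigned when the cluster was created with three masters.
		String range = slot <= 5460 ? "0-5460" : slot <= 10922 ? "5461-10922" : "10923-16383";
		System.out.println("key 'count' -> slot " + slot + ", served by the master owning " + range);
	}
}
```

Stopping the master that owns the printed range is the case that interrupts the test loop; stopping either of the other two masters should leave reads and writes of `count` unaffected.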

(2) Stop one of the master nodes to simulate a crash

The log is as follows:

2020-08-20 19:33:25.118  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was /122.51.151.130:6384
2020-08-20 19:33:26.213  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:31.015  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:32.107  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:36.616  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:37.709  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:42.016  INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:43.110  WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:47.216  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:48.317  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:56.515  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:57.605  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:34:14.016  INFO 13696 --- [xecutorLoop-1-3] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:15.113  WARN 13696 --- [ioEventLoop-6-3] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
A master failover may be in progress, retrying...
2020-08-20 19:34:45.116  INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:46.212  WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:35:16.216  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:35:17.310  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
A master failover may be in progress, retrying...

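Even after a long wait, the client was still stuck trying to reconnect to the stopped node, so something seemed wrong with the Lettuce client.

(3) Solutions

1) Switch to a different Redis client

After switching the client to Jedis and simulating the master crash again, the client connection recovered after a short while.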

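import java.util.Arrays;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;

@Configuration
public class RedisClusterConfig {

	@Bean
	public RedisConnectionFactory redisConnectionFactory() {
		// Seed nodes, addressed by the server's public IP
		RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
						"122.51.151.130:6381",
						"122.51.151.130:6382",
						"122.51.151.130:6383",
						"122.51.151.130:6384",
						"122.51.151.130:6385",
						"122.51.151.130:6386"));
		return new JedisConnectionFactory(redisClusterConfiguration);
	}
}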

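2) Configure Redis cluster topology refresh in the Lettuce client

Does Lettuce really not support reconnecting after a master/replica switch? That seemed unlikely. The Lettuce wiki on GitHub documents its Redis Cluster support at the following addresses: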
https://github.com/lettuce-io/lettuce-core/wiki/Redis-Cluster
https://github.com/lettuce-io/lettuce-core/wiki/Client-options#cluster-specific-options

Next, modify the client configuration as the documentation suggests:

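import java.time.Duration;
import java.util.Arrays;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

	@Bean
	public RedisConnectionFactory redisConnectionFactory() {
		// Enable adaptive and periodic cluster topology refresh. Without them, when the master of a slot range
		// goes down the service stays unavailable until the failed node comes back.
		ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
						// Adaptive refresh is off by default; without it, cluster changes lead to connection errors
						.enableAllAdaptiveRefreshTriggers()
						// Timeout between adaptive refreshes (30 seconds is also the default)
						.adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30))
						// Periodic refresh is off by default; enabled without an argument, the period is
						// 60 seconds (ClusterTopologyRefreshOptions.DEFAULT_REFRESH_PERIOD).
						// enablePeriodicRefresh(d) is equivalent to enablePeriodicRefresh().refreshPeriod(d)
						.enablePeriodicRefresh(Duration.ofSeconds(20))
						.build();
		ClientOptions clientOptions = ClusterClientOptions.builder()
						.topologyRefreshOptions(clusterTopologyRefreshOptions)
						.build();
		// Lettuce client configuration carrying the topology refresh options
		LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
						.clientOptions(clientOptions)
						.build();
		// Seed nodes, addressed by the server's public IP
		RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
						"122.51.151.130:6381",
						"122.51.151.130:6382",
						"122.51.151.130:6383",
						"122.51.151.130:6384",
						"122.51.151.130:6385",
						"122.51.151.130:6386"));
		return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
	}
}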

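After changing the configuration, run the test case again and simulate a master crash: this time the client reconnects once the failover has completed.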
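
Why the refresh options matter can be illustrated with a toy model: a cluster client caches which node serves each slot range, and after a failover that cached view keeps pointing at the dead master until it is refreshed. The sketch below is a simplified illustration under that assumption, not Lettuce's actual implementation; all class and method names are hypothetical, and the promotion of 6383 after 6384 stops is assumed from the replica relationship in the node listing above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cached cluster topology; not Lettuce's real implementation.
public class TopologyRefreshModel {

	// "Server side": the node that currently serves each slot range.
	public static final class Cluster {
		private final Map<String, String> owners = new HashMap<>();

		public Cluster() {
			owners.put("0-5460", "122.51.151.130:6381");
			owners.put("5461-10922", "122.51.151.130:6382");
			owners.put("10923-16383", "122.51.151.130:6384");
		}

		// A replica takes over the slot range of a failed master.
		public void failover(String slotRange, String promotedReplica) {
			owners.put(slotRange, promotedReplica);
		}

		public Map<String, String> view() {
			return new HashMap<>(owners);
		}

		boolean serves(String slotRange, String node) {
			return node != null && node.equals(owners.get(slotRange));
		}
	}

	// "Client side": routes commands using a cached copy of the topology.
	public static final class Client {
		private final Cluster cluster;
		private Map<String, String> cachedView;

		public Client(Cluster cluster) {
			this.cluster = cluster;
			this.cachedView = cluster.view();
		}

		// Corresponds to a periodic or adaptive topology refresh.
		public void refresh() {
			cachedView = cluster.view();
		}

		// A write succeeds only if the cached owner is still the real owner.
		public boolean write(String slotRange) {
			return cluster.serves(slotRange, cachedView.get(slotRange));
		}
	}

	public static void main(String[] args) {
		Cluster cluster = new Cluster();
		Client client = new Client(cluster);
		cluster.failover("10923-16383", "122.51.151.130:6383");
		System.out.println("write with stale view: " + client.write("10923-16383"));
		client.refresh();
		System.out.println("write after refresh: " + client.write("10923-16383"));
	}
}
```

In this model the client without a refresh never recovers on its own, which matches the endless reconnect attempts in the log; the periodic refresh bounds how long the stale view can last.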
