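I. Use Case
Our company runs a scheduled-task service. To keep the scheduled tasks running reliably, the service is deployed for high availability (on multiple servers), so that if any one task server goes down, the scheduled data-synchronization jobs are unaffected. The drawback of this approach is that if every server runs the synchronization once, the data is duplicated for no good reason; in short, the operation is not atomic across the cluster. Because the contention is between multiple servers and multiple application instances, rather than between threads in one process, middleware is needed to coordinate them.
Middleware that can solve this problem includes a database, Redis, and ZooKeeper; this article uses Redis.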
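II. Code Implementation
Because this is a scheduled-task service, it contains many scheduled tasks, so we wrote an aspect around the @Scheduled annotation. Every task that uses the annotation passes through this aspect, which serves as the single entry point for the lock check.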
@Aspect
@Component
@RequiredArgsConstructor
@Slf4j
public class TaskLogAspect {
private final LogTaskServiceRunInfoWriteMapper logMapper;
private final Environment environment;
private final EshipCacheService eshipCacheService;
@Pointcut("@annotation(org.springframework.scheduling.annotation.Scheduled)")
public void logPointCut() {
}
@Around("logPointCut()")
public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
// aspect logic goes here
return result;
}
}
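Next, every task entering this method is intercepted. The task name is used as the key and the server ip as the value, and the pair is stored in Redis with an expiry time. If the write succeeds, the lock has been acquired. Otherwise we check whether the ip stored under the key is this machine's ip: if it is, the expiry is renewed (so that a long-running task that overruns one schedule period does not cause the next run to be skipped); if it is not, lock acquisition fails.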
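// Return convention: false means this machine holds the lock, true means acquisition failed
public boolean requireLock(String key, String ip, long expire) {
// Set-if-absent with a TTL succeeded: the lock is ours
if (cacheService.setIfAbsent(this.scopeName, key, ip, Duration.ofSeconds(expire))) {
return false;
} else {
try {
String value = getValue(key);
// Re-entrant case: this machine already holds the key, so only the TTL is extended
if (Objects.nonNull(value) && StringUtils.equals(objectMapper.readValue(value, String.class), ip)) {
cacheService.expire(this.scopeName, key, Duration.ofSeconds(expire));
return false;
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
return true;
}
// The key is held by another machine
return true;
}
}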
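The acquisition rule described above can be tried out without Redis. The following is a minimal sketch, assuming an in-memory map stands in for Redis and a caller-supplied clock stands in for key expiry; `InMemoryTaskLock` and `tryAcquire` are hypothetical names, not part of the original service. Unlike `requireLock`, this sketch returns true when the caller holds the lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** In-memory stand-in for the Redis lock; illustration only. */
final class InMemoryTaskLock {
    /** Lock entry: owner plus absolute expiry time in milliseconds. */
    private static final class Entry {
        final String owner;
        final long expiresAt;
        Entry(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    InMemoryTaskLock(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns true when the caller holds the lock after the call. */
    synchronized boolean tryAcquire(String taskName, String ip, long ttlMillis) {
        long now = clock.getAsLong();
        Entry current = store.get(taskName);
        boolean free = current == null || current.expiresAt <= now;
        if (free || current.owner.equals(ip)) {
            // Either a fresh acquisition or a re-entrant renewal by the same owner.
            store.put(taskName, new Entry(ip, now + ttlMillis));
            return true;
        }
        return false; // held by another machine
    }

    public static void main(String[] args) {
        long[] now = {0L};
        InMemoryTaskLock lock = new InMemoryTaskLock(() -> now[0]);
        System.out.println(lock.tryAcquire("app:SyncJob#run", "10.0.0.1", 60_000)); // true
        System.out.println(lock.tryAcquire("app:SyncJob#run", "10.0.0.2", 60_000)); // false
        now[0] = 60_000; // the lock has expired
        System.out.println(lock.tryAcquire("app:SyncJob#run", "10.0.0.2", 60_000)); // true
    }
}
```

The `synchronized` keyword plays the role that Redis's single-threaded command execution plays for set-if-absent: the check and the write cannot interleave with another caller.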
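If lock acquisition fails, the aspect returns immediately; if it succeeds, the task is executed. Several points need attention here:
1. A daemon thread has to be created to keep renewing the lock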
/**
* @author: jianpli
* @date: 2022/08/31/10:07 AM
* @description: daemon thread that renews the distributed lock
*/
@Slf4j
public class SurvivalClamProcessor implements Runnable {
private final EshipCacheService eshipCacheService;
SurvivalClamProcessor(String key, String value, int lockTime, EshipCacheService eshipCacheService) {
this.key = key;
this.value = value;
this.lockTime = lockTime;
this.signal = Boolean.TRUE;
this.eshipCacheService = eshipCacheService;
}
private String key;
private String value;
private int lockTime;
/**
* Flag that tells the thread to stop
*/
private volatile Boolean signal;
void stop() {
this.signal = Boolean.FALSE;
}
@Override
public void run() {
/**
* Renewal factor: about 0.7 of the lock time, approximated here as 2/3
*/
int waitTime = lockTime * 1000 * 2 / 3;
while (signal) {
try {
Thread.sleep(waitTime);
if (signal) {
if (eshipCacheService.expire(key, value, Duration.ofSeconds(lockTime))) {
log.info("expandLockTime success,wait time {}ms,Reset lock timeout {}s,key {}", waitTime, lockTime, key);
} else {
log.error("expandLockTime fail,cause SurvivalClamProcessor interrupted,Please check redis!");
this.stop();
}
}
} catch (InterruptedException e) {
// interrupt() is called manually on shutdown, so this is logged at INFO level
log.info("SurvivalClamProcessor:Processing thread was forcibly interrupted!");
break;
} catch (Exception e) {
log.error("SurvivalClamProcessor run error:{}", ExceptionUtils.getStackTrace(e));
}
}
log.info("SurvivalClamProcessor:Processing thread stopped!");
}
}
/**
* Redis lock renewal
*/
public <T> boolean expire(String key, T value, Duration ofSeconds) {
try {
String valueStr = getValue(key);
if (Objects.nonNull(valueStr) && StringUtils.equals(objectMapper.readValue(valueStr, String.class), String.valueOf(value))) {
return cacheService.expire(this.scopeName, key, ofSeconds);
}
} catch (JsonProcessingException e) {
logger.error(e.getMessage(), e);
}
return false;
}
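The same renewal idea can also be written with a `ScheduledExecutorService` instead of a hand-rolled sleep loop. This is a sketch of that alternative, not the approach used above: `LockWatchdog` is a hypothetical name, and the `renew` callback is assumed to wrap a call such as the `expire(key, value, duration)` method shown earlier. It keeps the 2/3 renewal interval.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical watchdog: renews a lock at 2/3 of its TTL until closed or a renewal fails. */
final class LockWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-watchdog");
                t.setDaemon(true); // the watchdog must not keep the JVM alive
                return t;
            });
    private final ScheduledFuture<?> future;

    /** @param renew extends the TTL and returns false when the lock is no longer ours */
    LockWatchdog(Duration lockTime, BooleanSupplier renew) {
        long periodMillis = lockTime.toMillis() * 2 / 3;
        future = scheduler.scheduleAtFixedRate(() -> {
            if (!renew.getAsBoolean()) {
                // An exception thrown by a periodic task suppresses all further executions.
                throw new IllegalStateException("lock lost, stopping renewal");
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels the periodic renewal and stops the scheduler thread. */
    @Override
    public void close() {
        future.cancel(true);
        scheduler.shutdownNow();
    }
}
```

A caller would create the watchdog right after acquiring the lock and close it in the same finally block that releases the lock.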
2. Because renewing the expiry for an ip is not an atomic operation, we borrow the double-check idea and verify ownership a second time
if (eshipCacheService.requireLock(taskName, ip, REDIS_EXPIRE)) {
isRun = false;
log.info("Distributed lock is held by machine {}, task {} is already running, taskId:{}", eshipCacheService.getValue(taskName), taskName, ContextHandler.getTaskId());
} else {
if (ip.equals(eshipCacheService.getValue(taskName,String.class))) {
survivalClamProcessor = new SurvivalClamProcessor(taskName, ip, REDIS_EXPIRE, eshipCacheService);
survivalThread = new Thread(survivalClamProcessor);
survivalThread.setDaemon(Boolean.TRUE);
executorService.execute(survivalThread);
// the following messages are for logging
log.info("{} start,taskId:{} ", taskName, ContextHandler.getTaskId());
// run the actual task method
result = joinPoint.proceed();
log.info("{} end,taskId:{} ", taskName, ContextHandler.getTaskId());
}else {
isRun = false;
}
}
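3. If the lock is acquired, run the task and record an execution log (for later verification and task tracing); when execution ends, delete the lock (this must be done in a finally block)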
public void isOwnDelete(String key, String ip) {
try {
String value = getValue(key);
if (Objects.nonNull(value) && StringUtils.equals(objectMapper.readValue(value, String.class), ip)) {
delete(key);
}
} catch (JsonProcessingException e) {
logger.error(e.getMessage(), e);
}
}
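Both `isOwnDelete` and the renewal method read the value and then act on the key in two separate steps, so the owner could change in between. A common way to close that gap in Redis is a small Lua script, which the server runs as a single unit. The sketch below shows such scripts as plain strings plus an in-memory analogue of the compare-and-delete step; `OwnerCheckedRelease` and `releaseIfOwner` are hypothetical names, and how the scripts are submitted depends on the Redis client in use. Since the original code decodes the stored value with `objectMapper`, the value appears to be stored as JSON, so the compared argument would have to be passed in that stored form.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class OwnerCheckedRelease {
    /** Lua script: delete the key only if it still holds the caller's value. */
    static final String RELEASE_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('del', KEYS[1]) else return 0 end";

    /** Lua script: extend the TTL (ARGV[2], in seconds) only if the caller still owns the key. */
    static final String RENEW_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('expire', KEYS[1], ARGV[2]) else return 0 end";

    /** In-memory analogue: ConcurrentMap.remove(key, value) compares and removes atomically. */
    static boolean releaseIfOwner(ConcurrentMap<String, String> locks, String key, String ip) {
        return locks.remove(key, ip);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();
        locks.put("app:SyncJob#run", "10.0.0.1");
        System.out.println(releaseIfOwner(locks, "app:SyncJob#run", "10.0.0.2")); // false
        System.out.println(releaseIfOwner(locks, "app:SyncJob#run", "10.0.0.1")); // true
    }
}
```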
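4. Stop the renewal thread. Note that because the Thread object is handed to executorService.execute(...) as a plain Runnable, it is never started itself, so survivalThread.interrupt() does not wake the pool thread that is actually sleeping; it is stop() that ends the loop, once the current sleep finishes.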
if (Objects.nonNull(survivalClamProcessor)) {
survivalClamProcessor.stop();
}
if (Objects.nonNull(survivalThread)) {
survivalThread.interrupt();
}
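One way to make sure that stopping the renewal and releasing the lock always happen together is to bundle them in an `AutoCloseable` and use try-with-resources. This is only a sketch of that design option; `LockHandle` is a hypothetical name, and the two callbacks are assumed to wrap calls such as `survivalClamProcessor.stop()` and `isOwnDelete(key, ip)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle that bundles "stop renewal" and "release lock". */
final class LockHandle implements AutoCloseable {
    private final Runnable stopRenewal;
    private final Runnable releaseLock;

    LockHandle(Runnable stopRenewal, Runnable releaseLock) {
        this.stopRenewal = stopRenewal;
        this.releaseLock = releaseLock;
    }

    @Override
    public void close() {
        try {
            stopRenewal.run(); // stop first, so no renewal runs against an already deleted key
        } finally {
            releaseLock.run(); // always attempt the release, even if stopping failed
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        try (LockHandle handle = new LockHandle(() -> events.add("stop"), () -> events.add("release"))) {
            events.add("task");
            throw new IllegalStateException("task failed");
        } catch (IllegalStateException e) {
            events.add("caught");
        }
        System.out.println(events); // [task, stop, release, caught]
    }
}
```

The resource is closed before the catch block runs, so the lock is released even when the task throws.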
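At this point the Redis distributed lock is complete. Because our company uses a read/write-splitting design, the first connection is slow, so the connection is warmed up at application startup, and the locks this service holds in Redis are cleared when the service stops.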
/**
* Release locks promptly when the service stops
*/
@PreDestroy
private void preDestroy() {
log.info("##############################<release lock>######################### ");
String applicationName = environment.getProperty("spring.application.name", "ESHIP-MAIN-TASK");
List<String> list = logMapper.selectTaskNameByApplicationName(applicationName);
String ip = IpUtils.getLocalIp();
for (String key : list) {
eshipCacheService.isOwnDelete(key, ip);
}
executorService.shutdown();
}
@PostConstruct
private void postConstruct() {
log.info("##############################<application init>######################### ");
String applicationName = environment.getProperty("spring.application.name", "ESHIP-MAIN-TASK");
eshipCacheService.requireLock(applicationName, ERROR_IP, REDIS_EXPIRE);
eshipCacheService.delete(applicationName);
}
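One remaining problem turned up at the end: preDestroy runs only when the application shuts down normally. If the process is force-killed with kill -9, the method never runs, so we also provide an endpoint that system administrators can use to clear the locks manually.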
@PostMapping("/api/task/release/lock/{ip}")
public ApiResponse<String> releaseLock(@PathVariable String ip) {
boolean exist = true;
try {
if (ip.equals(IpUtils.getLocalIp())) {
throw new UserFriendlyException("Cannot delete this machine's own key!");
}
// Probe the target's server port; try-with-resources closes the socket after the check
try (Socket socket = new Socket()) {
socket.connect(new InetSocketAddress(ip, Integer.parseInt(environment.getProperty("server.port", "46105"))));
}
} catch (IOException e) {
exist = false;
}
if (exist) {
throw new UserFriendlyException("Application is alive; cannot delete key!");
}
List<String> list = logMapper.selectTaskNameByApplicationName(environment.getProperty("spring.application.name", "ESHIP-MAIN-TASK"));
for (String key : list) {
eshipCacheService.isOwnDelete(key, ip);
}
return new ApiResponse<>("Lock released successfully");
}
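The liveness check in this endpoint boils down to "can a TCP connection be opened to the target's server port". A minimal sketch of that probe with an explicit connect timeout follows; `PortProbe`, `isReachable`, and the timeout value are assumptions for illustration and not part of the original code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // 46105 is the default server.port used in the endpoint above
        System.out.println(isReachable("127.0.0.1", 46105, 1000));
    }
}
```

A successful connection only shows that something is listening on that port, and a failed one can also be caused by a network problem between the two machines, so the result is best treated as a hint for the administrator rather than proof that the lock owner is dead.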
The complete code is as follows:
@Aspect
@Component
@RequiredArgsConstructor
@Slf4j
public class TaskLogAspect {
private final LogTaskServiceRunInfoWriteMapper logMapper;
private final Environment environment;
private final EshipCacheService eshipCacheService;
/**
* Length of the log_task_service_run_info.error_msg column.
* The message is truncated to this length when logging, to avoid: Cause: com.microsoft.sqlserver.jdbc.SQLServerException: String or binary data would be truncated.
*/
private final static int ERROR_MSG_LENGTH = 100;
private final static String ERROR_IP = "UnknownHost";
private final static int REDIS_EXPIRE = 60;
private ThreadFactory namedThreadFactory = new ThreadFactoryBuilder().setNameFormat("redLock-thread-%d").build();
private ExecutorService executorService = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors(), 30,
1L, TimeUnit.SECONDS,
new LinkedBlockingQueue<>(),
namedThreadFactory,
new ThreadPoolExecutor.CallerRunsPolicy());
@Pointcut("@annotation(org.springframework.scheduling.annotation.Scheduled)")
public void logPointCut() {
}
@Around("logPointCut()")
public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
boolean isRun = true;
// return value of the method being executed
Object result = null;
String errorMsg = null;
Date startTime = new Date();
String taskName = getTaskName(joinPoint);
String ip = IpUtils.getLocalIp();
if (StringUtils.equals(ERROR_IP, ip)) {
return null;
}
SurvivalClamProcessor survivalClamProcessor = null;
Thread survivalThread = null;
ContextHandler.setTaskId(UUID.randomUUID().toString());
try {
/**
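* Check whether the lock is held: if it is, skip execution; if not, take the lock and run the task <1. released when the project stops (passive release) 2. released on expiry (passive release) 3. released when the task ends (active release)>
* a. The lock is re-entrant: the same task on the same machine can run again while the lock is held, extending the hold time
* b. A daemon thread renews the lock while it is held
* c. Task duration is currently unknown; the default is 1 minute
* d. A database lock could also be considered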
*/
if (eshipCacheService.requireLock(taskName, ip, REDIS_EXPIRE)) {
isRun = false;
log.info("Distributed lock is held by machine {}, task {} is already running, taskId:{}", eshipCacheService.getValue(taskName), taskName, ContextHandler.getTaskId());
} else {
Thread.sleep(1500); // brief pause to ride out clock skew between servers and Redis network jitter, since the check is not atomic
if (ip.equals(eshipCacheService.getValue(taskName,String.class))) {
survivalClamProcessor = new SurvivalClamProcessor(taskName, ip, REDIS_EXPIRE, eshipCacheService);
survivalThread = new Thread(survivalClamProcessor);
survivalThread.setDaemon(Boolean.TRUE);
executorService.execute(survivalThread);
// the following messages are for logging
log.info("{} start,taskId:{} ", taskName, ContextHandler.getTaskId());
// run the actual task method
result = joinPoint.proceed();
log.info("{} end,taskId:{} ", taskName, ContextHandler.getTaskId());
}else {
isRun = false;
}
}
} catch (Exception e) {
if (e instanceof InterruptedException) {
log.error("<ignore> Thread.sleep() interrupted,error:{}", ExceptionUtils.getStackTrace(e));
return null;
}
errorMsg = e.getMessage();
throw e;
} finally {
eshipCacheService.isOwnDelete(taskName, ip);
if (isRun) {
saveLogToDb(taskName, startTime, new Date(), errorMsg);
if (Objects.nonNull(survivalClamProcessor)) {
survivalClamProcessor.stop();
}
if (Objects.nonNull(survivalThread)) {
survivalThread.interrupt();
}
}
ContextHandler.remove();
}
return result;
}
/**
* applicationName:className#methodName
*/
private String getTaskName(ProceedingJoinPoint joinPoint) {
String applicationName = environment.getProperty("spring.application.name", "");
// class name of the invoked target
String className = joinPoint.getTarget().getClass().getSimpleName();
// method signature, used to obtain the method name
MethodSignature signature = (MethodSignature) joinPoint.getSignature();
String methodName = signature.getName();
return applicationName + ":" + className + "#" + methodName;
}
/**
* Persist the task execution log
*
* @param taskName
* @param startTime
* @param endTime
* @param errorMsg
*/
private void saveLogToDb(String taskName, Date startTime, Date endTime, String errorMsg) {
try {
LogTaskServiceRunInfo log = new LogTaskServiceRunInfo();
log.setTaskName(taskName);
log.setTaskId(ContextHandler.getTaskId());
log.setStartTime(startTime);
log.setEndTime(endTime);
log.setDuration(endTime.getTime() - startTime.getTime());
log.setRunStatus(StringUtils.isEmpty(errorMsg) ? 1 : 0);
log.setErrorMsg(StringUtils.left(errorMsg, ERROR_MSG_LENGTH));
log.setFromIp(IpUtils.getLocalIp());
logMapper.insert(log);
} catch (Exception e) {
log.error("Exception while recording the scheduled task execution log ", e);
}
}
/**
* Release locks promptly when the service stops
*/
@PreDestroy
private void preDestroy() {
log.info("##############################<release lock>######################### ");
String applicationName = environment.getProperty("spring.application.name", "ESHIP-MAIN-TASK");
List<String> list = logMapper.selectTaskNameByApplicationName(applicationName);
String ip = IpUtils.getLocalIp();
for (String key : list) {
eshipCacheService.isOwnDelete(key, ip);
}
executorService.shutdown();
}
@PostConstruct
private void postConstruct() {
log.info("##############################<application init>######################### ");
String applicationName = environment.getProperty("spring.application.name", "ESHIP-MAIN-TASK");
eshipCacheService.requireLock(applicationName, ERROR_IP, REDIS_EXPIRE);
eshipCacheService.delete(applicationName);
}
}