What is Redis Sentinel
Reference: https://redis.io/topics/sentinel
In short, Redis Sentinel provides high availability for Redis, mainly in the following ways:
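1. Monitoring: Sentinel continuously checks whether the master and replica (slave) servers are working as expected.
2. Notification: when something goes wrong, Sentinel can notify the system administrator or other programs through an API.
3. Automatic failover: if the master fails, Sentinel can start a failover that promotes one of the replicas to master; the remaining replicas are reconfigured to replicate from the new master.
4. Configuration provider: Sentinel acts as the source of authority for client service discovery. Clients connect to Sentinel and ask for the address of the current Redis master responsible for a given service; if a failover occurs, Sentinel reports the new address.
Redis Sentinel simulated environment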
The simulated environment is 1 master and 2 replicas:
========redis=================sentinel==========
master: 127.0.0.1 6379        127.0.0.1 26379
slave1: 127.0.0.1 6380        127.0.0.1 26380
slave2: 127.0.0.1 6381        127.0.0.1 26381
Environment setup
redis.conf configuration
6379
# cat redis-6379.conf | grep -Ev "^$|^#"
bind 127.0.0.1
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile "/root/redis/redis-6379.log"
dbfilename dump-6379.rdb
dir /root/redis
...
#
6380
# cat redis-6380.conf | grep -Ev "^$|^#"
bind 127.0.0.1
port 6380
daemonize yes
pidfile /var/run/redis_6380.pid
logfile "/root/redis/redis-6380.log"
dbfilename dump-6380.rdb
dir /root/redis
...
#
6381
# cat redis-6381.conf | grep -Ev "^$|^#"
bind 127.0.0.1
port 6381
daemonize yes
pidfile /var/run/redis_6381.pid
logfile "/root/redis/redis-6381.log"
dbfilename dump-6381.rdb
dir /root/redis
...
#
sentinel.conf configuration
6379/6380/6381
# cat sentinel-*.conf | grep -Ev "^#|^$"
port 26379
daemonize yes
logfile "/root/redis/sentinel-6379.log"
dir "/tmp"
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
#
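In the configuration above, the trailing 2 in `sentinel monitor` is the quorum: the number of Sentinels that must agree the master is unreachable before it is considered objectively down. `down-after-milliseconds 30000` means a Sentinel waits 30 seconds without a valid reply before marking an instance as subjectively down. The listing only shows the instance on port 26379. Assuming the other two files differ only in the port and the log file name (an assumption, the source does not show them), the three files could be generated with a small helper like the one below. The function name `sentinel_conf` is hypothetical.

```shell
# Print a sentinel config for the Sentinel paired with the given Redis port.
# Assumption: Sentinel port = "2" + Redis port (26379, 26380, 26381), and all
# three Sentinels monitor the same master, 127.0.0.1:6379, with quorum 2.
sentinel_conf() {
cat <<EOF
port 2$1
daemonize yes
logfile "/root/redis/sentinel-$1.log"
dir "/tmp"
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
EOF
}

# Example use (writes into the current directory; adjust the path as needed):
#   for p in 6379 6380 6381; do sentinel_conf "$p" > "sentinel-$p.conf"; done
```

With three Sentinels and a quorum of 2, the setup can still detect a master failure when one Sentinel is down.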
Start the Redis servers and the Sentinels
redis:
# redis-server /etc/redis_6379.conf
# redis-server /etc/redis_6380.conf
# redis-server /etc/redis_6381.conf
sentinel:
# redis-sentinel /etc/sentinel-6379.conf
# redis-sentinel /etc/sentinel-6380.conf
# redis-sentinel /etc/sentinel-6381.conf
Configure master-replica replication
# redis-cli -p 6380
127.0.0.1:6380> SLAVEOF 127.0.0.1 6379
OK
127.0.0.1:6380> exit
# redis-cli -p 6381
127.0.0.1:6381> SLAVEOF 127.0.0.1 6379
OK
127.0.0.1:6381> exit
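Before simulating a failure it helps to confirm that replication is actually up. A minimal sketch, assuming the output of `INFO replication` is piped in: the helper below (the name `replication_summary` is made up for illustration) condenses the `role`, `master_host`, `master_port`, `master_link_status` and `connected_slaves` fields into one line.

```shell
# Summarize `INFO replication` output read from stdin.
# INFO lines end in CRLF, so carriage returns are stripped first.
replication_summary() {
  tr -d '\r' | awk -F: '
    $1 == "role"               { role = $2 }
    $1 == "master_host"        { host = $2 }
    $1 == "master_port"        { port = $2 }
    $1 == "master_link_status" { link = $2 }
    $1 == "connected_slaves"   { n = $2 }
    END {
      if (role == "master")     print "master with " n " replica(s)"
      else if (role == "slave") print "replica of " host ":" port " (link " link ")"
      else                      print "unknown role"
    }'
}

# Example use against the instances in this setup:
#   redis-cli -p 6379 INFO replication | replication_summary
#   redis-cli -p 6380 INFO replication | replication_summary
```

Note that `SLAVEOF` issued through redis-cli only changes the running instance; whether it survives a restart depends on the configuration file, which this sketch does not touch.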
Simulate a failover
First, kill the Redis master process:
# for n in `ps aux | grep redis-server | grep 6379 | awk '{print $2}'`;do kill -9 $n ;done;
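The pipeline above matches 6379 as a substring, so it relies on no other redis-server line containing those digits. A stricter variant is sketched below, assuming `ps aux` shows the process as `redis-server <address>:<port>` (how the process title appears is an assumption about this environment, and the helper name `redis_server_pids` is hypothetical).

```shell
# Read `ps aux` output on stdin and print the PID of every redis-server
# process whose listen address ends in exactly ":<port>".
# In `ps aux` output the command starts at field 11 and the PID is field 2.
redis_server_pids() {
  awk -v port="$1" '
    {
      for (i = 11; i < NF; i++) {
        if ($i ~ /(^|\/)redis-server$/ && $(i + 1) ~ (":" port "$")) {
          print $2
          next
        }
      }
    }'
}

# Example use (kills the master, as in the original one-liner):
#   for pid in $(ps aux | redis_server_pids 6379); do kill -9 "$pid"; done
```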
Analyze the logs
The Redis replicas are the first to notice that the master can no longer be reached. They report the following errors:
# tail -F redis-63*.log
==> redis-6380.log <==
2851:S 13 Nov 14:48:54.235 # Connection with master lost.
2851:S 13 Nov 14:48:54.235 * Caching the disconnected master state.
==> redis-6381.log <==
3695:S 13 Nov 14:48:54.466 * Connecting to MASTER 127.0.0.1:6379
3695:S 13 Nov 14:48:54.466 * MASTER <-> SLAVE sync started
3695:S 13 Nov 14:48:54.467 # Error condition on socket for SYNC: Connection refused
==> redis-6380.log <==
2851:S 13 Nov 14:48:54.781 * Connecting to MASTER 127.0.0.1:6379
2851:S 13 Nov 14:48:54.782 * MASTER <-> SLAVE sync started
2851:S 13 Nov 14:48:54.782 # Error condition on socket for SYNC: Connection refused
...
Next, Redis Sentinel carries out the failover. The first +sdown entries appear at 14:49:24, about 30 seconds after the replicas lost the master at 14:48:54, which matches down-after-milliseconds 30000. The logs show that once the 6379 master was down, the Sentinels elected a leader (the instance on port 26381), which promoted replica 6380 to master:
# tail -F sentinel-63*.log
==> sentinel-6379.log <==
3225:X 13 Nov 14:49:24.322 # +sdown master mymaster 127.0.0.1 6379
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.327 # +sdown master mymaster 127.0.0.1 6379
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:24.332 # +sdown master mymaster 127.0.0.1 6379
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.386 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
3235:X 13 Nov 14:49:24.386 # +new-epoch 1
3235:X 13 Nov 14:49:24.386 # +try-failover master mymaster 127.0.0.1 6379
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:24.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
3230:X 13 Nov 14:49:24.388 # +new-epoch 1
3230:X 13 Nov 14:49:24.388 # +try-failover master mymaster 127.0.0.1 6379
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.409 # +vote-for-leader 06f94705a99df53e468af594737913ce7c6287d5 1
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:24.416 # +vote-for-leader 858e250193e7f985bd7d63569a158f52a9cb9e0c 1
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.416 # 858e250193e7f985bd7d63569a158f52a9cb9e0c voted for 858e250193e7f985bd7d63569a158f52a9cb9e0c 1
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:24.417 # 06f94705a99df53e468af594737913ce7c6287d5 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
==> sentinel-6379.log <==
3225:X 13 Nov 14:49:24.422 # +new-epoch 1
3225:X 13 Nov 14:49:24.432 # +vote-for-leader 06f94705a99df53e468af594737913ce7c6287d5 1
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.432 # d0e6638165ba8f8186562da586f4e0789dd4abd1 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:24.432 # d0e6638165ba8f8186562da586f4e0789dd4abd1 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:24.468 # +elected-leader master mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:24.468 # +failover-state-select-slave master mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:24.545 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:24.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:24.608 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:25.295 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:25.295 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:25.345 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
==> sentinel-6379.log <==
3225:X 13 Nov 14:49:25.345 # +config-update-from sentinel 06f94705a99df53e468af594737913ce7c6287d5 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
3225:X 13 Nov 14:49:25.345 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
3225:X 13 Nov 14:49:25.345 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
3225:X 13 Nov 14:49:25.345 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:25.346 # +config-update-from sentinel 06f94705a99df53e468af594737913ce7c6287d5 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
3230:X 13 Nov 14:49:25.346 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
3230:X 13 Nov 14:49:25.346 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
3230:X 13 Nov 14:49:25.346 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:25.561 # -odown master mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:25.814 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:26.893 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:26.954 # +failover-end master mymaster 127.0.0.1 6379
3235:X 13 Nov 14:49:26.954 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
3235:X 13 Nov 14:49:26.955 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
3235:X 13 Nov 14:49:26.955 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
==> sentinel-6379.log <==
3225:X 13 Nov 14:49:55.349 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
==> sentinel-6380.log <==
3230:X 13 Nov 14:49:55.397 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
==> sentinel-6381.log <==
3235:X 13 Nov 14:49:57.014 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
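In a log this long, the line that matters most for clients is `+switch-master`, which records the old and new master addresses. A small filter can pull just those events out. This is a sketch that assumes the log layout shown above, where the event name is followed by the master name, old host, old port, new host and new port; the function name `switch_master_events` is made up for illustration.

```shell
# Print "<time> <master-name> <old host:port> -> <new host:port>" for every
# +switch-master event found in the given files (or on stdin if none given).
# The event is located by name rather than by fixed column, so an extra
# field in the timestamp does not break the filter.
switch_master_events() {
  awk '{
    for (i = 3; i <= NF; i++) {
      if ($i == "+switch-master") {
        print $(i - 2), $(i + 1), $(i + 2) ":" $(i + 3), "->", $(i + 4) ":" $(i + 5)
      }
    }
  }' "$@"
}

# Example use:
#   switch_master_events /root/redis/sentinel-63*.log
```

The same information can be requested from a running Sentinel with `redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster`, which is the query clients use to discover the current master.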
Going back to the Redis server logs, we can see that 6381, now a replica of the new master 6380, has requested synchronization from 6380 and completed a partial resynchronization:
==> redis-6380.log <==
2851:M 13 Nov 14:49:25.823 * Slave 127.0.0.1:6381 asks for synchronization
2851:M 13 Nov 14:49:25.823 * Partial resynchronization request from 127.0.0.1:6381 accepted. Sending 422 bytes of backlog starting from offset 124407.
==> redis-6381.log <==
3695:S 13 Nov 14:49:25.823 * Successful partial resynchronization with master.
3695:S 13 Nov 14:49:25.823 # Master replication ID changed to 0288d040464ebccbb56dc56d54455434a406bcb2
3695:S 13 Nov 14:49:25.823 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
When we start the 6379 server again, Sentinel converts it to a replica and has it connect to the 6380 server. The logs:
Start the 6379 server
# redis-server /root/redis/redis-6379.conf
# tail -F sentinel-63*.log
...
==> sentinel-6379.log <==
3225:X 13 Nov 16:05:00.384 * +convert-to-slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
...
# tail -F redis-63*.log
...
==> redis-6379.log <==
7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: receiving 194 bytes from master
7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Flushing old data
7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Loading DB in memory
7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Finished with success
==> redis-6381.log <==
3695:S 13 Nov 16:05:36.467 * 1 changes in 900 seconds. Saving...
3695:S 13 Nov 16:05:36.468 * Background saving started by pid 7519
7519:C 13 Nov 16:05:36.486 * DB saved on disk
7519:C 13 Nov 16:05:36.487 * RDB: 8 MB of memory used by copy-on-write
3695:S 13 Nov 16:05:36.569 * Background saving terminated with success
...
To be continued...