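MHA is a high-availability management tool for MySQL. Besides detecting that the Master is down and promoting a candidate Slave to New Master, it also automatically makes the remaining Slaves replicate from the New Master. Switchover relies on a floating IP: the floating IP is always bound to whichever node is currently the master (when a switchover happens, the IP moves with it), and clients are served through that floating IP.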
shell> mysqlrplshow --master=root:111111@"192.168.122.66:3307" --discover-slaves-login=root:111111 -r -v
WARNING: Using a password on the command line interface can be insecure.
# master on 192.168.122.66: ... connected.
# Finding slaves for master: 192.168.122.66:3307
# master on 192.168.122.70: ... connected.
# Finding slaves for master: 192.168.122.70:3307
WARNING: Cannot connect to some slaves:
 - 192.168.122.70:3306: Can't connect to MySQL server on '192.168.122.70:3306' (111 Connection refused)
# master on 192.168.122.80: ... connected.
# Finding slaves for master: 192.168.122.80:3307
# Likewise, change the interface name and the VIP to match your system
my $vip = '192.168.122.99/24';
my $eth = 'eth0';
my $key = '88';
my $ssh_start_vip = "/sbin/ifconfig $eth:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $eth:$key down";
my $ssh_user = "root";
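As a rough illustration of what these variables expand to, the Python sketch below rebuilds the two ifconfig commands from the same values. The function name `vip_commands` is hypothetical and exists only for this example; it is not part of MHA or of the master_ip_failover script.

```python
from typing import Tuple


def vip_commands(vip: str, eth: str, key: str) -> Tuple[str, str]:
    """Return (start, stop) commands for an interface alias such as eth0:88."""
    alias = f"{eth}:{key}"
    start = f"/sbin/ifconfig {alias} {vip}"  # binds the VIP on the new master
    stop = f"/sbin/ifconfig {alias} down"    # releases the VIP on the old master
    return start, stop


# Values taken from the script fragment above.
start_cmd, stop_cmd = vip_commands("192.168.122.99/24", "eth0", "88")
print(start_cmd)
print(stop_cmd)
```

The two strings should match the commands that the script echoes on its "IN SCRIPT TEST" line in the logs further down.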
# SSH check
shell> masterha_check_ssh --conf=/etc/masterha/app1.conf
# Output when the check passes:
Wed Mar 14 09:55:45 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 14 09:55:45 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar 14 09:55:45 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar 14 09:55:45 2018 - [info] Starting SSH connection tests..
Wed Mar 14 09:55:46 2018 - [debug]
Wed Mar 14 09:55:45 2018 - [debug] Connecting via SSH from root@master(192.168.122.66:22) to root@slave1(192.168.122.70:22)..
Wed Mar 14 09:55:45 2018 - [debug] ok.
Wed Mar 14 09:55:45 2018 - [debug] Connecting via SSH from root@master(192.168.122.66:22) to root@slave2(192.168.122.80:22)..
Wed Mar 14 09:55:46 2018 - [debug] ok.
Wed Mar 14 09:55:47 2018 - [debug]
Wed Mar 14 09:55:46 2018 - [debug] Connecting via SSH from root@slave2(192.168.122.80:22) to root@master(192.168.122.66:22)..
Wed Mar 14 09:55:47 2018 - [debug] ok.
Wed Mar 14 09:55:47 2018 - [debug] Connecting via SSH from root@slave2(192.168.122.80:22) to root@slave1(192.168.122.70:22)..
Wed Mar 14 09:55:47 2018 - [debug] ok.
Wed Mar 14 09:55:47 2018 - [debug]
Wed Mar 14 09:55:45 2018 - [debug] Connecting via SSH from root@slave1(192.168.122.70:22) to root@master(192.168.122.66:22)..
Wed Mar 14 09:55:46 2018 - [debug] ok.
Wed Mar 14 09:55:46 2018 - [debug] Connecting via SSH from root@slave1(192.168.122.70:22) to root@slave2(192.168.122.80:22)..
Wed Mar 14 09:55:47 2018 - [debug] ok.
Wed Mar 14 09:55:47 2018 - [info] All SSH connection tests passed successfully.
# Replication check
shell> masterha_check_repl --conf=/etc/masterha/app1.conf
Wed Mar 14 09:57:00 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 14 09:57:00 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar 14 09:57:00 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar 14 09:57:00 2018 - [info] MHA::MasterMonitor version 0.57.
Wed Mar 14 09:57:00 2018 - [debug] Connecting to servers..
Wed Mar 14 09:57:01 2018 - [debug] Connected to: master(192.168.122.66:3307), user=root
Wed Mar 14 09:57:01 2018 - [debug] Number of slave worker threads on host master(192.168.122.66:3307): 0
Wed Mar 14 09:57:01 2018 - [debug] Connected to: slave1(192.168.122.70:3307), user=root
Wed Mar 14 09:57:01 2018 - [debug] Number of slave worker threads on host slave1(192.168.122.70:3307): 8
Wed Mar 14 09:57:01 2018 - [debug] Connected to: slave2(192.168.122.80:3307), user=root
Wed Mar 14 09:57:01 2018 - [debug] Number of slave worker threads on host slave2(192.168.122.80:3307): 16
Wed Mar 14 09:57:01 2018 - [debug] Comparing MySQL versions..
Wed Mar 14 09:57:01 2018 - [debug] Comparing MySQL versions done.
Wed Mar 14 09:57:01 2018 - [debug] Connecting to servers done.
Wed Mar 14 09:57:01 2018 - [info] GTID failover mode = 1
Wed Mar 14 09:57:01 2018 - [info] Dead Servers:
Wed Mar 14 09:57:01 2018 - [info] Alive Servers:
Wed Mar 14 09:57:01 2018 - [info] master(192.168.122.66:3307)
Wed Mar 14 09:57:01 2018 - [info] slave1(192.168.122.70:3307)
Wed Mar 14 09:57:01 2018 - [info] slave2(192.168.122.80:3307)
Wed Mar 14 09:57:01 2018 - [info] Alive Slaves:
Wed Mar 14 09:57:01 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 14 09:57:01 2018 - [info] GTID ON
Wed Mar 14 09:57:01 2018 - [debug] Relay log info repository: TABLE
Wed Mar 14 09:57:01 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Wed Mar 14 09:57:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Mar 14 09:57:01 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 14 09:57:01 2018 - [info] GTID ON
Wed Mar 14 09:57:01 2018 - [debug] Relay log info repository: TABLE
Wed Mar 14 09:57:01 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Wed Mar 14 09:57:01 2018 - [info] Not candidate for the new Master (no_master is set)
Wed Mar 14 09:57:01 2018 - [info] Current Alive Master: master(192.168.122.66:3307)
Wed Mar 14 09:57:01 2018 - [info] Checking slave configurations..
Wed Mar 14 09:57:01 2018 - [info] read_only=1 is not set on slave slave1(192.168.122.70:3307).
Wed Mar 14 09:57:01 2018 - [info] read_only=1 is not set on slave slave2(192.168.122.80:3307).
Wed Mar 14 09:57:01 2018 - [info] Checking replication filtering settings..
Wed Mar 14 09:57:01 2018 - [info] binlog_do_db= , binlog_ignore_db=
Wed Mar 14 09:57:01 2018 - [info] Replication filtering check ok.
Wed Mar 14 09:57:01 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Mar 14 09:57:01 2018 - [info] Checking SSH publickey authentication settings on the current master..
Wed Mar 14 09:57:01 2018 - [debug] SSH connection test to master, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Wed Mar 14 09:57:01 2018 - [info] HealthCheck: SSH to master is reachable.
Wed Mar 14 09:57:01 2018 - [info] master(192.168.122.66:3307) (current master)  # the topology and the slave health information are printed from here
 +--slave1(192.168.122.70:3307)
 +--slave2(192.168.122.80:3307)
Wed Mar 14 09:57:01 2018 - [info] Checking replication health on slave1..
Wed Mar 14 09:57:01 2018 - [info] ok.
Wed Mar 14 09:57:01 2018 - [info] Checking replication health on slave2..
Wed Mar 14 09:57:01 2018 - [info] ok.
Wed Mar 14 09:57:01 2018 - [info] Checking master_ip_failover_script status:
Wed Mar 14 09:57:01 2018 - [info] /usr/local/mha/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.122.66 --orig_master_port=3307
IN SCRIPT TEST====/sbin/ifconfig eth0:88 down==/sbin/ifconfig eth0:88 192.168.122.99/24===
Checking the Status of the script.. OK
Wed Mar 14 09:57:01 2018 - [info] OK.
Wed Mar 14 09:57:01 2018 - [warning] shutdown_script is not defined.
Wed Mar 14 09:57:01 2018 - [debug] Disconnected from master(192.168.122.66:3307)
Wed Mar 14 09:57:01 2018 - [debug] Disconnected from slave1(192.168.122.70:3307)
Wed Mar 14 09:57:01 2018 - [debug] Disconnected from slave2(192.168.122.80:3307)
Wed Mar 14 09:57:01 2018 - [info] Got exit code 0 (Not master dead).
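When these checks run from a script rather than by hand, the final "Got exit code" entry is the line worth looking at. The sketch below pulls the code and its description out of captured output with a regular expression. The helper name `parse_exit_code` is hypothetical, and the pattern is only inferred from the log lines shown in this post, not taken from MHA itself.

```python
import re

# Pattern inferred from entries such as "Got exit code 0 (Not master dead)."
EXIT_RE = re.compile(r"Got exit code (\d+) \(([^)]*)\)")


def parse_exit_code(log_text):
    """Return (code, reason) from the last 'Got exit code' entry, or None."""
    matches = EXIT_RE.findall(log_text)
    if not matches:
        return None
    code, reason = matches[-1]
    return int(code), reason


sample = "Wed Mar 14 09:57:01 2018 - [info] Got exit code 0 (Not master dead)."
print(parse_exit_code(sample))
```

In practice the process exit status of the command itself is the more direct signal; parsing the text is mainly useful when only a saved log is available.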
Tue Mar 13 17:10:50 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 13 17:10:50 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Mar 13 17:10:50 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Tue Mar 13 17:12:23 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 13 17:12:23 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Mar 13 17:12:23 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
tory. Check for details, and consider setting --workdir separately.
Tue Mar 13 17:10:50 2018 - [debug] Connecting to servers..
Tue Mar 13 17:10:51 2018 - [debug] Connected to: master(192.168.122.66:3307), user=root
Tue Mar 13 17:10:51 2018 - [debug] Number of slave worker threads on host master(192.168.122.66:3307): 0
Tue Mar 13 17:10:51 2018 - [debug] Connected to: slave1(192.168.122.70:3307), user=root
Tue Mar 13 17:10:51 2018 - [debug] Number of slave worker threads on host slave1(192.168.122.70:3307): 8
Tue Mar 13 17:10:51 2018 - [debug] Connected to: slave2(192.168.122.80:3307), user=root
Tue Mar 13 17:10:51 2018 - [debug] Number of slave worker threads on host slave2(192.168.122.80:3307): 16
Tue Mar 13 17:10:51 2018 - [debug] Comparing MySQL versions..
Tue Mar 13 17:10:51 2018 - [debug] Comparing MySQL versions done.
Tue Mar 13 17:10:51 2018 - [debug] Connecting to servers done.
Tue Mar 13 17:10:51 2018 - [info] GTID failover mode = 1
Tue Mar 13 17:10:51 2018 - [info] Dead Servers:
Tue Mar 13 17:10:51 2018 - [info] Alive Servers:
Tue Mar 13 17:10:51 2018 - [info] master(192.168.122.66:3307)
Tue Mar 13 17:10:51 2018 - [info] slave1(192.168.122.70:3307)
Tue Mar 13 17:10:51 2018 - [info] slave2(192.168.122.80:3307)
Tue Mar 13 17:10:51 2018 - [info] Alive Slaves:
Tue Mar 13 17:10:51 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:10:51 2018 - [info] GTID ON
Tue Mar 13 17:10:51 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:10:51 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:10:51 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 13 17:10:51 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:10:51 2018 - [info] GTID ON
Tue Mar 13 17:10:51 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:10:51 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:10:51 2018 - [info] Not candidate for the new Master (no_master is set)
Tue Mar 13 17:10:51 2018 - [info] Current Alive Master: master(192.168.122.66:3307)
Tue Mar 13 17:10:51 2018 - [info] Checking slave configurations..
Tue Mar 13 17:10:51 2018 - [info] read_only=1 is not set on slave slave1(192.168.122.70:3307).
Tue Mar 13 17:10:51 2018 - [info] read_only=1 is not set on slave slave2(192.168.122.80:3307).
Tue Mar 13 17:10:51 2018 - [info] Checking replication filtering settings..
Tue Mar 13 17:10:51 2018 - [info] binlog_do_db= , binlog_ignore_db=
Tue Mar 13 17:10:51 2018 - [info] Replication filtering check ok.
Tue Mar 13 17:10:51 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Mar 13 17:10:51 2018 - [info] Checking SSH publickey authentication settings on the current master..
Tue Mar 13 17:10:51 2018 - [debug] SSH connection test to master, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Tue Mar 13 17:10:52 2018 - [info] HealthCheck: SSH to master is reachable.
Tue Mar 13 17:10:52 2018 - [info] master(192.168.122.66:3307) (current master)
 +--slave1(192.168.122.70:3307)
 +--slave2(192.168.122.80:3307)
IN SCRIPT TEST====/sbin/ifconfig eth0:88 down==/sbin/ifconfig eth0:88 192.168.122.99/24===
# Normal startup log, before any failover has happened
Checking the Status of the script.. OK
Tue Mar 13 17:10:52 2018 - [info] OK.
Tue Mar 13 17:10:52 2018 - [warning] shutdown_script is not defined.
Tue Mar 13 17:10:52 2018 - [debug] Disconnected from master(192.168.122.66:3307)
Tue Mar 13 17:10:52 2018 - [debug] Disconnected from slave1(192.168.122.70:3307)
Tue Mar 13 17:10:52 2018 - [debug] Disconnected from slave2(192.168.122.80:3307)
Tue Mar 13 17:10:52 2018 - [debug] SSH check command: exit 0
Tue Mar 13 17:10:52 2018 - [info] Set master ping interval 3 seconds.
Tue Mar 13 17:10:52 2018 - [info] Set secondary check script: /usr/local/mha/bin/masterha_secondary_check -s 192.168.122.70 -s 192.168.122.80 --user=root --master_host=192.168.122.66 --master_port=3307
Tue Mar 13 17:10:52 2018 - [info] Starting ping health check on master(192.168.122.66:3307)..
Tue Mar 13 17:10:52 2018 - [debug] Connected on master.
Tue Mar 13 17:10:52 2018 - [debug] Set short wait_timeout on master: 6 seconds
Tue Mar 13 17:10:52 2018 - [debug] Trying to get advisory lock..
Tue Mar 13 17:10:52 2018 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
# From here the manager waits until MySQL stops responding
Tue Mar 13 17:12:13 2018 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)  # at this point the master instance no longer answers the ping
Tue Mar 13 17:12:13 2018 - [info] Executing secondary network check script: /usr/local/mha/bin/masterha_secondary_check -s 192.168.122.70 -s 192.168.122.80 --user=root --master_host=192.168.122.66 --master_port=3307 --user=root --master_host=master --master_ip=192.168.122.66 --master_port=3307 --master_user=root --master_password=111111 --ping_type=INSERT
Tue Mar 13 17:12:13 2018 - [info] Executing SSH check script: exit 0
Tue Mar 13 17:12:13 2018 - [debug] SSH connection test to master, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Tue Mar 13 17:12:13 2018 - [info] HealthCheck: SSH to master is reachable.
# Use the two slave instances to check over ssh whether the master can be reached
Monitoring server 192.168.122.70 is reachable, Master is not reachable from 192.168.122.70. OK.
Monitoring server 192.168.122.80 is reachable, Master is not reachable from 192.168.122.80. OK.
Tue Mar 13 17:12:14 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
# Neither slave can connect to the MySQL instance on the master; retry 3 times
Tue Mar 13 17:12:16 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.122.66' (111))
Tue Mar 13 17:12:16 2018 - [warning] Connection failed 2 time(s)..
Tue Mar 13 17:12:19 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.122.66' (111))
Tue Mar 13 17:12:19 2018 - [warning] Connection failed 3 time(s)..
Tue Mar 13 17:12:22 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.122.66' (111))
Tue Mar 13 17:12:22 2018 - [warning] Connection failed 4 time(s)..
Tue Mar 13 17:12:22 2018 - [warning] Master is not reachable from health checker!
Tue Mar 13 17:12:22 2018 - [warning] Master master(192.168.122.66:3307) is not reachable!
Tue Mar 13 17:12:22 2018 - [warning] SSH is reachable.
Tue Mar 13 17:12:22 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.conf again, and trying to connect to all servers to check server status..
Tue Mar 13 17:12:22 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 13 17:12:22 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Tue Mar 13 17:12:22 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Tue Mar 13 17:12:22 2018 - [debug] Skipping connecting to dead master master(192.168.122.66:3307).
Tue Mar 13 17:12:22 2018 - [debug] Connecting to servers..
Tue Mar 13 17:12:23 2018 - [debug] Connected to: slave1(192.168.122.70:3307), user=root
Tue Mar 13 17:12:23 2018 - [debug] Number of slave worker threads on host slave1(192.168.122.70:3307): 8
Tue Mar 13 17:12:23 2018 - [debug] Connected to: slave2(192.168.122.80:3307), user=root
Tue Mar 13 17:12:23 2018 - [debug] Number of slave worker threads on host slave2(192.168.122.80:3307): 16
Tue Mar 13 17:12:23 2018 - [debug] Comparing MySQL versions..
Tue Mar 13 17:12:23 2018 - [debug] Comparing MySQL versions done.
Tue Mar 13 17:12:23 2018 - [debug] Connecting to servers done.
Tue Mar 13 17:12:23 2018 - [info] GTID failover mode = 1
Tue Mar 13 17:12:23 2018 - [info] Dead Servers:  # the master is now dead
Tue Mar 13 17:12:23 2018 - [info] master(192.168.122.66:3307)
Tue Mar 13 17:12:23 2018 - [info] Alive Servers:
Tue Mar 13 17:12:23 2018 - [info] slave1(192.168.122.70:3307)
Tue Mar 13 17:12:23 2018 - [info] slave2(192.168.122.80:3307)
Tue Mar 13 17:12:23 2018 - [info] Alive Slaves:
Tue Mar 13 17:12:23 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:23 2018 - [info] GTID ON
Tue Mar 13 17:12:23 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:23 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:23 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 13 17:12:23 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:23 2018 - [info] GTID ON
Tue Mar 13 17:12:23 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:23 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:23 2018 - [info] Not candidate for the new Master (no_master is set)
Tue Mar 13 17:12:23 2018 - [info] Checking slave configurations..
Tue Mar 13 17:12:23 2018 - [info] read_only=1 is not set on slave slave1(192.168.122.70:3307).
Tue Mar 13 17:12:23 2018 - [info] read_only=1 is not set on slave slave2(192.168.122.80:3307).
Tue Mar 13 17:12:23 2018 - [info] Checking replication filtering settings..
Tue Mar 13 17:12:23 2018 - [info] Replication filtering check ok.
Tue Mar 13 17:12:23 2018 - [info] Master is down!
Tue Mar 13 17:12:23 2018 - [info] Terminating monitoring script.
Tue Mar 13 17:12:23 2018 - [debug] Disconnected from slave1(192.168.122.70:3307)
Tue Mar 13 17:12:23 2018 - [debug] Disconnected from slave2(192.168.122.80:3307)
Tue Mar 13 17:12:23 2018 - [info] Got exit code 20 (Master dead).
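The monitoring log up to this point follows a simple pattern: the insert ping fails at 17:12:13, three reconnect attempts follow at 3-second intervals ("Connection failed 2/3/4 time(s)"), and only then is the master declared unreachable. The sketch below is a simplified model of that retry loop, read off this log rather than taken from MHA's source. The function `master_is_dead` and its parameters are hypothetical, and the secondary network check and SSH check that the real manager also runs are left out.

```python
import time


def master_is_dead(ping, max_failures=4, interval=3.0, sleep=time.sleep):
    """Return True once `ping` has failed `max_failures` times in a row.

    `ping` is any callable returning True when the master answers.
    """
    failures = 0
    while failures < max_failures:
        if ping():
            return False      # master answered, so monitoring would continue
        failures += 1
        if failures < max_failures:
            sleep(interval)   # wait one ping interval before the next attempt
    return True


# Simulate a master that never answers; sleeping is stubbed out for the demo.
attempts = []


def always_down():
    attempts.append(1)
    return False


print(master_is_dead(always_down, sleep=lambda _seconds: None))
print(len(attempts))
```

Passing `sleep` as a parameter keeps the example quick to run and easy to test. A real health checker would also need per-attempt connection timeouts.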
Tue Mar 13 17:12:23 2018 - [info] MHA::MasterFailover version 0.57.
Tue Mar 13 17:12:23 2018 - [info] Starting master failover.
Tue Mar 13 17:12:23 2018 - [info]
Tue Mar 13 17:12:23 2018 - [info] * Phase 1: Configuration Check Phase..
Tue Mar 13 17:12:23 2018 - [info]
Tue Mar 13 17:12:23 2018 - [debug] Skipping connecting to dead master master.
Tue Mar 13 17:12:23 2018 - [debug] Connecting to servers..
Tue Mar 13 17:12:24 2018 - [debug] Connected to: slave1(192.168.122.70:3307), user=root
Tue Mar 13 17:12:24 2018 - [debug] Number of slave worker threads on host slave1(192.168.122.70:3307): 8
Tue Mar 13 17:12:24 2018 - [debug] Connected to: slave2(192.168.122.80:3307), user=root
Tue Mar 13 17:12:24 2018 - [debug] Number of slave worker threads on host slave2(192.168.122.80:3307): 16
Tue Mar 13 17:12:24 2018 - [debug] Comparing MySQL versions..
Tue Mar 13 17:12:24 2018 - [debug] Comparing MySQL versions done.
Tue Mar 13 17:12:24 2018 - [debug] Connecting to servers done.
Tue Mar 13 17:12:24 2018 - [info] GTID failover mode = 1
Tue Mar 13 17:12:24 2018 - [info] Dead Servers:
Tue Mar 13 17:12:24 2018 - [info] master(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Alive Servers:
Tue Mar 13 17:12:24 2018 - [info] slave1(192.168.122.70:3307)
Tue Mar 13 17:12:24 2018 - [info] slave2(192.168.122.80:3307)
Tue Mar 13 17:12:24 2018 - [info] Alive Slaves:
Tue Mar 13 17:12:24 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Primary candidate for the new Master (candidate_master is set)  # among the surviving slaves, slave1 is the one configured as promotable to master
Tue Mar 13 17:12:24 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Not candidate for the new Master (no_master is set)  # the configuration file does not allow slave2 to be promoted to master
Tue Mar 13 17:12:24 2018 - [info] Starting GTID based failover.
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [debug] Stopping IO thread on slave1(192.168.122.70:3307)..  # stop the IO thread on slave1
Tue Mar 13 17:12:24 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Mar 13 17:12:24 2018 - [info] Executing master IP deactivation script:
Tue Mar 13 17:12:24 2018 - [info] /usr/local/mha/bin/master_ip_failover --orig_master_host=master --orig_master_ip=192.168.122.66 --orig_master_port=3307 --command=stopssh --ssh_user=root
Tue Mar 13 17:12:24 2018 - [debug] Stopping IO thread on slave2(192.168.122.80:3307)..
Tue Mar 13 17:12:24 2018 - [debug] Stop IO thread on slave2(192.168.122.80:3307) done.
Disabling the VIP on old master: master
Tue Mar 13 17:12:24 2018 - [debug] Stop IO thread on slave1(192.168.122.70:3307) done.
Tue Mar 13 17:12:24 2018 - [info] done.
Tue Mar 13 17:12:24 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Mar 13 17:12:24 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] * Phase 3: Master Recovery Phase..
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [debug] Fetching current slave status..
Tue Mar 13 17:12:24 2018 - [debug] Fetching current slave status done.
Tue Mar 13 17:12:24 2018 - [info] The latest binary log file/position on all slaves is bin.000004:20816
Tue Mar 13 17:12:24 2018 - [info] Retrieved Gtid Set: 0c154ad5-2699-11e8-94a1-525400eac085:309-394
Tue Mar 13 17:12:24 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Mar 13 17:12:24 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 13 17:12:24 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Not candidate for the new Master (no_master is set)
Tue Mar 13 17:12:24 2018 - [info] The oldest binary log file/position on all slaves is bin.000004:20816
Tue Mar 13 17:12:24 2018 - [info] Retrieved Gtid Set: 0c154ad5-2699-11e8-94a1-525400eac085:309-394
Tue Mar 13 17:12:24 2018 - [info] Oldest slaves:
Tue Mar 13 17:12:24 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 13 17:12:24 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Not candidate for the new Master (no_master is set)
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] * Phase 3.3: Determining New Master Phase..
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] Searching new master from slaves..
Tue Mar 13 17:12:24 2018 - [info] Candidate masters from the configuration file:
Tue Mar 13 17:12:24 2018 - [info] slave1(192.168.122.70:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Mar 13 17:12:24 2018 - [info] Non-candidate masters:
Tue Mar 13 17:12:24 2018 - [info] slave2(192.168.122.80:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Tue Mar 13 17:12:24 2018 - [info] GTID ON
Tue Mar 13 17:12:24 2018 - [debug] Relay log info repository: TABLE
Tue Mar 13 17:12:24 2018 - [info] Replicating from 192.168.122.66(192.168.122.66:3307)
Tue Mar 13 17:12:24 2018 - [info] Not candidate for the new Master (no_master is set)
Tue Mar 13 17:12:24 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Mar 13 17:12:24 2018 - [info] New master is slave1(192.168.122.70:3307)  # slave1 is chosen for promotion to the new master
Tue Mar 13 17:12:24 2018 - [info] Starting master failover..  # the failover switch begins
Tue Mar 13 17:12:24 2018 - [info] From:  # original topology
master(192.168.122.66:3307) (current master)
 +--slave1(192.168.122.70:3307)
 +--slave2(192.168.122.80:3307)
To:  # topology after the failover
slave1(192.168.122.70:3307) (new master)
 +--slave2(192.168.122.80:3307)
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Mar 13 17:12:24 2018 - [info]
Tue Mar 13 17:12:24 2018 - [info] Waiting all logs to be applied..
Tue Mar 13 17:12:24 2018 - [info] done.
Tue Mar 13 17:12:24 2018 - [debug] Stopping slave IO/SQL thread on slave1(192.168.122.70:3307)..
Tue Mar 13 17:12:24 2018 - [debug] done.
Tue Mar 13 17:12:24 2018 - [info] Getting new master's binlog name and position..
Tue Mar 13 17:12:24 2018 - [info] bin.000003:94211
Tue Mar 13 17:12:24 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave1 or 192.168.122.70', MASTER_PORT=3307, MASTER_AUTO_POSITION=1, MASTER_USER='rpl', MASTER_PASSWORD='xxx';
Tue Mar 13 17:12:24 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: bin.000003, 94211, 0c154ad5-2699-11e8-94a1-525400eac085:1-394, c08d09b5-2698-11e8-9ec0-5254004dae68:1-2
Tue Mar 13 17:12:24 2018 - [info] Executing master IP activate script:
Tue Mar 13 17:12:24 2018 - [info] /usr/local/mha/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.122.66 --orig_master_port=3307 --new_master_host=slave1 --new_master_ip=192.168.122.70 --new_master_port=3307 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:88 down==/sbin/ifconfig eth0:88 192.168.122.99/24===
Enabling the VIP - 192.168.122.99/24 on the new master - slave1
Tue Mar 13 17:12:25 2018 - [info] OK.
Tue Mar 13 17:12:25 2018 - [info] ** Finished master recovery successfully.
Tue Mar 13 17:12:25 2018 - [info] * Phase 3: Master Recovery Phase completed.
Tue Mar 13 17:12:25 2018 - [info]
Tue Mar 13 17:12:25 2018 - [info] * Phase 4: Slaves Recovery Phase..
Tue Mar 13 17:12:25 2018 - [info]
Tue Mar 13 17:12:25 2018 - [info]
Tue Mar 13 17:12:25 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Mar 13 17:12:25 2018 - [info]
Tue Mar 13 17:12:25 2018 - [info] -- Slave recovery on host slave2(192.168.122.80:3307) started, pid: 10212. Check tmp log /var/log/masterha/app1/slave2_3307_20180313171223.log if it takes time..
Tue Mar 13 17:12:27 2018 - [info]
Tue Mar 13 17:12:27 2018 - [info] Log messages from slave2 ...
Tue Mar 13 17:12:27 2018 - [info]
Tue Mar 13 17:12:25 2018 - [info] Resetting slave slave2(192.168.122.80:3307) and starting replication from the new master slave1(192.168.122.70:3307)..
Tue Mar 13 17:12:25 2018 - [debug] Stopping slave IO/SQL thread on slave2(192.168.122.80:3307)..
Tue Mar 13 17:12:25 2018 - [debug] done.
Tue Mar 13 17:12:25 2018 - [info] Executed CHANGE MASTER.
Tue Mar 13 17:12:25 2018 - [debug] Starting slave IO/SQL thread on slave2(192.168.122.80:3307)..
Tue Mar 13 17:12:26 2018 - [debug] done.
Tue Mar 13 17:12:26 2018 - [info] Slave started.
Tue Mar 13 17:12:26 2018 - [info] gtid_wait(0c154ad5-2699-11e8-94a1-525400eac085:1-394, c08d09b5-2698-11e8-9ec0-5254004dae68:1-2) completed on slave2(192.168.122.80:3307). Executed 3 events.
Tue Mar 13 17:12:27 2018 - [info] End of log messages from slave2.
Tue Mar 13 17:12:27 2018 - [info] -- Slave on host slave2(192.168.122.80:3307) started.
Tue Mar 13 17:12:27 2018 - [info] All new slave servers recovered successfully.
Tue Mar 13 17:12:27 2018 - [info]
Tue Mar 13 17:12:27 2018 - [info] * Phase 5: New master cleanup phase..
Tue Mar 13 17:12:27 2018 - [info]
Tue Mar 13 17:12:27 2018 - [info] Resetting slave info on the new master..
Tue Mar 13 17:12:27 2018 - [debug] Clearing slave info..
Tue Mar 13 17:12:27 2018 - [debug] Stopping slave IO/SQL thread on slave1(192.168.122.70:3307)..
Tue Mar 13 17:12:27 2018 - [debug] done.
Tue Mar 13 17:12:27 2018 - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Tue Mar 13 17:12:27 2018 - [info] slave1: Resetting slave info succeeded.
Tue Mar 13 17:12:27 2018 - [info] Master failover to slave1(192.168.122.70:3307) completed successfully.
Tue Mar 13 17:12:27 2018 - [info] Deleted server1 entry from /etc/masterha/app1.conf .
Tue Mar 13 17:12:27 2018 - [debug] Disconnected from slave1(192.168.122.70:3307)
Tue Mar 13 17:12:27 2018 - [debug] Disconnected from slave2(192.168.122.80:3307)
Tue Mar 13 17:12:27 2018 - [info]
----- Failover Report -----
# Below is the failover report: the master role in the cluster moved from 66:3307 to 70:3307
app1: MySQL Master failover master(192.168.122.66:3307) to slave1(192.168.122.70:3307) succeeded
Master master(192.168.122.66:3307) is down!
Check MHA Manager logs at centos-66:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on master(192.168.122.66:3307)
Selected slave1(192.168.122.70:3307) as a new master.
slave1(192.168.122.70:3307): OK: Applying all logs succeeded.
slave1(192.168.122.70:3307): OK: Activated master IP address.
slave2(192.168.122.80:3307): OK: Slave started, replicating from slave1(192.168.122.70:3307)
slave1(192.168.122.70:3307): Resetting slave info succeeded.
Master failover to slave1(192.168.122.70:3307) completed successfully.  # the switchover succeeded
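The two decisions visible in the log, choosing the new master in Phase 3.3 and pointing the remaining slave at it with GTID auto-positioning, can be modelled roughly as follows. This is a deliberately simplified sketch and not MHA's actual selection logic, which also weighs things such as MySQL versions, whether log-bin is enabled, and replication delay. The `Slave` class and both helper functions are hypothetical names introduced for the example. The replication user `rpl` and the masked password `xxx` are copied from the log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    host: str
    ip: str
    port: int
    candidate_master: bool = False
    no_master: bool = False
    is_latest: bool = True  # has received the latest relay log events


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Prefer an up-to-date candidate_master, never pick a no_master slave."""
    eligible = [s for s in slaves if not s.no_master]
    for s in eligible:
        if s.candidate_master and s.is_latest:
            return s
    for s in eligible:
        if s.is_latest:
            return s
    return eligible[0] if eligible else None


def change_master_sql(new_master: Slave, repl_user: str) -> str:
    """Build the GTID-based statement the remaining slaves would run."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{new_master.ip}', "
        f"MASTER_PORT={new_master.port}, MASTER_AUTO_POSITION=1, "
        f"MASTER_USER='{repl_user}', MASTER_PASSWORD='xxx';"
    )


# The two surviving slaves, with the flags reported in the log.
slaves = [
    Slave("slave1", "192.168.122.70", 3307, candidate_master=True),
    Slave("slave2", "192.168.122.80", 3307, no_master=True),
]
new_master = pick_new_master(slaves)
print(new_master.host)
print(change_master_sql(new_master, "rpl"))
```

With the flags from this cluster the sketch picks slave1, which agrees with the "New master is slave1" entry in the log. Because MASTER_AUTO_POSITION=1 is used, no binlog file or position has to be supplied.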