Saturday, November 19, 2011

MySQL MHA - Failover

Lastly, we'll look at MySQL master failover. I'm going to trigger it by stopping the MySQL daemon on the master.
The relationships between master and slaves, and between the MHA Nodes and the MHA Manager, are the same as before.

The configuration files (global and application) are unchanged as well.
Please see the MySQL MHA - Switchover post if you need the details of those configuration files.

DB(Slave) + MHA Manager    192.168.100.200 (ha-mgr01)
DB(Master) + MHA Node      192.168.100.197 (ha-db01)
DB(Slave) + MHA Node       192.168.100.198 (ha-db02)

First, stop the master DB (ha-db01).
Second, let MHA transfer Master_Host from the master (ha-db01) to the slave (ha-db02).

  • Log in to the MHA Manager (ha-mgr01)
  • run the MHA Manager
# masterha_manager --conf=/etc/app1.cnf
Thu Jul 28 12:27:19 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading server configurations from /etc/app1.cnf.. 
  • check the status of Master server
# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:26638) is running(0:PING_OK), master:ha-db01 
  • stop mysql daemon on Master DB
# ssh ha-db01 '/etc/init.d/mysql stop'
  • check the status of Master server
# masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).
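The two status lines above differ only in the token inside the parentheses, so a script can tell "monitoring" from "manager has exited" by extracting it. The following is a minimal sketch that parses the text exactly as it appears in this transcript; the function names mha_status_token and mha_current_master are made up for illustration and are not part of MHA.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MHA): they only parse the text
# that masterha_check_status printed in the transcript above.

# Print the status token, e.g. PING_OK or NOT_RUNNING.
mha_status_token() {
    # $1 = one line of masterha_check_status output
    printf '%s\n' "$1" |
        sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_][A-Z_]*\)).*/\2/p'
}

# Print the master host reported while the manager is running.
# Prints nothing when the line carries no "master:" part.
mha_current_master() {
    printf '%s\n' "$1" | sed -n 's/.*master:\([^ ]*\).*/\1/p'
}
```

For example, `mha_status_token "$(masterha_check_status --conf=/etc/app1.cnf)"` would print PING_OK before the master is stopped and NOT_RUNNING once the manager has finished the failover and exited.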
  • check the status of slave hosts
# mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db02
*************************** 1. row ***************************
Server_id: 300
     Host: 
     Port: 3306
Master_id: 200
  • verify that the Master_Host has been transferred from Master(ha-db01) to Slave(ha-db02), first on ha-mgr01 (localhost) and then on ha-db02
# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.198
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 107
               Relay_Log_File: ha-mgr01-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 412
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 200
# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 126
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100
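Reading two full SHOW SLAVE STATUS listings by eye is error-prone; only Master_Host, Slave_IO_Running and Slave_SQL_Running matter for this check. The sketch below pulls those fields out of the \G output. The helper names slave_status_field and replicates_from are assumptions made for this example, and the parsing relies only on the "Name: value" layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers: read SHOW SLAVE STATUS\G output on stdin.

# Print the value of one field, e.g. Master_Host.
slave_status_field() {
    # $1 = field name exactly as printed by mysql
    awk -v key="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)          # drop the left padding
        p = index(line, ":")
        if (p > 0 && substr(line, 1, p - 1) == key) {
            val = substr(line, p + 1)
            sub(/^[ \t]+/, "", val)       # drop the space after the colon
            print val
            exit
        }
    }'
}

# Succeed only if the slave points at the expected master
# and both replication threads are running.
replicates_from() {
    # $1 = expected Master_Host value
    status=$(cat)
    host=$(printf '%s\n' "$status" | slave_status_field Master_Host)
    io=$(printf '%s\n' "$status" | slave_status_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_status_field Slave_SQL_Running)
    [ "$host" = "$1" ] && [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}
```

On ha-mgr01 this could be used as `mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost | replicates_from 192.168.100.198 && echo OK`. The same check against ha-db02 is expected to fail, since the new master shows both threads as No in the output above.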
  • Watch the MHA Manager's log during the failover
# tail -f /var/log/masterha/app1/app1.log 
Wed Jul 27 17:21:34 2011 - [info] MHA::MasterMonitor version 0.50.
Wed Jul 27 17:21:35 2011 - [info] Dead Servers:
Wed Jul 27 17:21:35 2011 - [info] Alive Servers:
Wed Jul 27 17:21:35 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:21:35 2011 - [info] Alive Slaves:
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Current Master: ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Checking slave configurations..
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:21:35 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 27 17:21:35 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:21:35 2011 - [info] Starting SSH connection tests..
Wed Jul 27 17:21:36 2011 - [info] All SSH connection tests passed successfully.
Wed Jul 27 17:21:36 2011 - [info] Checking MHA Node version..
Wed Jul 27 17:21:37 2011 - [info]  Version check ok.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Wed Jul 27 17:21:37 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000044 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.50 
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@ha-db01(ha-db01).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000044
Wed Jul 27 17:21:37 2011 - [info] Master setting check done.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 27 17:21:37 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@192.168.100.198(ha-db02).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000018
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000018
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:38 2011 - [info]   Connecting to root@192.168.100.200(ha-mgr01).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to ha-mgr01-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/ha-mgr01-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info] Slaves settings check done.
Wed Jul 27 17:21:38 2011 - [info] 
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
Wed Jul 27 17:21:38 2011 - [warn] master_ip_failover_script is not defined.
Wed Jul 27 17:21:38 2011 - [warn] shutdown_script is not defined.
Wed Jul 27 17:21:38 2011 - [info] Set master ping interval 3 seconds.
Wed Jul 27 17:21:38 2011 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197
Wed Jul 27 17:21:38 2011 - [info] Starting ping health check on ha-db01(192.168.100.197:3306)..
Wed Jul 27 17:21:38 2011 - [info] Ping succeeded, sleeping until it doesn't respond..
Wed Jul 27 17:22:53 2011 - [warn] Got error on MySQL ping: 2006 (MySQL server has gone away)
Wed Jul 27 17:22:53 2011 - [info] Executing seconary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197  --user=root  --master_host=ha-db01  --master_ip=192.168.100.197  --master_port=3306
Wed Jul 27 17:22:53 2011 - [info] HealthCheck: SSH to ha-db01 is reachable.
Monitoring server 192.168.100.198 is reachable, Master is not reachable from 192.168.100.198. OK.
Monitoring server 192.168.100.197 is reachable, Master is not reachable from 192.168.100.197. OK.
Wed Jul 27 17:22:54 2011 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Jul 27 17:22:56 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:56 2011 - [warn] Connection failed 1 time(s)..
Wed Jul 27 17:22:59 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:59 2011 - [warn] Connection failed 2 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:23:02 2011 - [warn] Connection failed 3 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Master is not reachable from health checker!
Wed Jul 27 17:23:02 2011 - [warn] Master ha-db01(192.168.100.197:3306) is not reachable!
Wed Jul 27 17:23:02 2011 - [warn] SSH is reachable.
Wed Jul 27 17:23:02 2011 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jul 27 17:23:02 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading server configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking slave configurations..
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:23:02 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:23:02 2011 - [info] Master is down!
Wed Jul 27 17:23:02 2011 - [info] Terminating monitoring script.
Wed Jul 27 17:23:02 2011 - [info] Got exit code 20 (Master dead).
Wed Jul 27 17:23:02 2011 - [info] MHA::MasterFailover version 0.50.
Wed Jul 27 17:23:02 2011 - [info] Starting master failover.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 1: Configuration Check Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking master reachability via mysql(double check)..
Wed Jul 27 17:23:02 2011 - [info]  ok.
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jul 27 17:23:02 2011 - [warn] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Wed Jul 27 17:23:02 2011 - [warn] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3: Master Recovery Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] The latest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] The oldest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Oldest slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Fetching dead master's binary logs..
Wed Jul 27 17:23:02 2011 - [info] Executing command on the dead master ha-db01(192.168.100.197:3306): save_binary_logs --command=save --start_file=mysql-bin.000044  --start_pos=107 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000044 pos 107 to mysql-bin.000044 EOF into /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000044 position 107 to tail(126).. ok.
 Concat succeeded.
Wed Jul 27 17:23:03 2011 - [info] scp from root@192.168.100.197:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:03 2011 - [info] HealthCheck: SSH to ha-db02 is reachable.
Wed Jul 27 17:23:04 2011 - [info] HealthCheck: SSH to ha-mgr01 is reachable.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jul 27 17:23:04 2011 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jul 27 17:23:04 2011 - [info] Searching new master from slaves..
Wed Jul 27 17:23:04 2011 - [info]  Candidate masters from the configuration file:
Wed Jul 27 17:23:04 2011 - [info]  Non-candidate masters:
Wed Jul 27 17:23:04 2011 - [info] New master is ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:04 2011 - [info] Starting master failover..
Wed Jul 27 17:23:04 2011 - [info] 
From:
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
To:
ha-db02 (new master)
 +--ha-mgr01
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:04 2011 - [info] Sending binlog..
Wed Jul 27 17:23:04 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-db02:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jul 27 17:23:04 2011 - [info] Starting recovery on ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:04 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:04 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:04 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-db02)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-db02, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:05 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-db02:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:05 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:05 2011 - [info] Getting new master's binlog name and position..
Wed Jul 27 17:23:05 2011 - [info]  mysql-bin.000006:107
Wed Jul 27 17:23:05 2011 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ha-db02 or 192.168.100.198', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=107, MASTER_USER='replication', MASTER_PASSWORD='xxx';
Wed Jul 27 17:23:05 2011 - [warn] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jul 27 17:23:05 2011 - [info] ** Finished master recovery successfully.
Wed Jul 27 17:23:05 2011 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave diff file generation on host ha-mgr01(192.168.100.200:3306) started, pid: 23533. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:05 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:05 2011 - [info] -- ha-mgr01(192.168.100.200:3306) has the latest relay log events.
Wed Jul 27 17:23:05 2011 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) started, pid: 23535. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Sending binlog..
Wed Jul 27 17:23:05 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-mgr01:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:05 2011 - [info] Starting recovery on ha-mgr01(192.168.100.200:3306)..
Wed Jul 27 17:23:05 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:05 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:05 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-mgr01)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-mgr01, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:06 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-mgr01:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:06 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:06 2011 - [info]  Resetting slave ha-mgr01(192.168.100.200:3306) and starting replication from the new master ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:06 2011 - [info]  Executed CHANGE MASTER.
Wed Jul 27 17:23:06 2011 - [info]  Slave started.
Wed Jul 27 17:23:06 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:06 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) succeeded.
Wed Jul 27 17:23:06 2011 - [info] All new slave servers recovered successfully.
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] * Phase 5: New master cleanup phease..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Resetting slave info on the new master..
Wed Jul 27 17:23:06 2011 - [info] Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] 
----- Failover Report -----
app1: MySQL Master failover ha-db01 to ha-db02 succeeded
Master ha-db01 is down!
Check MHA Manager logs at ha-mgr01.forschooner.net:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave ha-db02(192.168.100.198:3306) has all relay logs for recovery.
Selected ha-db02 as a new master.
ha-db02: OK: Applying all logs succeeded.
ha-mgr01: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
ha-mgr01: OK: Applying all logs succeeded. Slave started, replicating from ha-db02.
ha-db02: Resetting slave info succeeded.
Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] Sending mail..
Unknown option: conf
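The log above is long, and the parts most people look for are the phase boundaries and the name of the promoted slave. As a rough sketch, the two filters below reduce app1.log to those lines. The function names mha_phases and mha_new_master are invented for this example, and the patterns are based only on the log format shown in this post (MHA 0.50), so other versions may need adjustments.

```shell
#!/bin/sh
# Hypothetical helpers: read an MHA Manager log on stdin.

# Print "HH:MM:SS Phase N: title" for every phase start line,
# skipping the "... Phase completed." lines.
mha_phases() {
    awk '
        /\[info\] \*+ Phase / && !/completed\.[ \t]*$/ {
            t = $4                            # time column of the log prefix
            sub(/^.*\[info\] \*+ /, "")       # strip timestamp and asterisks
            sub(/\.\.[ \t]*$/, "")            # strip the trailing ".."
            print t, $0
        }'
}

# Print the host name from the "New master is ..." line.
mha_new_master() {
    sed -n 's/.*\[info\] New master is \([^(]*\)(.*/\1/p'
}
```

For example, `mha_phases < /var/log/masterha/app1/app1.log` lists the phases with their start times, which makes it easy to see that this failover ran from 17:23:02 to 17:23:06, and `mha_new_master < /var/log/masterha/app1/app1.log` prints ha-db02.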
