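Summary: In a GaussDB T 1.0.1 primary/standby HA environment, the primary's entire database directory was deleted by mistake. Recovery hit two problems: (1) failover on the standby failed with GS-00775; (2) reinstalling the database failed with a message that the database was already installed.
Background:
I run a GaussDB T 1.0.1 primary/standby HA environment. The primary could not be reached, and the cause turned out to be that the whole database directory had been deleted by mistake (rm -rf /u01/gaussdb/).
The standby checked out as healthy, so the plan was: fail over to the standby, reinstall the primary, set the remote archive directory, run build database on the old primary, and finally switch over to restore the original roles.
Two problems came up along the way:
1. Failover on the standby failed with GS-00775, Invalid switch request, could not issue failover when not disconnected.
The cause: the standby had not yet disconnected from the primary. The zengine process was still running on the primary, so the standby could still reach it through the replication port.
2. After the zengine process on the primary was killed, reinstalling the primary failed with a message that the database was already installed.
The cause: the omm user still had environment variables such as GSDB_HOME set. After removing them from the omm user's environment, the reinstall succeeded (deleting the omm user also works, provided nothing under omm needs to be backed up), and the remaining steps went through without problems.
The full procedure is recorded below:
1. The standby is healthy and eligible for failover, but the failover attempt fails: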
[omm@gsdb02 ~]$ zsql / as sysdba -q
connected.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ ONLY         PHYSICAL_STANDBY     NORMAL              TO PRIMARY           TO PRIMARY
1 rows fetched.
SQL> alter database failover;
GS-00775, Invalid switch request, could not issue failover when not disconnected
SQL> exit
[omm@gsdb02 ~]$
2. Check the primary: the zengine process is still running, so kill it.
[omm@gsdb01 ~]$ ps -ef|grep zengine
omm      20980     1  2 Feb28 ?        02:57:08 /u01/gaussdb/app/bin/zengine open -D /u01/gaussdb/data
omm      26216 26082  0 12:02 pts/0    00:00:00 grep --color=auto zengine
[omm@gsdb01 ~]$ kill -9 20980
[omm@gsdb01 ~]$ ps -ef|grep zengine
omm      26224 26082  0 12:02 pts/0    00:00:00 grep --color=auto zengine
[omm@gsdb01 ~]$
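When checking for a leftover zengine process, the grep line itself always shows up in the ps output and is easy to misread. The following is only a small sketch of how the ps -ef listing could be filtered so that just real zengine processes remain. The function name find_zengine_pids is made up for this example, and it assumes the standard eight-column ps -ef layout.

```python
def find_zengine_pids(ps_output: str) -> list:
    """Return PIDs whose executable is named zengine, ignoring the grep line."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skips the header line and anything malformed
        executable = fields[7].split()[0]
        if executable.rsplit("/", 1)[-1] == "zengine":
            pids.append(int(fields[1]))
    return pids


sample = """\
omm      20980     1  2 Feb28 ?        02:57:08 /u01/gaussdb/app/bin/zengine open -D /u01/gaussdb/data
omm      26216 26082  0 12:02 pts/0    00:00:00 grep --color=auto zengine
"""
print(find_zengine_pids(sample))  # [20980]
```

Matching on the executable name rather than on the whole line is what keeps the grep process out of the result.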
3. Retry the failover on the standby. It succeeds, and the standby takes the PRIMARY role.
[omm@gsdb02 ~]$ zsql / as sysdba -q
connected.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ ONLY         PHYSICAL_STANDBY     DISCONNECTED        TO PRIMARY           TO PRIMARY
1 rows fetched.
SQL> alter database failover;
Succeed.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ WRITE        PRIMARY              NORMAL              NOT ALLOWED          NOT ALLOWED
1 rows fetched.
SQL> exit
[omm@gsdb02 ~]$
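The two attempts differ only in DATABASE_CONDITION: failover was refused while it read NORMAL and accepted once it read DISCONNECTED. A rough sketch of that check is shown here, based purely on what was observed in this post. The helper names parse_zsql_row and failover_blocker are invented for illustration, the role check is my own assumption, and the parser assumes columns are separated by at least two spaces, which holds for this output but would break if a value filled its whole column.

```python
import re
from typing import Dict, Optional


def parse_zsql_row(output: str) -> Dict[str, str]:
    """Map column names to values for a single-row zsql query result."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # The dashed ruler separates the header line from the data line.
    ruler = next(i for i, ln in enumerate(lines) if set(ln) <= {"-", " "})
    header = re.split(r"\s{2,}", lines[ruler - 1])
    values = re.split(r"\s{2,}", lines[ruler + 1])
    if len(header) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(header, values))


def failover_blocker(state: Dict[str, str]) -> Optional[str]:
    """Return a reason failover would likely be refused, or None."""
    if state["DATABASE_ROLE"] != "PHYSICAL_STANDBY":  # assumption, not from the post
        return "not a physical standby"
    if state["DATABASE_CONDITION"] != "DISCONNECTED":
        return "standby still connected to primary (GS-00775 expected)"
    return None


HEADER = (
    "NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        "
    "DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS\n"
    "---------- ----------- ----------------- -------------------- "
    "------------------- -------------------- ------------------\n"
)
before = HEADER + (
    "YHGSDB     OPEN        READ ONLY         PHYSICAL_STANDBY     "
    "NORMAL              TO PRIMARY           TO PRIMARY\n"
)
after = before.replace("NORMAL      ", "DISCONNECTED")

print(failover_blocker(parse_zsql_row(before)))
print(failover_blocker(parse_zsql_row(after)))
```

The first call reports the blocker and the second returns None, matching the two attempts in steps 1 and 3.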
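4. Reinstall the primary with the -O option, which installs the software only and does not create a database. The reinstall fails with an error that the database is already installed, because the omm user still carries the old installation's environment variables.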
[root@gsdb01 gaussdb]#
[root@gsdb01 gaussdb]# python install.py -U omm:dbgrp -R /u01/gaussdb/app -D /u01/gaussdb/data -C LSNR_ADDR=127.0.0.1,192.168.179.121 -C LSNR_PORT=1888 -O
Checking runner.
Checking parameters.
End check parameters.
Checking user.
End check user.
Checking old install.
Error: Database has been installed already.
Please refer to install log "/home/omm/zengineinstall.log" for more detailed information.
[root@gsdb01 gaussdb]#
[root@gsdb01 gaussdb]#
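Before deleting the whole user, it can help to see which leftover variables are actually present. This is a minimal sketch that scans the text of a shell profile for export lines. The post only names GSDB_HOME, so that is the only default; the function name leftover_exports, the choice of profile file (for example /home/omm/.bashrc) and the sample value are my assumptions, not something the installer documents here.

```python
import re


def leftover_exports(profile_text: str, names=("GSDB_HOME",)) -> list:
    """Return profile lines that export any of the given variable names."""
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"^\s*export\s+(?:%s)=" % alternatives)
    return [ln for ln in profile_text.splitlines() if pattern.match(ln)]


# Hypothetical profile content; the real file and values may differ.
sample_profile = """\
# .bashrc
export PATH=$PATH:/usr/local/bin
export GSDB_HOME=/u01/gaussdb/app
"""
print(leftover_exports(sample_profile))
```

Any line it returns is a candidate for removal before rerunning install.py.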
5. Delete the omm user, recreate it, and reinstall. This time the installation succeeds.
[root@gsdb01 gaussdb]#
[root@gsdb01 gaussdb]# userdel -r omm
[root@gsdb01 gaussdb]# useradd -g dbgrp -d /home/omm -m -s /bin/bash omm
[root@gsdb01 gaussdb]#
[root@gsdb01 gaussdb]# python install.py -U omm:dbgrp -R /u01/gaussdb/app -D /u01/gaussdb/data -C LSNR_ADDR=127.0.0.1,192.168.179.121 -C LSNR_PORT=1888 -O
Checking runner.
Checking parameters.
End check parameters.
Checking user.
End check user.
Checking old install.
End check old install.
Checking kernel parameters.
Checking directory.
Checking integrality of run file...
Decompressing run file.
Setting user env.
Checking data dir and config file
Initialize db instance.
Changing file permission due to security audit.
Install successfully, for more detail information see /home/omm/zengineinstall.log.
[root@gsdb01 gaussdb]#
6. Edit the remote replication parameters.
[root@gsdb01 gaussdb]# su - omm
Last login: Thu Mar  5 12:19:35 CST 2020
[omm@gsdb01 ~]$ cd /u01/gaussdb/data/cfg/
[omm@gsdb01 cfg]$ vi zengine.ini
[omm@gsdb01 cfg]$
[omm@gsdb01 cfg]$
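The edit above was done by hand in vi, and the exact parameters changed are not shown in this post. If the same change had to be scripted, something like the sketch below might do, assuming zengine.ini holds one KEY = VALUE pair per line. The function set_param and the key EXAMPLE_PARAM are purely hypothetical placeholders; substitute the real replication parameters for your environment.

```python
def set_param(ini_text: str, key: str, value: str) -> str:
    """Return ini_text with key set to value, replacing or appending the entry."""
    out, found = [], False
    for line in ini_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            out.append(f"{key} = {value}")  # overwrite the existing entry
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key} = {value}")  # key was absent, so append it
    return "\n".join(out) + "\n"


# LSNR_* values come from the install command; EXAMPLE_PARAM is a placeholder.
sample_ini = "LSNR_ADDR = 127.0.0.1,192.168.179.121\nLSNR_PORT = 1888\n"
updated = set_param(sample_ini, "EXAMPLE_PARAM", "example_value")
print(updated)
```

Working on the text and writing the file back in one step keeps the other entries untouched, which matters when the file was just generated by the installer.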
7. Start the old primary in nomount mode and run build database. Its role becomes PHYSICAL_STANDBY.
[omm@gsdb01 cfg]$ ps -ef|grep zen
omm      27057 26976  0 12:23 pts/0    00:00:00 grep --color=auto zen
[omm@gsdb01 cfg]$ cd /u01/gaussdb/app/bin/
[omm@gsdb01 bin]$ python zctl.py -t start -m nomount
Successfully started instance.
[omm@gsdb01 bin]$ zsql / as sysdba -q
connected.
SQL> build database;
Succeed.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ ONLY         PHYSICAL_STANDBY     NORMAL              TO PRIMARY           TO PRIMARY
1 rows fetched.
SQL> exit
[omm@gsdb01 bin]$
At this point the old standby is still PRIMARY:
[omm@gsdb02 ~]$ zsql / as sysdba -q
connected.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ WRITE        PRIMARY              NORMAL              NOT ALLOWED          NOT ALLOWED
1 rows fetched.
SQL> exit
[omm@gsdb02 ~]$
8. Finally, run a switchover on the old primary to restore the original primary/standby roles.
[omm@gsdb01 bin]$ zsql / as sysdba -q
connected.
SQL> alter database switchover;
Succeed.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ WRITE        PRIMARY              NORMAL              NOT ALLOWED          NOT ALLOWED
1 rows fetched.
SQL>
[omm@gsdb02 run]$ zsql / as sysdba -q
connected.
SQL> select name,status,open_status,database_role,database_condition,switchover_status,failover_status from v$database;
NAME       STATUS      OPEN_STATUS       DATABASE_ROLE        DATABASE_CONDITION  SWITCHOVER_STATUS    FAILOVER_STATUS
---------- ----------- ----------------- -------------------- ------------------- -------------------- ------------------
YHGSDB     OPEN        READ ONLY         PHYSICAL_STANDBY     NORMAL              TO PRIMARY           TO PRIMARY
1 rows fetched.
SQL>
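After the switchover, the two v$database results have to be compared by eye. As a rough sketch, the end state could be checked like this: exactly one PRIMARY, one PHYSICAL_STANDBY, and both nodes reporting NORMAL. The function check_pair is a name made up for this example, and the rule set is simply what the output in this post shows, not an official health check.

```python
from typing import Dict, List


def check_pair(states: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems found in a two-node primary/standby pair."""
    problems = []
    roles = sorted(s["DATABASE_ROLE"] for s in states.values())
    if roles != ["PHYSICAL_STANDBY", "PRIMARY"]:
        problems.append(f"unexpected roles: {roles}")
    for host, s in states.items():
        if s["DATABASE_CONDITION"] != "NORMAL":
            problems.append(f"{host}: condition is {s['DATABASE_CONDITION']}")
    return problems


# Values taken from the v$database output after the switchover.
final_state = {
    "gsdb01": {"DATABASE_ROLE": "PRIMARY", "DATABASE_CONDITION": "NORMAL"},
    "gsdb02": {"DATABASE_ROLE": "PHYSICAL_STANDBY", "DATABASE_CONDITION": "NORMAL"},
}
print(check_pair(final_state))  # []
```

An empty list means the pair looks the way it did before the incident.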
That completes the whole procedure.