DG环境备库数据不同步-同样的问题又发生了（处理过程与之前略有差异，值得记录下来）

2021-02-01 15:01:14

处理xxxx17备库数据不同步问题，定位备库空间不足导致mrp进程终止，
后续通过主库迁移文件，备库重新定位文件，重启MRP进程拉平数据。
备库的第二个节点的告警日志
2021-01-29T17:01:37.931343+08:00
Managed Standby Recovery not using Real Time Apply
2021-01-29T17:01:55.081492+08:00
Reconfiguration started (old inc 16, new inc 18)
List of instances (total 3) :
1 2 3
My inst 2
Global Resource Directory frozen
* SM/ADG-SM mode terminates successfully - (publisher 3) set domain valid
Communication channels reestablished
2021-01-29T17:01:55.131047+08:00
* domain 0 valid = 1 (flags x0, pdb flags x800) according to instance 1
2021-01-29T17:01:55.134175+08:00
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
2021-01-29T17:01:55.229957+08:00
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived, skipped 0
2021-01-29T17:01:55.230077+08:00
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived, skipped 0
2021-01-29T17:01:55.230082+08:00
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived, skipped 0
2021-01-29T17:01:55.230088+08:00
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived, skipped 0
2021-01-29T17:01:55.230346+08:00
LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived, skipped 0
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
2021-01-29T17:01:56.493982+08:00
Fix write in gcs resources
2021-01-29T17:01:56.519026+08:00
Reconfiguration complete (total time 1.4 secs)
2021-02-01T05:44:08.408942+08:00
Managed Standby Recovery starting Real Time Apply
2021-02-01T05:44:08.419365+08:00
Errors in file /oracle/diag/rdbms/xxxx17dg/xxxx17dg2/trace/xxxx17dg2_dbw0_288944.trc:
ORA-01186: file 589 failed verification tests
ORA-01157: cannot identify/lock data file 589 - see DBWR trace file
ORA-01111: name for data file 589 is unknown - rename to correct file
ORA-01110: data file 589: '/oracle/12.2/dbs/UNNAMED00589'
2021-02-01T05:44:08.419409+08:00
File 589 not verified due to error ORA-01157
2021-02-01T05:44:08.487892+08:00
Managed Standby Recovery not using Real Time Apply
2021-02-01T05:44:08.766778+08:00
Reconfiguration started (old inc 18, new inc 20)
List of instances (total 3) :
1 2 3
My inst 2
Global Resource Directory frozen
Communication channels reestablished

看下错误描述
[oracle@pc1] /oracle/diag/rdbms/xxxx17dg/xxxx17dg2/trace]$oerr ora 1157
01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
// *Cause: The background process was either unable to find one of the data
// files or failed to lock it because the file was already in use.
// The database will prohibit access to this file but other files will
// be unaffected. However the first instance to open the database will
// need to access all online data files. Accompanying error from the
// operating system describes why the file could not be identified.
// *Action: Have operating system make file available to database. Then either
// open the database or do ALTER SYSTEM CHECK DATAFILES.

就是主库创建了数据文件而备库没有创建成功，需要我们检查为啥没有成功。根据经验我们从主备库的磁盘空间检查
发现备库该文件系统已经了，可以确定是主库创建的数据文件在备库没有成功，目前看需要在备库扩容解决，这个问题之前处理
哈局系统是遇到过，所以处理起来思路比较起清楚。由于扩容需要逐级审批，而我们又不能无限制等待，因为主库归档空间有限，备库日志不能
应用主库归档就不会删除，所以必须尽快解决这个问题。我们尝试如下方式解决问题。

SQL> alter system set standby_file_management=manual;

System altered.

SQL> alter database create datafile '/oracle/12.2/dbs/UNNAMED00589' as '/oradata2/xxxx17/GP_3.dbf';

Database altered.

SQL> alter database open;

Database altered.

SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

报错如下

File #589 added to control file as 'UNNAMED00589' because
the parameter STANDBY_FILE_MANAGEMENT is set to MANUAL
The file should be manually created to continue.
Errors with log /gd/3_11509_997797587.arc
MRP0: Background Media Recovery terminated with error 1274
2021-02-01T09:47:27.490672+08:00
Errors in file /oracle/diag/rdbms/xxxx17dg/xxxx17dg1/trace/xxxx17dg1_pr00_2404.trc:
ORA-01274: cannot add data file that was originally created as '/oradata1/xxxx17/YXGP_3.dbf'
2021-02-01T09:47:27.491472+08:00
Managed Standby Recovery not using Real Time Apply
2021-02-01T09:47:27.610550+08:00
Recovery interrupted!
从报错看，我们上面的rename操作是失败的，也就是不允许增加originally created创建的数据文件，
这里提示我们move主库的数据文件到/oradata2，然后再在备库创建失败的数据文件到/oradata2

备库关闭后启动数据库报错
SQL> startup
ORACLE instance started.

Total System Global Area 1.0737E+11 bytes
Fixed Size 29869528 bytes
Variable Size 2.3622E+10 bytes
Database Buffers 8.3483E+10 bytes
Redo Buffers 238563328 bytes
Database mounted.
ORA-10458: standby database requires recovery
ORA-01157: cannot identify/lock data file 589 - see DBWR trace file
ORA-01111: name for data file 589 is unknown - rename to correct file
ORA-01110: data file 589: '/oracle/12.2/dbs/UNNAMED00589

下面在主库将该文件定义到其他目录
在主库
SQL> col file_name for a40
SQL>select file_name,status from dba_data_files where file_id=589

FILE_NAME STATUS
---------------------------------------- ---------
/oradata1/xxxx17/YXGP_3.dbf AVAILABLE

SQL> alter database move datafile '/oradata1/xxxx17/GCAREVTA01_YXGP_3.dbf' to '/oradata2/xxxx17/YXGP_3.dbf';

Database altered.

SQL> select file_name,status from dba_data_files where file_id=589;

FILE_NAME STATUS
---------------------------------------- ---------
/oradata2/xxxx17/YXGP_3.dbf AVAILABLE

在备库重新执行如下操作
SQL> startup mount
SQL> alter system set standby_file_management=manual;
SQL> alter database create datafile '/oracle/12.2/dbs/UNNAMED00589' as '/oradata2/xxxx17/YXGP_3.dbf';
SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

开始追日志
2021-02-01T10:32:57.249224+08:00
Attempt to start background Managed Standby Recovery process (xxxx17dg1)
Starting background process MRP0
2021-02-01T10:32:57.264209+08:00
MRP0 started with pid=9, OS id=128200
2021-02-01T10:32:57.265139+08:00
MRP0: Background Managed Standby Recovery process started (xxxx17dg1)
2021-02-01T10:33:02.278444+08:00
Starting single instance redo apply (SIRA)
Started loxxerger process
2021-02-01T10:33:02.314621+08:00
Managed Standby Recovery starting Real Time Apply
2021-02-01T10:33:04.343863+08:00
Parallel Media Recovery started with 96 slaves
2021-02-01T10:33:04.539233+08:00
Media Recovery Log /gd/1_26471_997797587.arc
2021-02-01T10:33:04.617576+08:00
Media Recovery Log /gd/2_12139_997797587.arc
2021-02-01T10:33:04.803323+08:00
Media Recovery Log /gd/3_11509_997797587.arc
2021-02-01T10:33:05.273477+08:00
Completed: alter database recover managed standby database using current logfile disconnect from session
2021-02-01T10:33:16.496030+08:00
Media Recovery Log /gd/1_26472_997797587.arc

后修改备库参数

SQL> alter system set standby_file_management=auto;

System altered.