该系统刚刚上线,需要监控运行情况,发现每天晚上DBTime都会超过阀值,经过分析发现大量的cursor: pin s wait on x,和dblink等待,发生cursor: pin s wait on x等待的SQL无法捕获,我们从分析应该是执行的SQL有DBlink,而问题时段告警日志大量的Dblink故障告警,我们模拟该问题,从而验证我们的分析。
每错误执行一次,告警日志报错如下
Fatal NI connect error 12541, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test2)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=prod)(CID=(PROGRAM=oracle)(HOST=test1)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 12.1.0.2.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
Time: 22-MAY-2021 02:27:25
Tracing not turned on.
Tns error struct:
ns main err code: 12541
TNS-12541: TNS:no listener
ns secondary err code: 12560
nt main err code: 511
TNS-00511: No listener
nt secondary err code: 111
nt OS err code: 0
捕获语句,可以不关注
select sql_id,event,count(*) from v$active_session_history group by sql_id,event order by count(*);
频繁执行的并发操作,并发执行如下
SQL> declare
2 v_sql varchar2(4000);
3
4 begin
5 for i in 1 .. 100000
6 loop
7 v_sql :='select * from emp@scottlink where rownum<20';
8 execute immediate v_sql;
9 end loop;
10 end;
11 /
declare
*
ERROR at line 1:
ORA-12543: TNS:destination host unreachable
ORA-06512: at line 8
等待事件
SQL> /
EVENT COUNT(*)
------------------------------ ----------
cursor: pin S wait on X 2
并发执行如下,反复执行
SQL> declare
2 v_sql varchar2(4000);
3
4 begin
5 for i in 1 .. 100000
6 loop
7 v_sql :='select * from emp@scottlink where rownum<10';
8 execute immediate v_sql;
9 end loop;
10 end;
11 /
declare
*
ERROR at line 1:
ORA-12543: TNS:destination host unreachable
ORA-06512: at line 8
继续查等待事件等待事件增加
SQL> /
EVENT COUNT(*)
------------------------------ ----------
cursor: pin S wait on X 8
SQL> select sql_id,event,count(*) from v$active_session_history where event like 'cursor%' group by sql_id,event order by count(*);
SQL_ID EVENT COUNT(*)
-------------------------- ------------------------------ ----------
gf49q7pt8kz72 cursor: pin S wait on X 2
4p16kbxk5f274 cursor: pin S wait on X 6
SQL> l
1* select sql_id,event,blocking_session ,count(*) from v$active_session_history where event like 'cursor%' group by sql_id,event,blocking_session order by count(*)
SQL> /
SQL_ID EVENT BLOCKING_SESSION COUNT(*)
-------------------------- ------------------------------ ---------------- ----------
4p16kbxk5f274 cursor: pin S wait on X 58 2
gf49q7pt8kz72 cursor: pin S wait on X 58 2
4p16kbxk5f274 cursor: pin S wait on X 59 4
而这些BLOCKING会话就是反复执行相同查询的会话
针对SQL我们查询
SQL> select sql_id,event,count(*) from v$active_session_history where event like 'cursor%' group by sql_id,event order by count(*);
SQL_ID EVENT COUNT(*)
-------------------------- ------------------------------ ----------
gf49q7pt8kz72 cursor: pin S wait on X 2
4p16kbxk5f274 cursor: pin S wait on X 6
SQL> select sql_text from v$sql where sql_id='gf49q7pt8kz72';
no rows selected
SQL> select sql_text from v$sql where sql_id='4p16kbxk5f274';
no rows selected
开启DBlink
SQL> select * from emp@scottlink where rownum<10;
SQL> select * from emp@scottlink where rownum<20;
SQL> select sql_id ,sql_text from v$sql where sql_text like 'select *%';
SQL_ID SQL_TEXT
-------------------------- --------------------------------------------------
4p16kbxk5f274 select * from emp@scottlink where rownum<10
gf49q7pt8kz72 select * from emp@scottlink where rownum<20
我们看到dblink异常时刻被阻塞的SQL,就是当dblink正常时查询的SQL。这也验证了我们的判断
即: dblink异常时刻,sql做解析,但是由于无法获得对象信息,无法成功,但是依然需要持有一个内存块预放置该SQL对象的解析信息,此时会持有X锁,当其他会话
发起同样的查询,会先尝试判断是否存在造成S锁等待,这就是DBlink问题造成的cursor: pin s wait on x等待事件的原因。
问题如何解决: 保持稳定的dblink连接。