近在做一些功能兼容的工作,做了一些数据类型的工作。其中一部分是添加新的基础类型,很多细节值得记录一下,在此进行步骤介绍以及代码介绍。
此次要添加一种新的数据类型,这种类型使用的算法和PostgreSQL使用相同的input和output函数。那么接下来就将所有步骤进行演示。
类型信息
- 类型名,newtype;
- 类型长度,pg中显示-1,定长,和bigint相同,实际占用8个字节;
- 使用bigint(int8或int64)的input、output、send、receive函数;
- 能够进行显示、排序、简单比较、建立索引;
实现方法
- 定义基本函数:
- 使用unused_oids,确定oid
- newtypein,newtypeout,newtyperecv,new*end[1]
- 在系统表pg_proc中注册相关以上4个函数
2. 定义基础类型:
- 在系统表pg_type中注册类型newtype
- 在bootstrap注册基础类型[2]
3. 实验效果:
shawn@B-D53RLVDL-1650 bin % ./psql postgres
psql (13devel)
Type "help" for help.
postgres=# create table testnewtype(t1 newtype);
CREATE TABLE
postgres=# insert into testnewtype values ('1'),('2');
INSERT 2
postgres=# checkpoint ;
CHECKPOINT
postgres=# select * from testnewtype ;
t1
----
1
2
(2 rows)
postgres=#
4. 排序问题
按照以上步骤建立基础类型后,是无法直接进行排序操作的。
postgres=# select * from testnewtype order by t1;
2020-04-12 21:43:21.710 CST [60897] ERROR: could not identify an ordering operator for type newtype at character 36
2020-04-12 21:43:21.710 CST [60897] HINT: Use an explicit ordering operator or modify the query.
2020-04-12 21:43:21.710 CST [60897] STATEMENT: select * from testnewtype order by t1;
ERROR: could not identify an ordering operator for type newtype
LINE 1: select * from testnewtype order by t1;
^
HINT: Use an explicit ordering operator or modify the query.
postgres=#
那么接下来就需要支持排序。
4.1 根据以上报错信息可以找到以下函数堆:
(gdb) bt
#0 get_sort_group_operators (argtype=argtype@entry=562, needLT=needLT@entry=true,
needEQ=needEQ@entry=true, needGT=needGT@entry=false, ltOpr=ltOpr@entry=0x7ffe6113a108,
eqOpr=eqOpr@entry=0x7ffe6113a10c, gtOpr=gtOpr@entry=0x0,
isHashable=isHashable@entry=0x7ffe6113a106) at parse_oper.c:191
#1 0x0000000000579b75 in addTargetToSortList (pstate=pstate@entry=0x216adf8, tle=0x216b4a0,
sortlist=sortlist@entry=0x0, targetlist=0x216b5b0, sortby=0x216acd8)
at parse_clause.c:3299
#2 0x0000000000579d4e in transformSortClause (pstate=pstate@entry=0x216adf8,
orderlist=0x216ad28, targetlist=targetlist@entry=0x216af58,
exprKind=exprKind@entry=EXPR_KIND_ORDER_BY, useSQL99=useSQL99@entry=false)
at parse_clause.c:2607
#3 0x000000000055b444 in transformSelectStmt (stmt=0x216ab10, pstate=0x216adf8)
at analyze.c:1244
#4 transformStmt (pstate=pstate@entry=0x216adf8, parseTree=0x216ab10) at analyze.c:301
#5 0x000000000055cdc8 in transformOptionalSelectInto (pstate=pstate@entry=0x216adf8,
parseTree=<optimized out>) at analyze.c:246
#6 0x000000000055cece in transformTopLevelStmt (parseTree=0x216ad78, pstate=0x216adf8)
at analyze.c:196
#7 parse_analyze (parseTree=0x216ad78,
sourceText=0x2169ef8 "select * from ridt order by r1;", param*=0x0, numParams=0,
queryEnv=0x0) at analyze.c:116
#8 0x000000000077f77c in pg_analyze_and_rewrite (parsetree=parsetree@entry=0x216ad78,
query_string=query_string@entry=0x2169ef8 "select * from ridt order by r1;",
param*=param*@entry=0x0, numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0)
at postgres.c:691
#9 0x000000000077fc4c in exec_simple_query (
query_string=0x2169ef8 "select * from ridt order by r1;") at postgres.c:1155
#10 0x00000000007810be in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2194f90,
dbname=0x2194eb8 "postgres", username=<optimized out>) at postgres.c:4298
#11 0x0000000000480ec6 in BackendRun (port=<optimized out>, port=<optimized out>)
at postmaster.c:4510
#12 BackendStartup (port=0x218ce90) at postmaster.c:4202
#13 ServerLoop () at postmaster.c:1727
#14 0x000000000070b92e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2164ac0)
at postmaster.c:1400
#15 0x0000000000481bff in main (argc=3, argv=0x2164ac0) at main.c:210
(gdb)
PostgreSQL会在函数get_sort_group_operators内进行操作符查找:
/* get_sort_group_operators代码片段 */
typentry = lookup_type_cache(argtype, cache_flags);
首先,函数经过lookup_type_cache,这个函数会查找pg_opclass,而这个表依赖于pg_opfamily的信息,还会查找pg_amop,pg_amop依赖于pg_operator的定义。而pg_operator又依赖于pg_proc的函数定义。
/* get_sort_group_operators代码片段 */
lt_opr = typentry->lt_opr;
eq_opr = typentry->eq_opr;
gt_opr = typentry->gt_opr;
以上表示需要三种操作符,lt,eq,gt,以及操作符对应的三种函数。
这就需要在newtype.c中添加函数体,以及在pg_proc中添加函数定义,在pg_operator添加操作符定义。
那么添加完以上代码后,会发现:
postgres=# select * from testnewtype where t1 < '2' order by t1;
2020-04-13 22:36:04.644 CST [31149] ERROR: XX000: missing support function 1(564,564) in opfamily 567
2020-04-13 22:36:04.644 CST [31149] LOCATION: FinishSortSupportFunction, sortsupport.c:119
2020-04-13 22:36:04.644 CST [31149] STATEMENT: select * from testnewtype where t1 < '2' order by t1;
ERROR: missing support function 1(564,564) in opfamily 567
postgres=#
首先根据定义找到函数栈:
(gdb) bt
#0 FinishSortSupportFunction (opfamily=567, opcintype=564, ssup=0x206a9c8) at sortsupport.c:95
#1 0x000000000089dc9d in PrepareSortSupportFromOrderingOp (orderingOp=570, ssup=0x206a9c8) at sortsupport.c:149
#2 0x00000000008a35e9 in tuplesort_begin_heap (tupDesc=<optimized out>, nkeys=1, attNums=0x20b3a40,
sortOperators=0x20b3a58, sortCollations=0x20b3a70, nullsFirstFlags=0x20b3a88, workMem=4096,
coordinate=coordinate@entry=0x0, randomAccess=false) at tuplesort.c:862
#3 0x00000000006429b4 in ExecSort (pstate=0x20447a0) at nodeSort.c:89
#4 0x0000000000618322 in ExecProcNode (node=0x20447a0) at ../../../src/include/executor/executor.h:245
#5 ExecutePlan (execute_once=<optimized out>, dest=0x20b3f40, direction=<optimized out>, numberTuples=0,
sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x20447a0, estate=0x2044578)
at execMain.c:1646
#6 standard_ExecutorRun (queryDesc=0x200ef28, direction=<optimized out>, count=0, execute_once=<optimized out>)
at execMain.c:364
#7 0x000000000076838b in PortalRunSelect (portal=portal@entry=0x1fbbec8, forward=forward@entry=true, count=0,
count@entry=9223372036854775807, dest=dest@entry=0x20b3f40) at pquery.c:912
#8 0x00000000007695f8 in PortalRun (portal=portal@entry=0x1fbbec8, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x20b3f40,
altdest=altdest@entry=0x20b3f40, qc=qc@entry=0x7ffd12185180) at pquery.c:756
#9 0x000000000076532c in exec_simple_query (query_string=0x1f54ef8 "select * from newt order by t1;")
at postgres.c:1239
#10 0x00000000007666b7 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f7ffa0,
dbname=0x1f7fec8 "postgres", username=<optimized out>) at postgres.c:4315
---Type <return> to continue, or q <return> to quit---
#11 0x00000000004815bb in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4510
#12 BackendStartup (port=0x1f77ea0) at postmaster.c:4202
#13 ServerLoop () at postmaster.c:1727
#14 0x00000000006f4533 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1f4fac0) at postmaster.c:1400
#15 0x00000000004821fe in main (argc=3, argv=0x1f4fac0) at main.c:210
(gdb)
在函数FinishSortSupportFunction进行报错的。
而引起问题的部分代码为:
static void
FinishSortSupportFunction(Oid opfamily, Oid opcintype, SortSupport ssup)
{
Oid sortSupportFunction;
/* Look for a sort support function */
sortSupportFunction = get_opfamily_proc(opfamily, opcintype, opcintype,
} BTSORTSUPPORT_PROC);
get_opfamily_proc的代码是:
Oid
get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum)
{
HeapTuple tp;
Form_pg_amproc amproc_tup;
RegProcedure result;
tp = SearchSysCache4(AMPROCNUM,
ObjectIdGetDatum(opfamily),
ObjectIdGetDatum(lefttype),
ObjectIdGetDatum(righttype),
Int16GetDatum(procnum));
if (!HeapTupleIsValid(tp))
return InvalidOid;
amproc_tup = (Form_pg_amproc) GETSTRUCT(tp);
result = amproc_tup->amproc;
ReleaseSysCache(tp);
return result;
} BTSORTSUPPORT_PROC);
这里能看到会在系统表pg_amproc查找数据,在ctid排序时能看到查找的是bttidcmp,那么就需要增加相关函数以及定义。
5. 完成以上步骤后的演示结果:
postgres=# create table testnewtype(t1 newtype);
CREATE TABLE
postgres=# insert into testnewtype values ('1'),('2'),('3'),('4'),('5'),('5'),('6');
INSERT 0 7
postgres=# select * from testnewtype where t1 < '6' order by t1;
t1
----
1
2
3
4
5
5
(6 rows)
postgres=#
6. 总结[3]
- 在这里一共添加了一个文件,src/backend/utils/adt/newtype.c;
- 修改系统表pg_proc,增加基础类型函数定义;
- 修改系统表pg_type,增加基础类型定义;
- 增加比较函数体;
- 增加比较函数定义;
- 在系统表pg_opfamily增加操作符族;
- 在pg_opcalss增加操作符类;
- 在pg_operator增加操作符,便于使用where查询;
- 在pg_amop增加与访问方法操作符族相关的操作符信息;
- 增加cmp函数体;
- 在pg_amproc增加关于访问方法操作符族相关的支持函数btnewtype定义。
参考
- ^复制的int8类型的相关函数
- ^routines to support running postgres in 'bootstrap' mode bootstrap mode is used to create the initial template database
- ^具体代码可以参看我的github https://github.com/wangguoke/postgres/commit/cb5ae3ca38965cb55ed780b46043d30927cebfbe