对于DBA而言,索引是数据库调优的常用手段,合理的使用索引可以加速SQL语句的执行速度,帮助数据库更加高效的利用系统资源。同时,当前数据库业务场景的复杂多变性对索引提出了更高的要求。
为了支撑这些复杂多变的应用场景,Kingbase ES提供了6种类型的索引,每种类型的索引对应不同的算法实现,下文将一一介绍其功能及应用场景。
Kingbase ES六种索引方式及适用场景
01
BTREE
BTREE是Kingbase ES的默认索引,采用B+树( N叉排序树)来做实现,由于树状结构每一层节点都有序列,因此非常适合用来做范围查询和优化排序操作。BTREE索引支持的操作符有 >, <, >=, <=, =, IN, LIKE等,同时,优化器也会优先选择BTREE来对ORDER BY、MIN、MAX、MERGE JOIN进行有序操作。
TEST=# create table t1 (id int, info text);
CREATE TABLE
TEST=# insert into t1 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
TEST=# analyze;
ANALYZE
TEST=# explain analyze select * from t1 where id < 10; --无索引
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on T1 (cost=0.00..2084.00 rows=10 width=37) (actual time=0.024..20.771 rows=9 loops=1)
Filter: (ID < 10)
Rows Removed by Filter: 99991
Planning time: 0.216 ms
Execution time: 20.804 ms
(5 rows)
TEST=# create index i_btree on t1 using btree(id);
CREATE INDEX
TEST=# explain analyze select * from t1 where id < 10; --有btree索引
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Index Scan using I_BTREE on T1 (cost=0.29..8.45 rows=9 width=37) (actual time=0.006..0.011 rows=9 loops=1)
Index Cond: (ID < 10)
Planning time: 0.250 ms
Execution time: 0.036 ms
(4 rows)
TEST=# explain analyze select min(id) from t1; --BTREE索引优化聚集函数min
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.33..0.34 rows=1 width=4) (actual time=0.144..0.144 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.33 rows=1 width=4) (actual time=0.132..0.135 rows=1 loops=1)
-> Index Only Scan using I_BTREE on T1 (cost=0.29..3691.29 rows=100000 width=4) (actual time=0.129..0.130 rows=1 loops=1)
Index Cond: (ID IS NOT NULL)
Heap Fetches: 1
Planning time: 0.309 ms
Execution time: 0.221 ms
(8 rows)
02
HASH
HASH索引查询效率比BTREE高得多,相比BTREE索引需要从根节点查询到叶子节点,且需要多次IO及CPU操作,HASH索引因其索引结构的特殊性,理想情况下仅需一次检索即可定位到数据。
与此同时,HASH也存在一定的局限性,只适合做等值查询。由于HASH索引是通过比较哈希值来做定位,因此当查询列上有较大比例的重复值时,会带来严重的哈希冲突,进而影响查询速度,这种情况下就不建议使用HASH索引。
TEST=# create table t2 (id int, info text);
CREATE TABLE
TEST=# insert into t2 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
TEST=# analyze;
ANALYZE
TEST=# create index i_hash on t2 using hash(id);
WARNING: hash indexes are not WAL-logged and their use is discouraged
CREATE INDEX
TEST=# explain analyze select * from t2 where id = 10; --HASH索引
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using I_HASH on T2 (cost=0.00..8.02 rows=1 width=37) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (ID = 10)
Planning time: 0.167 ms
Execution time: 0.033 ms
(4 rows)
TEST=# explain analyze select * from t2 where id < 10; --HASH索引不支持非等值操作
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on T2 (cost=0.00..2084.00 rows=10 width=37) (actual time=0.015..26.223 rows=9 loops=1)
Filter: (ID < 10)
Rows Removed by Filter: 99991
Planning time: 0.070 ms
Execution time: 26.245 ms
(5 rows)
03
GIN
GIN是倒排索引,主要适用于包含多个组合值的查询,如数组、全文检索等。
TEST=# create table t3(id int, info text);
CREATE TABLE
TEST=# insert into t3 values(generate_series(1,10000), md5(random()::text));
INSERT 0 10000
TEST=# create index i_t3_gin on t3 using gin(to_tsvector('english', info));
CREATE INDEX
TEST=# analyze;
ANALYZE
TEST=# explain analyze select * from t3 where to_tsvector('english', info) @@ plainto_tsquery( 'hello'); --使用GIN索引的全文检索
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on T3 (cost=12.25..16.77 rows=1 width=37) (actual time=0.029..0.029 rows=0 loops=1)
Recheck Cond: (TO_TSVECTOR('ENGLISH'::REGCONFIG, INFO) @@ PLAINTO_TSQUERY('hello'::TEXT))
-> Bitmap Index Scan on I_T3_GIN (cost=0.00..12.25 rows=1 width=0) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: (TO_TSVECTOR('ENGLISH'::REGCONFIG, INFO) @@ PLAINTO_TSQUERY('hello'::TEXT))
Planning time: 0.460 ms
Execution time: 0.068 ms
(6 rows)
04
GIST
GIST意为通用搜索树,是一种平衡的、树状结构的访问方法。GIST索引适用于多维数据类型和集合数据类型,可以用来做位置搜索,如包含、相交、左边、右边等。
TEST=# create table t4(id int, po point, info text);
CREATE TABLE
TEST=# insert into t4 values(generate_series(1,100000), point(random()*10000, random()*10000), md5(random()::text));
INSERT 0 100000
TEST=# analyze;
ANALYZE
TEST=# create index i_t4_gist on t4 using gist(po);
CREATE INDEX
TEST=# explain analyze select * from t4 where po << point(100,100); --查询po列上在(100,100)左边的点
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on T4 (cost=437.78..1699.78 rows=10000 width=53) (actual time=0.469..1.361 rows=957 loops=1)
Recheck Cond: (PO << '(100,100)'::POINT)
Heap Blocks: exact=646
-> Bitmap Index Scan on I_T4_GIST (cost=0.00..435.28 rows=10000 width=0) (actual time=0.370..0.370 rows=957 loops=1)
Index Cond: (PO << '(100,100)'::POINT)
Planning time: 0.169 ms
Execution time: 1.443 ms
(7 rows)
05
BRIN
BRIN索引是块范围索引的简称,由于其存储了表的连续数据块区间以及对应的数据取值范围,因此BRIN索引的体积和维护代价相比其他索引低很多。
BRIN索引适用于存储流式的数据日志。例如:按照时间插入的数据。由于数据是按照时间插入,因此数据块上记录的范围信息很少会出现交叉情况,索引过滤后需要比较的数据块也会少很多;反之,如果数据交叉严重,通过索引无法过滤掉任何一个数据块时,操作起来会比全表扫描更加耗时。
TEST=# create table t5(id int, name text);
CREATE TABLE
TEST=# insert into t5 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
TEST=# analyze;
ANALYZE
TEST=# create index i5_brin on t5 using brin(id);
CREATE INDEX
TEST=# explain analyze select * from t5 where id < 10; --使用BRIN索引,索引仅能过滤到连续的数据块级别,需要对块内的数据做再次比较
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on T5 (cost=12.08..48.92 rows=10 width=37) (actual time=0.039..2.498 rows=9 loops=1)
Recheck Cond: (ID < 10)
Rows Removed by Index Recheck: 15351
Heap Blocks: lossy=128
-> Bitmap Index Scan on I5_BRIN (cost=0.00..12.07 rows=10 width=0) (actual time=0.028..0.028 rows=1280 loops=1)
Index Cond: (ID < 10)
Planning time: 0.178 ms
Execution time: 2.527 ms
(8 rows)
TEST=# create index i5_btree on t5 using btree(id);
CREATE INDEX
TEST=# \di+ --BRIN索引大小远小于BTREE
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+----------+-------+-------+-------+---------+-------------
PUBLIC | I5_BRIN | index | nimo | T5 | 48 kB |
PUBLIC | I5_BTREE | index | nimo | T5 | 2208 kB |
(2 rows)
06
Bloom
Bloom索引提供了一种基于布隆过滤器的索引访问方法,可以用来检测一个元素是否可能属于集合,主要适用于当表具有很多属性且查询可能会涉及其中任意组合时的情况。
TEST=# create table t6 (c1 int, c2 int, c3 int, c4 int);
CREATE TABLE
TEST=# insert into t6 values(generate_series(1,100000), random()*100000, random()*100000, random()*10000);
INSERT 0 100000
TEST=# analyze ;
ANALYZE
TEST=# create index i_t6_bloom on t6 using bloom(c1,c2,c3,c4);
CREATE INDEX
TEST=# explain analyze select * from t6 where c2 = 100 and c4 = 100; --任意列组合查询
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on T6 (cost=1792.00..1796.02 rows=1 width=16) (actual time=0.628..0.628 rows=0 loops=1)
Recheck Cond: ((C2 = 100) AND (C4 = 100))
Rows Removed by Index Recheck: 5
Heap Blocks: exact=5
-> Bitmap Index Scan on I_T6_BLOOM (cost=0.00..1792.00 rows=1 width=0) (actual time=0.609..0.609 rows=5 loops=1)
Index Cond: ((C2 = 100) AND (C4 = 100))
Planning time: 0.221 ms
Execution time: 0.657 ms
(8 rows)
TEST=# explain analyze select * from t6 where c3 = 100 and c4 = 100; --任意列组合查询
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on T6 (cost=1792.00..1796.02 rows=1 width=16) (actual time=0.650..0.650 rows=0 loops=1)
Recheck Cond: ((C3 = 100) AND (C4 = 100))
Rows Removed by Index Recheck: 2
Heap Blocks: exact=2
-> Bitmap Index Scan on I_T6_BLOOM (cost=0.00..1792.00 rows=1 width=0) (actual time=0.635..0.635 rows=2 loops=1)
Index Cond: ((C3 = 100) AND (C4 = 100))
Planning time: 0.096 ms
Execution time: 0.680 ms
(8 rows)
综上所述,Kingbase ES提供的六种索引方式都有其特征和相对的适用场景。在实际应用中,大家根据自己需求选择匹配索引类型即可。
来自:https://mp.weixin.qq.com/s/y22WGxfn2G722-HQ-X0OrQ