一、目的
二、方案
2.1 数据实时同步高可用架构(本文省略)
架构图如下(不做说明)
当主机中心由于不可抗拒因素或者硬件及认为因素不可用,备机完全接管主机。
2.2 具有延迟性灾备方案
延迟性灾备方案,其目的主要为可以允许部分数据丢失或者损坏,但不允许集群不可用。当生产系统发生故障或者由于不可逆因素(如自然灾害)导致无法对外提供业务时,需要备用系统完全接替生产系统,但对于生产系统上近同步或者加载的部分数据可以保证在短时间内完成数据的重新加载,因此需要构架一套灾备系统来保证未来服务在一定程度上的高可用性。其架构示意图如下:
此方案有一定的局限性,即多长时间进行同步,同步对生产系统性能影响大小,可靠性测试等,需要进一步验证。
三、方案实现
3.1 全量源表在线同步
通过gpcopy 实现全量原表在线同步复制,如下:
同步源表后结果在备机如下:
同步数据:
同步源表数据后结果在备机如下:
该方式无法实现增量数据同步。
同时,需要注意的是,如果使用该方式进行数据同步,对于源端表数据应该通过pg_dump 的方式进行导出,然后在备用节点服务器进行恢复。
3.2 使用pg_dump和gpcopy进行数据初始化
3.2.1 源端相关结构信息
导出数据库结构
gpadmin@mdw ~]$ mkdir -p data/pg_dump
[gpadmin@mdw ~]$ export OLD_PORT=5432
[gpadmin@mdw ~]$ export OLD_HOST=mdw
[gpadmin@mdw ~]$ export DBNAME=testdb
[gpadmin@mdw ~]$ cd data/pg_dump/
[gpadmin@mdw pg_dump]$ pg_dumpall -p $OLD_PORT -h $OLD_HOST -s -g --resource-queues -f $DBNAME.global.ddl
[gpadmin@mdw pg_dump]$ ls
testdb.global.ddl
[gpadmin@mdw pg_dump]$ pg_dump -p $OLD_PORT -h $OLD_HOST -s testdb -f $DBNAME.ddl
[gpadmin@mdw pg_dump]$ ls
testdb.ddl testdb.global.ddl
3.2.2 被用集群上恢复相关结构
在备用集群上恢复导出的数据库结构和表结构
[gpadmin@mdw pg_dump]$ psql -p $NEW_PORT -h $NEW_HOST -d postgres -c "CREATE DATABASE $DBNAME"
CREATE DATABASE
[gpadmin@mdw pg_dump]$ psql -p $NEW_PORT -h $NEW_HOST -d $DBNAME -f $DBNAME.global.ddl>/dev/null
[gpadmin@mdw pg_dump]$ psql -p $NEW_PORT -h $NEW_HOST -d $DBNAME -f $DBNAME.ddl>/dev/null
为了能够使数据同步更快,建议删除索引并在同步完数据后进行索引重建。
使用 gpcopy 同步数据
[gpadmin@mdw pg_dump]$ psql -p $NEW_PORT -h $NEW_HOST -d $DBNAME -f $DBNAME.ddl>/dev/null
testdb=# SELECT pg_size_pretty(pg_relation_size('tab_test'));
pg_size_pretty
----------------
13 GB
(1 row)
testdb=# SELECT pg_size_pretty(pg_database_size('testdb'));
pg_size_pretty
----------------
13 GB
(1 row)
[gpadmin@mdw pg_dump]$ time gpcopy --dbname testdb --dbname testdb --dest-host 10.10.20.80 --dest-port 5432 --truncate
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-Starting copy 2.3....
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-Copy appname: gpcopy_202104060158_bCGoHnyXgCL97wQaLjxaR9
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-'gpcopy' --dbname 'testdb,testdb' --dest-host '10.10.20.80' --dest-port '5432' --truncate
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-Initializing gpcopy
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-pg_dump (PostgreSQL) 8.3.23
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-pg_dumpall (PostgreSQL) 8.3.23
20210406:01:58:04 gpcopy:gpadmin:mdw:002481-[INFO]:-Source Cluster PostgreSQL GetVersion is: 8.3.23
20210406:01:58:09 gpcopy:gpadmin:mdw:002481-[INFO]:-Start copying database "testdb"
20210406:01:58:09 gpcopy:gpadmin:mdw:002481-[INFO]:-[Worker 3] Start copying table "testdb"."public"."tab_test" => "testdb"."public"."tab_test"
20210406:01:58:09 gpcopy:gpadmin:mdw:002481-[WARNING]:-Copy table "testdb"."public"."tab_test" through MASTER node
20210406:02:01:17 gpcopy:gpadmin:mdw:002481-[INFO]:-[Worker 3] [Progress: (/1) DBs, (1/1) tables done] Finished copying table "testdb"."public"."tab_test" => "testdb"."public"."tab_test"
20210406:02:01:17 gpcopy:gpadmin:mdw:002481-[INFO]:-[testdb] Finished copying database "Progress: (1/1) DBs, (1/1) tables done"
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:-------------------------------------------------
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:-Total elapsed time: 3m13.886532883s
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:-Total transferred data 7.9GB, transfer rate 157.3GB/h
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:-Copied 1 databases
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:- Database testdb: successfully copied 1 tables, skipped tables, failed tables
20210406:02:01:18 gpcopy:gpadmin:mdw:002481-[INFO]:-Copy completed successfully
real 3m14.352s
user 0m.027s
sys 0m.042s
使用 gpcopy 同步数据该13G数据大约需要3分14秒,数据传输速率视网络带宽因素影响(该测试环境网络带宽为1000Mbps/s)。
3.2.3 备用主机验证
[gpadmin@mdwbak ~]$ psql -d testdb
psql (8.3.23)
Type "help" for help.
testdb=# SELECT pg_size_pretty(pg_relation_size('tab_test'));
pg_size_pretty
----------------
13 GB
(1 row)
testdb=# SELECT pg_size_pretty(pg_database_size('testdb'));
pg_size_pretty
----------------
13 GB
(1 row)
testdb=# \df f_test
List of functions
Schema | Name | Result data type | Argument data * | Type
--------+--------+------------------+---------------------+--------
public | f_test | integer | integer, integer | normal
(1 row)
3.2.4 使用pg_dump和psql恢复数据
源数据导出
[gpadmin@mdw pg_dump]$ time pg_dump -U gpadmin testdb -a -f testdb.sql
real 0m44.164s
user 0m.640s
sys 0m7.312s
[gpadmin@mdw pg_dump]$ ls -lh testdb.sql
-rw-rw-r-- 1 gpadmin gpadmin 8.0G Apr 6 02:23 testdb.sql
耗时44秒
备用主机数据导入
由于磁盘使用SSD,写入和读取性能测试为2G/s以上,而网络带宽为10GE带宽,为了提高数据写入性能,建议将数据文件拷贝到备用节点,在备用节点使用 psql 恢复数据
[gpadmin@mdw pg_dump]$ ls -lh testdb.sql
-rw-rw-r-- 1 gpadmin gpadmin 8.0G Apr 6 02:23 testdb.sql
[gpadmin@mdwbak pg_dump]$ time psql -U gpadmin -d testdb <testdb.sql
SET
SET
SET
SET
SET
SET
SET
real 4m49.873s
user 0m13.146s
sys 0m6.634s
[gpadmin@mdwbak pg_dump]$ psql -d testdb
psql (8.3.23)
Type "help" for help.
testdb=# SELECT pg_size_pretty(pg_relation_size('tab_test'));
pg_size_pretty
----------------
13 GB
(1 row)
testdb=# SELECT pg_size_pretty(pg_database_size('testdb'));
pg_size_pretty
----------------
13 GB
(1 row)
可以看到,即使将数据文件直接拷贝在备用主机上使用 psql 进行恢复,需要4分49秒,实际恢复速度视磁盘性能而定。
以上两种方式都可以进行不同集群之间的数据初始化,但是无法实现未来数据的增量同步。
3.2.5 使用 gptransfer进行数据同步
[gpadmin@mdw ~]$ psql -d testdb
psql (8.3.23)
Type "help" for help.
testdb=# SELECT pg_size_pretty(pg_database_size('testdb'));
pg_size_pretty
----------------
14 GB
(1 row)
[gpadmin@mdw ~]$ cat src-map-file
mdw,10.10.20.30
smdw,10.10.20.31
sdw1,10.10.20.32
sdw2,10.10.20.33
sdw3,10.10.20.34
sdw4,10.10.20.35
[gpadmin@mdw ~]$ time gptransfer -d testdb --source-host=mdw --source-port=5432 --source-user=gpadmin --dest-host=10.10.20.80 --dest-port=5432 --dest-user=gpadmin --dest-database=testdb --source-map-file src-map-file --drop
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Starting gptransfer with args: -d testdb --source-host=mdw --source-port=5432 --source-user=gpadmin --dest-host=10.10.20.80 --dest-port=5432 --dest-user=gpadmin --dest-database=testdb --source-map-file src-map-file --drop
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Validating options...
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Retrieving configuration of source Greenplum Database...
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Retrieving configuration of destination Greenplum Database...
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Retrieving source tables...
20210406:03:10:02:011954 gptransfer:mdw:gpadmin-[INFO]:-Checking for gptransfer schemas...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Retrieving list of destination tables...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Reading source host map file...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Building list of source tables to transfer...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Number of tables to transfer: 1
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-gptransfer will use "fast" mode for transfer.
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Validating source host map...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Validating transfer table set...
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Using batch size of 2
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Using sub-batch size of 24
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Creating work directory '/home/gpadmin/gptransfer_11954'
20210406:03:10:03:011954 gptransfer:mdw:gpadmin-[INFO]:-Creating schema public in database testdb...
20210406:03:10:04:011954 gptransfer:mdw:gpadmin-[INFO]:-Starting transfer of testdb.public.tab_test to testdb.public.tab_test...
20210406:03:10:04:011954 gptransfer:mdw:gpadmin-[INFO]:-Creating target table testdb.public.tab_test...
20210406:03:10:04:011954 gptransfer:mdw:gpadmin-[INFO]:-Retrieving schema for table testdb.public.tab_test...
20210406:03:10:06:011954 gptransfer:mdw:gpadmin-[INFO]:-Transfering data testdb.public.tab_test -> testdb.public.tab_test...
20210406:03:14:30:011954 gptransfer:mdw:gpadmin-[INFO]:-Finished transferring table testdb.public.tab_test, remaining of 1 tables
20210406:03:14:30:011954 gptransfer:mdw:gpadmin-[INFO]:-Running final table row count validation on destination tables...
20210406:03:14:48:011954 gptransfer:mdw:gpadmin-[INFO]:-Validation of testdb.public.tab_test successful
20210406:03:14:48:011954 gptransfer:mdw:gpadmin-[INFO]:-Removing work directories...
20210406:03:14:49:011954 gptransfer:mdw:gpadmin-[INFO]:-Finished.
real 4m47.655s
user 0m.968s
sys 0m.737s
备用主机查看数据信息
[gpadmin@mdwbak ~]$ psql -d testdb
psql (8.3.23)
Type "help" for help.
testdb=# SELECT pg_size_pretty(pg_database_size('testdb'));
pg_size_pretty
----------------
14 GB
(1 row)
四、结语