Postgresql fdw原理及postgres_fdw使用

2020-06-17 17:06:57

------- 原理 ---------

excellent reference：《Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw》

访问外部数据有两种方法（postgresql采用foreign table）

Foreign Table : this is about how to access external data sources and present them as relational tables.

Datalink : this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer a file.

Why do we need FDW

不同信息系统往往需要数据库中的同一部分数据。比较笨的办法是拷贝多份表，比较smart的方法是FDW。

FDW机制是postgresql的一个亮点。

------- postgres_fdw brief introduction --------

postgres_fdw是postgresql官方推出的，可以为其他fdw设计的参考实例

other fdw extension都是文件名xxxx_fdw.

------- Installation postgres_fdw --------

environment: PostgreSQL 10, Red Hat/CentOS:

# yum install postgresql10-contrib.x86_64

two PostgreSQL Instances： a source instance and a destination instance

------- how to use postgres_fdw --------

Step 1 : Create a user on the source

，this user account will be used by the destination server to access the source tables

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

Step 2 : Create test tables in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));

postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');

Step 3 : Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of privilege to minimum to improve security.

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;

Step 4 : Modify ACL in pg_hba.conf

We need to ensure that the proper authentication is setup for accessing source server from destination server.

Add an entry into pg_hba.conf as shown below, preferably at the beginning of the file.

host all all destination_server_ip/32 md5

Step 5 : Test connectivity and privileges on source

Before proceeding further, It is a good idea to make sure that we are able to connect to the source machine from this destination machine using the newly created database user (fdw_user).

In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.

Step 6 : Create postgres_fdw extension on the destination

Connect to destination server, and create the postgres_fdw extension in the destination database from where you wish to access the tables of source server. You must be a superuser to create the extension.

postgres=# create extension postgres_fdw;

Validate if the extension is created using \dx. Following is an example validation log.

postgres=# \dx postgres_fdw

Step 7: Grant privileges to user in the destination

Always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. Superuser can grant the privilege

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;

Step 8: Create a server definition

Now we can create a server definition. This foreign server is created using the connection details of the source server running on host “hr”. Let’s name the foreign server as itself as “hr”

postgres=> CREATE SERVER hr

FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (dbname 'postgres', host 'hr', port '5432');

Step 9: Create user mapping from destination user to source user

Create a mapping on the destination side for destination user (app_user) to remote source user (fdw_user)

postgres=> CREATE USER MAPPING for app_user

SERVER hr

OPTIONS (user 'fdw_user', password 'secret');

Step 10 : Create foreign table definition on the destination

Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name

postgres=# CREATE FOREIGN TABLE employee

(id int, first_name character varying(20), last_name character varying(20))

SERVER hr

OPTIONS (schema_name 'public', table_name 'employee');

Step 11 : Test foreign table

Validate whether we can query the foreign table we just created in the destination server.

postgres=> select * from employee;

As we can see from the above example, data is been accessed from the source database.

Writable foreign tables

At the beginning, foreign tables were just readable. But, with time, the community introduced writable foreign tables functionality in PostgreSQL. Let us consider the following situation where management wants to give a salary increase of 10% to grade 3 employees:

------- creating foreign tables 自动化 --------

Now you might be thinking: “creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?“. The answer is yes – there is an option to import a full schema.

On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).

---------- Advantages of foreign tables ----------

foreign tables V.S. remote SQL

（1）need not duplication(完全一样） or replicating（外形一样），不需要同步维护

（2）the biggest advantage is performance optimization

（3）统一接口

performance体现在the below aspects:

（1）query optimization

（2）Operator and function pushdown

PostgreSQL 9.5 release included the capability to assess and decide on the safety of pushing a function execution to remote server. Built-in functions are good candidates for this:

SELECT avg(sal)

FROM EMP

WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);

（3）Join push down

In many cases, it is worth pushing down the entire join operations to the remote server in such a way only the results need to be fetched to the local server. PostgreSQL handles this switching intelligently. Here’s an example:

postgres=# EXPLAIN VERBOSE SELECT COUNT(*)

FROM EMP JOIN DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

（4）Predicate push down

There are two options when executing a query against a foreign table:

Fetch the data locally and apply the predicates like filtering condition locally.

Send the filtering condition to the remote server and have it applied there.

The latter will can be the best option in many cases.

If you consider the previous example, we can see that the predicate specification like “DEPT.deptno=10;” is pushed down to the remote server through foreign tables and applied there separately like this:

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

PostgreSQL not only pushed the predicate, it also rewrote the query we sent to avoid one extra AND condition.

（5）Aggregate push down

Just like predicate push down, here PostgreSQL also considers 2 options:

Execute the aggregates on the remote server and pull the result back to the local server

Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown, since we’ve used an aggregate function for that example. Here’s another simple example:

PgSQL

postgres=# explain verbose select deptno,count(*) from emp group by deptno;

In this case, all of the aggregate calculation happens on the remote server.

（6）Triggers and Check constraints on Foreign tables

We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.

分享好友

分享这个小栈给你的朋友们，一起进步吧。

PostgreSQL

创建时间：2020-06-17 14:30:20

PostgreSQL是一种特性非常齐全的自由软件的对象-关系型数据库管理系统（ORDBMS），是以加州大学计算机系开发的POSTGRES，4.2版本为基础的对象关系型数据库管理系统。POSTGRES的许多领先概念只是在比较迟的时候才出现在商业网站数据库中。PostgreSQL支持大部分的SQL标准并且提供了很多其他现代特性，如复杂查询、外键、触发器、视图、事务完整性、多版本并发控制等。同样，PostgreSQL也可以用许多方法扩展，例如通过增加新的数据类型、函数、操作符、聚集函数、索引方法、过程语言等。另外，因为许可证的灵活，任何人都可以以任何目的免费使用、修改和分发PostgreSQL。

展开

订阅须知

• 所有用户可根据关注领域订阅专区或所有专区

• 付费订阅：虚拟交易，一经交易不退款；若特殊情况，可3日内客服咨询

• 专区发布评论属默认订阅所评论专区（除付费小栈外）

技术专家

查看更多

小雨滴
专家