绑定完请刷新页面
取消
刷新

分享好友

×
取消 复制
Postgresql fdw原理及postgres_fdw使用
2020-06-17 17:06:57

------- 原理 ---------

excellent reference:《Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw》


访问外部数据有两种方法(postgresql采用foreign table)

Foreign Table : this is about how to access external data sources and present them as relational tables.

Datalink : this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer a file.

Why do we need FDW

不同信息系统往往需要数据库中的同一部分数据。比较笨的办法是拷贝多份表,比较smart的方法是FDW。

FDW机制是postgresql的一个亮点。


------- postgres_fdw brief introduction --------

postgres_fdw是postgresql官方推出的,可以为其他fdw设计的参考实例

other fdw extension都是文件名xxxx_fdw.


------- Installation postgres_fdw --------

environment: PostgreSQL 10, Red Hat/CentOS:

# yum install postgresql10-contrib.x86_64

two PostgreSQL Instances: a source instance and a destination instance


------- how to use postgres_fdw --------


Step 1 : Create a user on the source

,this user account will be used by the destination server to access the source tables

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';


Step 2 : Create test tables in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));

postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');


Step 3 : Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of privilege to minimum to improve security.

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;


Step 4 : Modify ACL in pg_hba.conf


We need to ensure that the proper authentication is setup for accessing source server from destination server.

Add an entry into pg_hba.conf as shown below, preferably at the beginning of the file.

host all all destination_server_ip/32 md5


Step 5 : Test connectivity and privileges on source


Before proceeding further, It is a good idea to make sure that we are able to connect to the source machine from this destination machine using the newly created database user (fdw_user).


In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.


Step 6 : Create postgres_fdw extension on the destination


Connect to destination server, and create the postgres_fdw extension in the destination database from where you wish to access the tables of source server. You must be a superuser to create the extension.

postgres=# create extension postgres_fdw;

Validate if the extension is created using \dx. Following is an example validation log.

postgres=# \dx postgres_fdw


Step 7: Grant privileges to user in the destination


Always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. Superuser can grant the privilege

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;


Step 8: Create a server definition


Now we can create a server definition. This foreign server is created using the connection details of the source server running on host “hr”. Let’s name the foreign server as itself as “hr”

postgres=> CREATE SERVER hr

FOREIGN DATA WRAPPER postgres_fdw

OPTIONS (dbname 'postgres', host 'hr', port '5432');


Step 9: Create user mapping from destination user to source user


Create a mapping on the destination side for destination user (app_user) to remote source user (fdw_user)


postgres=> CREATE USER MAPPING for app_user

SERVER hr

OPTIONS (user 'fdw_user', password 'secret');


Step 10 : Create foreign table definition on the destination


Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name


postgres=# CREATE FOREIGN TABLE employee

(id int, first_name character varying(20), last_name character varying(20))

SERVER hr

OPTIONS (schema_name 'public', table_name 'employee');


Step 11 : Test foreign table


Validate whether we can query the foreign table we just created in the destination server.


postgres=> select * from employee;

As we can see from the above example, data is been accessed from the source database.


Writable foreign tables

At the beginning, foreign tables were just readable. But, with time, the community introduced writable foreign tables functionality in PostgreSQL. Let us consider the following situation where management wants to give a salary increase of 10% to grade 3 employees:


------- creating foreign tables 自动化 --------

Now you might be thinking: “creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?“. The answer is yes – there is an option to import a full schema.


On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).


---------- Advantages of foreign tables ----------


foreign tables V.S. remote SQL

(1)need not duplication(完全一样) or replicating(外形一样),不需要同步维护

(2)the biggest advantage is performance optimization

(3)统一接口


performance体现在the below aspects:

(1)query optimization

(2)Operator and function pushdown


PostgreSQL 9.5 release included the capability to assess and decide on the safety of pushing a function execution to remote server. Built-in functions are good candidates for this:


SELECT avg(sal)

FROM EMP

WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);


(3)Join push down


In many cases, it is worth pushing down the entire join operations to the remote server in such a way only the results need to be fetched to the local server. PostgreSQL handles this switching intelligently. Here’s an example:



postgres=# EXPLAIN VERBOSE SELECT COUNT(*)

FROM EMP JOIN DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;


(4)Predicate push down


There are two options when executing a query against a foreign table:


Fetch the data locally and apply the predicates like filtering condition locally.

Send the filtering condition to the remote server and have it applied there.

The latter will can be the best option in many cases.


If you consider the previous example, we can see that the predicate specification like “DEPT.deptno=10;” is pushed down to the remote server through foreign tables and applied there separately like this:


Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))


PostgreSQL not only pushed the predicate, it also rewrote the query we sent to avoid one extra AND condition.


(5)Aggregate push down


Just like predicate push down, here PostgreSQL also considers 2 options:


Execute the aggregates on the remote server and pull the result back to the local server

Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown, since we’ve used an aggregate function for that example. Here’s another simple example:


PgSQL


postgres=# explain verbose select deptno,count(*) from emp group by deptno;



In this case, all of the aggregate calculation happens on the remote server.


(6)Triggers and Check constraints on Foreign tables


We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.

戳我,来吐槽~