Extending PostgreSQL to Support Distributed/Heterogeneous Query Processing

The evolution from relational DBMSs to data integration systems brings new challenges to the design and implementation of the query execution engine, which must be extended to support queries over multiple distributed, heterogeneous, and autonomous data sources. In this paper, we present our work on extending PostgreSQL to support distributed query processing. Although PostgreSQL has no built-in distributed query processor, its function mechanism makes it possible to integrate data from various sources and to execute distributed queries. We identify several limitations in PostgreSQL's query engine and present corresponding query execution techniques that improve the performance of distributed query processing. Our experimental results show that these techniques significantly reduce response times on a workload of TPC-H queries.