Query optimization in microsoft SQL server PDW

In recent years, Massively Parallel Processors have increasingly been used to manage and query vast amounts of data. Dramatic performance improvements are achieved through distributed execution of queries across many nodes. Query optimization for such system is a challenging and important problem. In this paper we describe the Query Optimizer inside the SQL Server Parallel Data Warehouse product (PDW QO). We leverage existing QO technology in Microsoft SQL Server to implement a cost-based optimizer for distributed query execution. By properly abstracting metadata we can readily reuse existing logic for query simplification, space exploration and cardinality estimation. Unlike earlier approaches that simply parallelize the best serial plan, our optimizer considers a rich space of execution alternatives, and picks one based on a cost-model for the distributed execution environment. The result is a high-quality, effective query optimizer for distributed query processing in an MPP.

[1]  Irving L. Traiger,et al.  System R: relational approach to database management , 1976, TODS.

[2]  Goetz Graefe The Cascades Framework for Query Optimization , 1995, IEEE Data Eng. Bull..

[3]  Jayant R. Haritsa,et al.  The Picasso database query optimizer visualizer , 2010, Proc. VLDB Endow..

[4]  Goetz Graefe,et al.  The Volcano optimizer generator: extensibility and efficient search , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[5]  Leo Giakoumakis,et al.  Unit-testing query transformation rules , 2008, DBTest '08.

[6]  Michael Stonebraker,et al.  Optimization of parallel query execution plans in XPRS , 1991, [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems.

[7]  Nicolas Bruno,et al.  Automated partitioning design in parallel database systems , 2011, SIGMOD '11.