Optimizing Similar Scalar Subqueries for XML Processing in Microsoft SQL Server

XML is often used to represent objects that expose different sets of properties. This "property bag" scenario is a prominent use case for the XML support added to Microsoft SQL Server 2005. However, in our initial implementation each property extraction executed as a separate relational subquery. As a result, query performance became unacceptable even for small data sizes as the number of returned properties grew. We addressed this problem by developing a generalization of common subexpressions. This paper makes the following contributions: (1) It introduces an equivalence rewrite for relational query optimization that folds similar scalar subqueries. Several such subqueries are merged into a single equivalent multi-column subquery using both predicate disjunction and rowset pivoting. The rewrite operates at the logical operator level, which makes it equally applicable to XML queries and SQL queries. (2) It explains how this optimization can be applied to the XML property bag scenario and how it has been implemented for the XML index in Microsoft SQL Server 2005. (3) It presents an experimental investigation with Microsoft SQL Server 2005 that studies the performance characteristics of the optimization. The results show that the optimization yields significant performance improvements without limiting essential optimizer execution plan choices.
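
The folding idea can be illustrated with a minimal sketch, shown here in Python with an in-memory SQLite database rather than SQL Server. The tables item and prop, their columns, and the property names 'color' and 'size' are hypothetical and chosen only for illustration; the sketch also assumes each property occurs at most once per item. It is written at the SQL level by hand and is not the logical operator rewrite or the XML index implementation described in this paper. The first query extracts each property with its own scalar subquery; the second merges them into one multi-column subquery whose filter is the disjunction of the individual predicates and whose rows are pivoted into columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE prop (item_id INTEGER, name TEXT, value TEXT);
    INSERT INTO item VALUES (1), (2);
    INSERT INTO prop VALUES
        (1, 'color', 'red'), (1, 'size', 'L'),
        (2, 'color', 'blue');
""")

# One scalar subquery per property: prop is probed once per property.
separate = """
    SELECT i.id,
           (SELECT p.value FROM prop p
             WHERE p.item_id = i.id AND p.name = 'color') AS color,
           (SELECT p.value FROM prop p
             WHERE p.item_id = i.id AND p.name = 'size') AS size
    FROM item i
    ORDER BY i.id
"""

# Folded form: a single multi-column subquery.
# - predicate disjunction: name = 'color' OR name = 'size'
# - pivoting: MAX(CASE ...) turns matching rows into columns
#   (MAX ignores the NULLs produced by non-matching rows)
folded = """
    SELECT i.id, f.color, f.size
    FROM item i
    LEFT JOIN (
        SELECT p.item_id,
               MAX(CASE WHEN p.name = 'color' THEN p.value END) AS color,
               MAX(CASE WHEN p.name = 'size'  THEN p.value END) AS size
        FROM prop p
        WHERE p.name = 'color' OR p.name = 'size'
        GROUP BY p.item_id
    ) f ON f.item_id = i.id
    ORDER BY i.id
"""

rows_separate = conn.execute(separate).fetchall()
rows_folded = conn.execute(folded).fetchall()
print(rows_separate)
print(rows_folded)
```

Both queries return the same rows, with a NULL for a property an item does not have. The left outer join in the folded form preserves items that have none of the requested properties, which matches the NULL a scalar subquery yields when it finds no row.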