Efficient Bidirectional Order Dependency Discovery

Bidirectional order dependencies state order relationships between lists of attributes. They naturally model the order-by clauses in SQL queries, and have proved effective in query optimizations that involve sorting. Despite their importance, the order dependencies that hold on a dataset are typically unknown, and they are too costly, if not impossible, to design or discover manually. Techniques for automatic order dependency discovery have recently been studied. Scaling order dependency discovery is challenging, since the problem is by nature factorial in the number m of attributes and quadratic in the number n of tuples. In this paper, we adopt a strategy that decouples the impact of m from that of n and still finds all minimal valid bidirectional order dependencies. We present carefully designed data structures, together with a host of algorithms and optimizations, for efficient order dependency discovery. With extensive experimental studies on both real-life and synthetic datasets, we verify that our approach outperforms state-of-the-art techniques by orders of magnitude.
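
To make the notion concrete, the following is a minimal sketch of a naive validity check for a single bidirectional order dependency. It is not the discovery approach of this paper. It assumes the usual list-based semantics, under which a list X of attributes, each marked ascending or descending, orders a list Y if every pair of tuples sorted by X is also sorted by Y. The names `od_holds` and `precedes_or_equal`, and the sample table, are hypothetical and chosen for illustration only. The pairwise loop shows where the quadratic dependence on n comes from.

```python
from itertools import permutations

ASC, DESC = 1, -1  # direction markers for a bidirectional order specification

def precedes_or_equal(s, t, spec):
    """True if tuple s sorts no later than tuple t under spec.

    spec is a list of (attribute, direction) pairs, compared lexicographically.
    """
    for attr, direction in spec:
        if s[attr] != t[attr]:
            if direction == ASC:
                return s[attr] < t[attr]
            return s[attr] > t[attr]
    return True  # s and t agree on every attribute in spec

def od_holds(rows, lhs, rhs):
    """Naive check over all ordered tuple pairs: O(n^2) comparisons."""
    for s, t in permutations(rows, 2):
        if precedes_or_equal(s, t, lhs) and not precedes_or_equal(s, t, rhs):
            return False  # the pair (s, t) violates the dependency
    return True

# Hypothetical sample data: tax grows with salary, rank shrinks with salary.
rows = [
    {"salary": 3000, "tax": 300, "rank": 3},
    {"salary": 4000, "tax": 500, "rank": 2},
    {"salary": 5000, "tax": 800, "rank": 1},
]

salary_orders_tax = od_holds(rows, [("salary", ASC)], [("tax", ASC)])
salary_orders_rank_desc = od_holds(rows, [("salary", ASC)], [("rank", DESC)])
salary_orders_rank_asc = od_holds(rows, [("salary", ASC)], [("rank", ASC)])
```

On this sample, ascending salary orders ascending tax and descending rank, but not ascending rank. A discovery algorithm must consider many candidate pairs of attribute lists, which is the source of the factorial dependence on m.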
