Selectivity estimation of large multidimensional data warehouses using logical grid directory

We describe an implementation scheme for selectivity estimation using the Multilevel Grid File (MLGF). The MLGF is a balanced, dynamic, and hierarchical file structure that adapts to non-uniform and correlated data distributions. Our main goal is to develop a technique for determining the selectivity of a query over a large database in which the grid directory is implemented logically, without occupying any physical storage. Using our implementation scheme, we evaluated the estimated selectivity and the storage requirement, and found a low error rate for the estimated selectivity. We also estimate the overflow behavior of an MLGF when the number of dimensions and the lengths of the dimensions are large, and found that our logical implementation performs better with respect to this overflow condition. We present extensive experimental results that validate our theoretical analysis and demonstrate the advantage of our technique when compared with complex selectivity estimation techniques using Microsoft SQL Server.
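To make the idea of a logical grid directory more concrete, here is a minimal Python sketch under several stated assumptions. It uses a fixed, uniformly partitioned grid in place of the adaptive, hierarchical splitting of a real MLGF, and the class name `LogicalGridEstimator`, its methods, and the 64-bit address width `ADDRESS_BITS` are all hypothetical. Cell addresses are computed on demand with a row-major address function, so no directory array is materialized and only non-empty cell counts are kept. Range selectivity is estimated by assuming tuples are uniformly distributed within each cell, and `overflows` checks whether the logical address space exceeds a machine word, which is the kind of overflow that arises when the number of dimensions and the dimension lengths are large.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

ADDRESS_BITS = 64  # assumed width of a machine-word cell address


class LogicalGridEstimator:
    """Hypothetical grid-histogram selectivity estimator.

    The grid directory is never materialized: a cell's address is computed
    on demand by a row-major address function, and only non-empty cells are
    stored in a sparse dictionary keyed by that address. A fixed uniform grid
    stands in for the adaptive, hierarchical splits of a real MLGF.
    """

    def __init__(self, lows: Sequence[float], highs: Sequence[float],
                 splits: Sequence[int]) -> None:
        self.lows = list(lows)
        self.highs = list(highs)
        self.splits = list(splits)        # number of intervals per dimension
        self.counts: Dict[int, int] = {}  # logical address -> tuple count
        self.total = 0

    def address_space(self) -> int:
        """Number of logical cells, i.e. the product of the dimension lengths."""
        size = 1
        for s in self.splits:
            size *= s
        return size

    def overflows(self, bits: int = ADDRESS_BITS) -> bool:
        """True if some cell address would not fit in `bits` bits."""
        return self.address_space() > 2 ** bits

    def _width(self, d: int) -> float:
        return (self.highs[d] - self.lows[d]) / self.splits[d]

    def _cell_index(self, d: int, x: float) -> int:
        i = int((x - self.lows[d]) // self._width(d))
        return min(max(i, 0), self.splits[d] - 1)  # clamp to the grid

    def _address(self, idx: Sequence[int]) -> int:
        # Row-major address function: computed, not looked up in a directory.
        addr = 0
        for d, i in enumerate(idx):
            addr = addr * self.splits[d] + i
        return addr

    def insert(self, point: Sequence[float]) -> None:
        idx = [self._cell_index(d, x) for d, x in enumerate(point)]
        a = self._address(idx)
        self.counts[a] = self.counts.get(a, 0) + 1
        self.total += 1

    def selectivity(self, qlow: Sequence[float], qhigh: Sequence[float]) -> float:
        """Estimate the fraction of tuples inside the box [qlow, qhigh].

        Assumes tuples are uniformly distributed within each cell, so a
        partially covered cell contributes in proportion to the overlap.
        """
        if self.total == 0:
            return 0.0
        per_dim: List[List[Tuple[int, float]]] = []
        for d in range(len(self.splits)):
            w = self._width(d)
            overlaps = []
            for i in range(self._cell_index(d, qlow[d]),
                           self._cell_index(d, qhigh[d]) + 1):
                c_lo = self.lows[d] + i * w
                c_hi = c_lo + w
                frac = max(0.0, min(qhigh[d], c_hi) - max(qlow[d], c_lo)) / w
                overlaps.append((i, frac))
            per_dim.append(overlaps)
        estimate = 0.0
        for combo in product(*per_dim):  # every cell the query box touches
            frac = 1.0
            for _, f in combo:
                frac *= f
            addr = self._address([i for i, _ in combo])
            estimate += self.counts.get(addr, 0) * frac
        return estimate / self.total
```

For instance, inserting the points (0.5, 0.5), (1.5, 0.5), (5.5, 5.5) and (9.5, 9.5) into a 10 × 10 grid over [0, 10] × [0, 10] and querying the box [0, 2] × [0, 1] yields an estimate of 0.5, since two of the four points lie in fully covered cells. Python integers are unbounded, so the sketch itself never overflows; in a language with fixed-width integers, `overflows` would need to be checked before any address is computed.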
