Effective Query Size Estimation Using Neural Networks

This paper describes a novel approach to estimating the size of database query results using neural networks. In the proposed approach, three-layer neural networks are constructed and trained to learn the cumulative distribution functions of attribute values in relations. Once a network is trained, an estimate of the query result size can be obtained instantly by computing the network output for the given query predicates. The basic computational model, which uses a cumulative distribution function to compute the query result size, is described, and the construction and training of the networks are discussed. Comprehensive experiments were conducted to study the effectiveness of the proposed approach. The results indicate that the approach produces estimates with accuracies comparable to or higher than those reported in the literature.
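
The computational model can be illustrated with a minimal sketch. It assumes a range predicate a < A <= b on a single attribute A of a relation with N tuples, so the result size is N * (F(b) - F(a)), where F is the cumulative distribution function of A. The sketch computes this exactly from the empirical CDF and approximately from a small three-layer network (read here as input, hidden and output layers) fitted to that CDF. Everything beyond that formula is an assumption made for illustration and not the paper's model. This includes the names `exact_cdf`, `range_size` and `CdfNet`, the linear output unit, the step-like initialisation, the plain full-batch gradient descent and the synthetic data. No accuracy figures are implied.

```python
import bisect
import math
import random


def exact_cdf(sorted_values, v):
    """Empirical CDF F(v): fraction of tuples whose attribute value is <= v."""
    return bisect.bisect_right(sorted_values, v) / len(sorted_values)


def range_size(n_tuples, cdf, a, b):
    """Result size of the predicate a < A <= b, computed as N * (F(b) - F(a))."""
    return n_tuples * max(0.0, cdf(b) - cdf(a))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class CdfNet:
    """Hypothetical three-layer network: 1 input, H sigmoid hidden units, 1 linear output."""

    def __init__(self, lo, hi, hidden=6, steepness=10.0):
        self.lo, self.hi = lo, hi
        # Assumption: hidden units start as soft steps centred at evenly spaced
        # points of the normalised domain [0, 1] ...
        self.w1 = [steepness] * hidden
        self.b1 = [-steepness * (j + 0.5) / hidden for j in range(hidden)]
        # ... and the output starts near the CDF of a uniform distribution.
        self.w2 = [1.0 / hidden] * hidden
        self.b2 = 0.0

    def _forward(self, v):
        x = (v - self.lo) / (self.hi - self.lo)  # scale the attribute value to [0, 1]
        h = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        return x, h, self.b2 + sum(w * hj for w, hj in zip(self.w2, h))

    def cdf(self, v):
        # Clamp the linear output so it is a valid probability.
        return min(1.0, max(0.0, self._forward(v)[2]))

    def loss(self, vs, ts):
        """Half mean squared error between network output and target CDF values."""
        return sum((self._forward(v)[2] - t) ** 2 for v, t in zip(vs, ts)) / (2 * len(vs))

    def train(self, vs, ts, epochs=1000, lr=0.1):
        """Plain full-batch gradient descent (a stand-in, not the paper's training method)."""
        m, hidden = len(vs), len(self.w1)
        for _ in range(epochs):
            gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
            for v, t in zip(vs, ts):
                x, h, y = self._forward(v)
                err = (y - t) / m  # d(loss)/d(y) for this sample
                gb2 += err
                for j in range(hidden):
                    gw2[j] += err * h[j]
                    d = err * self.w2[j] * h[j] * (1.0 - h[j])  # back through the sigmoid
                    gw1[j] += d * x
                    gb1[j] += d
            # Apply the accumulated gradients only after the full pass over the data.
            for j in range(hidden):
                self.w1[j] -= lr * gw1[j]
                self.b1[j] -= lr * gb1[j]
                self.w2[j] -= lr * gw2[j]
            self.b2 -= lr * gb2


# Synthetic relation: N = 2000 tuples with a roughly normal integer attribute in [0, 99].
rng = random.Random(7)
values = [min(99, max(0, int(rng.gauss(50, 15)))) for _ in range(2000)]
sorted_vals = sorted(values)
distinct = sorted(set(values))
targets = [exact_cdf(sorted_vals, v) for v in distinct]

net = CdfNet(lo=distinct[0], hi=distinct[-1])
loss_before = net.loss(distinct, targets)
net.train(distinct, targets)
loss_after = net.loss(distinct, targets)

a, b = 40, 60
true_size = sum(1 for v in values if a < v <= b)
exact = range_size(len(values), lambda v: exact_cdf(sorted_vals, v), a, b)
estimate = range_size(len(values), net.cdf, a, b)
print(f"true={true_size} exact={exact:.1f} network={estimate:.1f}")
print(f"training loss {loss_before:.5f} -> {loss_after:.5f}")
```

Once trained, answering a query costs two forward passes, independent of N. The linear output unit is a design choice for this sketch. It makes fitting the last layer a least-squares problem. A sigmoid output would keep values in [0, 1] without clamping and is an equally plausible alternative.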
