Paper Information - Approximating Representations for Large Numerical Databases


This paper introduces a notion of support for real-valued functions. It shows how to approximate the supports of a large class of functions from the supports of so-called polynomial itemsets, which can be mined efficiently with an Apriori-style algorithm. An upper bound on the error of such an approximation can be computed reliably. The paper also introduces the concept of an approximating representation, which extends the idea of concise representations to numerical data. It shows that many standard statistical modelling tasks, such as nonlinear regression and least squares curve fitting, can be solved efficiently using only the approximating representation, without accessing the original data at all. Because many of these methods traditionally require several passes over the data, the approach makes them usable on huge datasets and data streams, where repeated scans are very costly or outright impossible.
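To make the "without accessing the original data" idea concrete, here is a minimal sketch of least squares polynomial fitting from precomputed monomial averages. It is an illustration under stated assumptions, not the paper's algorithm: it assumes that the support of a polynomial itemset behaves like the average of a product of attribute powers over the dataset, and it collects those averages in a single pass rather than mining them Apriori-style. The names `monomial_means`, `fit_polynomial`, and `solve` are hypothetical helpers introduced for this example.

```python
import random

def monomial_means(rows, degree):
    """Single pass over (x, y) rows: averages of x^k (k <= 2*degree) and x^k * y (k <= degree).

    Assumption: these averages stand in for supports of polynomial itemsets.
    """
    sx = [0.0] * (2 * degree + 1)
    sxy = [0.0] * (degree + 1)
    n = 0
    for x, y in rows:
        n += 1
        p = 1.0  # holds x^k as k increases
        for k in range(2 * degree + 1):
            sx[k] += p
            if k <= degree:
                sxy[k] += p * y
            p *= x
    return [s / n for s in sx], [s / n for s in sxy]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def fit_polynomial(sx, sxy):
    """Least squares coefficients (constant term first) from the averages alone.

    Normal equations: A[i][j] = mean(x^(i+j)), b[i] = mean(x^i * y).
    """
    d = len(sxy) - 1
    A = [[sx[i + j] for j in range(d + 1)] for i in range(d + 1)]
    return solve(A, sxy)

# Synthetic, noise-free data from y = 1 + 2x - 3x^2 (illustrative only).
random.seed(0)
data = []
for _ in range(1000):
    x = random.uniform(-1, 1)
    data.append((x, 1.0 + 2.0 * x - 3.0 * x * x))

sx, sxy = monomial_means(data, degree=2)  # the only step that reads the data
coeffs = fit_polynomial(sx, sxy)          # uses the stored averages only
```

Once `sx` and `sxy` are stored, `fit_polynomial` can be rerun for any degree up to the one collected without rescanning `data`, which is the property that makes this style of representation attractive for streams. On this noise-free sample the recovered coefficients should be close to 1, 2, and -3.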

Szymon Jaroszewicz | Marcin Korzen
