DensEst: Density Estimation for Data Mining in High Dimensional Spaces

Subspace clustering and frequent itemset mining via “step-by-step” algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large high-dimensional databases. Recent “jump” algorithms directly choose candidate subspace regions or patterns. Their scalability and quality depend heavily on the rating of these candidates, as misled jumps incur poor results and costly candidate refinements. Existing techniques rely on simple statistics with low estimation quality or on inefficient
