Exploratory Mining in Cube Space

Data Mining has evolved as a new discipline at the intersection of several existing areas, including Database Systems, Machine Learning, Optimization, and Statistics. An important question is whether the field has matured to the point where it has originated substantial new problems and techniques that distinguish it from its parent disciplines. In this paper, we discuss a class of new problems and techniques that show great promise for exploratory mining, while synthesizing and generalizing ideas from the parent disciplines. While the class of problems we discuss is broad, there is a common underlying objective: to look beyond a single data mining step (e.g., data summarization or model construction) and address the combined process of data selection and transformation, parameter and algorithm selection, and model construction. The fundamental difficulty lies in the large space of alternative choices at each step, and good solutions must provide a natural framework for managing this complexity. We regard this as a grand challenge for Data Mining, and see the ideas in this paper as promising initial steps towards a rigorous exploratory framework that supports the entire process. This is joint work with several people, in particular Bee-Chung Chen.
