Subspace Discovery for Promotion: A Cell Clustering Approach

The promotion analysis problem was proposed in earlier work, where ranking-based promotion query processing techniques were studied to promote a given object, such as a product, effectively and efficiently by exploring ranked answers. More specifically, given a multidimensional data set, the goal is to discover interesting subspaces in which the object is ranked high. In this paper, we extend the previously proposed promotion cube techniques and develop a cell clustering approach that achieves a better tradeoff between offline materialization and online query processing. We formally formulate the problem and present a solution to it. Our empirical evaluation on both synthetic and real data sets shows that the proposed technique can greatly speed up query processing compared with baseline implementations.
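To make the problem statement concrete, the following is a minimal brute-force sketch of subspace discovery for one target object. It is not the promotion cube or the cell clustering method of the paper, and it does no offline materialization. The function name `promote`, the row layout, the use of SUM as the aggregate measure, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def promote(rows, dims, target, top_k=3):
    """Return the top_k (rank, subspace) pairs for `target`.

    rows: iterable of (object, {dimension: value}, score) tuples.
    A subspace is a dict fixing values on a subset of `dims`.
    Assumption: objects are ranked by SUM of score within each subspace.
    """
    results = []
    for size in range(len(dims) + 1):
        for subset in combinations(dims, size):
            # Aggregate each object's score per cell of this dimension subset.
            cells = defaultdict(lambda: defaultdict(float))
            for obj, attrs, score in rows:
                key = tuple(attrs[d] for d in subset)
                cells[key][obj] += score
            for key, totals in cells.items():
                if target not in totals:
                    continue  # the target does not occur in this subspace
                # Rank = 1 + number of objects with a strictly higher total.
                rank = 1 + sum(v > totals[target] for v in totals.values())
                results.append((rank, dict(zip(subset, key))))
    # Best rank first; ties go to the less restrictive subspace.
    results.sort(key=lambda r: (r[0], len(r[1]), sorted(r[1].items())))
    return results[:top_k]


# Hypothetical toy data: two products sold in two cities over two years.
rows = [
    ("A", {"city": "NY", "year": 2008}, 10),
    ("B", {"city": "NY", "year": 2008}, 30),
    ("A", {"city": "LA", "year": 2008}, 50),
    ("B", {"city": "LA", "year": 2008}, 20),
    ("B", {"city": "LA", "year": 2009}, 40),
]
best = promote(rows, ("city", "year"), "A", top_k=2)
```

In this toy data, object A ranks second over the full space but first in the subspaces {year: 2008} and {city: LA, year: 2008}. The sketch enumerates every subset of dimensions at query time, which is exponential in the number of dimensions; this is the online cost that offline materialization is meant to reduce.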
