Region-based online promotion analysis

This paper addresses a fundamental and challenging problem with broad applications: the efficient processing of region-based promotion queries, i.e., discovering the top-k most interesting regions for effectively promoting an object (e.g., a product or a person) given by the user, where a region is defined over continuous ranged dimensions. In our problem context, an object can be promoted in a region when it is top-ranked in that region. This type of promotion query involves an exponentially large search space and expensive aggregation operations. For efficient query processing, we study a fresh, principled framework called the region-based promotion cube (RepCube). Grounded in a solid cost analysis, we first develop a partial materialization strategy that yields the provably maximum online pruning power for a given storage budget. Then, cell relaxation is performed to further reduce storage space while ensuring the effectiveness of pruning using a given bound. Extensive experiments on large data sets show that the proposed method is highly practical and that its efficiency is one to two orders of magnitude higher than that of baseline solutions.
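
To make the query semantics concrete, the following is a minimal sketch of a naive baseline, not the RepCube method itself. It assumes a single continuous ranged dimension, SUM as the aggregate, and an illustrative notion of "interesting" (best rank first, wider region as tie-breaker); the function name `promotion_regions` and the record layout are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def promotion_regions(records, target, k):
    """Naive baseline: enumerate every interval [lo, hi] and rank the target in it.

    records: iterable of (object_id, dimension_value, score) tuples (assumed layout).
    Returns up to k tuples (rank, lo, hi), best rank first.
    """
    # Candidate region boundaries are the distinct values seen on the dimension.
    boundaries = sorted({value for _, value, _ in records})
    results = []
    for lo, hi in combinations_with_replacement(boundaries, 2):
        # Aggregate (SUM, by assumption) each object's score inside the region.
        totals = defaultdict(float)
        for obj, value, score in records:
            if lo <= value <= hi:
                totals[obj] += score
        if target not in totals:
            continue  # the target has no data in this region
        # Rank = 1 + number of objects with a strictly larger aggregate.
        rank = 1 + sum(1 for total in totals.values() if total > totals[target])
        results.append((rank, lo, hi))
    # Assumed interestingness order: better rank, then wider region, then smaller lo.
    results.sort(key=lambda r: (r[0], -(r[2] - r[1]), r[1]))
    return results[:k]

# Toy data: (object, dimension value, score)
records = [
    ("A", 1, 5.0), ("B", 1, 9.0),
    ("A", 2, 8.0), ("B", 2, 3.0),
    ("A", 3, 7.0), ("B", 3, 2.0),
    ("C", 3, 1.0),
]
top_regions = promotion_regions(records, "A", 2)
```

Even with one dimension, this baseline re-aggregates the data for a quadratic number of intervals, and the number of regions grows further with each added dimension; this is the kind of cost that the partial materialization and pruning described in the abstract are meant to avoid.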
