A framework for promotion analysis in multi-dimensional space

Promotion is one of the most important elements in marketing. It is often desirable to find merit in an object (e.g., a product, person, organization, or other business entity) and confidently promote it in an appropriate community. In this thesis, we motivate and discuss a novel class of data mining problems, called promotion analysis, for promoting a given object in a multi-dimensional space by leveraging object ranking information. The key observation is that most objects are not highly ranked in the global space, where all objects are compared on all aspects; in contrast, there often exist interesting and meaningful local spaces in which the given object becomes prominent. Therefore, our general goal is to break down the data space and discover the most interesting local spaces effectively and efficiently. We formally present the promotion analysis problem and formulate its variants and related notions. The promotion analysis problem is highly practical and useful in a wide spectrum of decision support applications. Typical application examples include merit discovery, product positioning and customer targeting, object profiling and summarization, identification of interesting features, and explorative search of objects. These applications are not new; they have been extensively studied and practiced in the marketing field. While existing commercial database and business intelligence systems readily support retrieving the most highly ranked objects in a given local space, no existing study addresses multidimensional ranking analysis for promotional purposes. Supporting effective and efficient online promotion analysis, however, presents many technical challenges, such as the spurious promotion problem, the explosion of the search space, and the high complexity of aggregation. To this end, we systematically study the problem and develop a general, principled promotion analysis framework.
In terms of the search space, both subspaces formed on categorical dimensions and regions formed on continuous dimensions are examined. In terms of the object domain, both uniform object collections and multidimensional object spaces are studied. Moreover, we propose a unified query model to accommodate various scoring functions and redundancy-aware semantics. We also develop a statistical method to avoid spurious promotion results. For efficient query processing with a desirable balance between online and offline costs, we investigate exact algorithms as well as approximate algorithms with probabilistic guarantees. The promotion analysis framework not only provides an integrated solution for decision support applications, but also opens up new horizons for future research in other areas such as information network analysis, text mining, and probabilistic data management.
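
The key observation above, that an object ranked low in the global space may rank first in some local space, can be illustrated with a small brute-force sketch. This is not the thesis's algorithm: it simply enumerates every subspace formed on categorical dimensions, aggregates scores with SUM, and ranks the target object in each. The function name `promotion_subspaces`, the row layout, the SUM scoring function, and the tie-break that prefers subspaces containing more objects (a crude stand-in for guarding against spurious promotion) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def promotion_subspaces(rows, dims, target, top_k=3):
    """Return the top_k subspaces in which `target` ranks best.

    rows: list of dicts with keys "object", "score", and every name in dims.
    A subspace is a tuple of (dimension, value) pairs; () is the global space.
    Each result is (subspace, rank_of_target, number_of_objects_in_subspace).
    """
    results = []
    for r in range(len(dims) + 1):
        for chosen in combinations(dims, r):
            # Aggregate (SUM) each object's score per subspace cell.
            cells = defaultdict(lambda: defaultdict(float))
            for row in rows:
                key = tuple((d, row[d]) for d in chosen)
                cells[key][row["object"]] += row["score"]
            for key, totals in cells.items():
                if target not in totals:
                    continue  # the target does not appear in this subspace
                # Rank = 1 + number of objects with a strictly higher score.
                rank = 1 + sum(1 for s in totals.values() if s > totals[target])
                results.append((key, rank, len(totals)))
    # Best rank first; among equal ranks, prefer subspaces with more objects
    # (assumed tie-break: a top rank among few competitors is less convincing).
    results.sort(key=lambda t: (t[1], -t[2]))
    return results[:top_k]


# Hypothetical toy data: object "A" is last globally but first in region=east.
rows = [
    {"object": "A", "region": "east", "year": "2009", "score": 5},
    {"object": "A", "region": "west", "year": "2009", "score": 1},
    {"object": "B", "region": "east", "year": "2009", "score": 3},
    {"object": "B", "region": "west", "year": "2009", "score": 9},
    {"object": "C", "region": "east", "year": "2010", "score": 4},
    {"object": "C", "region": "west", "year": "2010", "score": 8},
]
best = promotion_subspaces(rows, ["region", "year"], "A", top_k=2)
everything = promotion_subspaces(rows, ["region", "year"], "A", top_k=100)
```

Exhaustive enumeration like this grows exponentially with the number of dimensions, which is the search-space explosion the abstract names as a challenge; the sketch is only meant to make the problem statement concrete.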
