Market basket analysis with networks

Market basket analysis, the search for meaningful associations in customer purchase data, is one of the oldest areas of data mining. The typical solution involves mining and analyzing association rules, which take the form of statements such as “people who buy diapers are likely to buy beer”. It is well known, however, that typical transaction datasets can support hundreds or thousands of obvious association rules for each interesting rule, and filtering through the rules is a non-trivial task (Klemettinen et al. In: Proceedings of CIKM, pp 401–407, 1994). One may use an interestingness measure to quantify the usefulness of various rules, but there is no single agreed-upon measure, and different measures can produce very different rankings of the same rules. In this work, we take a different approach to mining transaction data. By modeling the data as a product network, we discover expressive communities (clusters) of products, which can then be targeted for further analysis. We demonstrate that our network-based approach can concisely isolate influence among products, reducing the need to search through massive lists of association rules. We develop an interestingness measure for communities of products and show that it isolates useful, actionable communities. Finally, we build on our experience with product networks to propose a comprehensive analysis strategy that combines traditional and network-based techniques. This framework can generate insights that are difficult to obtain with traditional analysis methods.
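As a rough illustration of what modeling transactions as a product network can look like, the sketch below links two products whenever they appear in the same basket and weights each edge by the number of shared baskets. The abstract does not specify the edge definition or the community detection method, so everything here is an assumption: the function names, the `min_weight` threshold, and the toy transactions are hypothetical, and connected components of the thresholded network are used only as a crude stand-in for communities.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_product_network(transactions, min_weight=2):
    """Return {(a, b): weight} for product pairs co-purchased at least min_weight times.

    Assumption: an edge means "bought in the same basket"; the weight is the
    number of baskets containing both products.
    """
    pair_counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    return {pair: w for pair, w in pair_counts.items() if w >= min_weight}


def connected_groups(edges):
    """Group products into connected components (a stand-in for communities)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collecting every product reachable from start.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["diapers", "beer", "wipes"],
    ["diapers", "beer"],
    ["diapers", "wipes"],
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["beer", "bread"],
]
edges = build_product_network(transactions, min_weight=2)
groups = connected_groups(edges)
print(groups)
```

On this toy data the weak beer and bread link (one shared basket) is dropped by the threshold, so the two product groups separate. A real analysis would replace the connected-components step with a proper community detection algorithm, such as a modularity-based method.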
