Graph-based Analysis for E-Commerce Recommendation

Recommender systems automate the process of recommending products and services to customers based on various types of data, including customer demographics, product features, and, most importantly, previous interactions between customers and products (e.g., purchasing, rating, and catalog browsing). Despite significant research progress and growing acceptance in real-world applications, two major challenges must still be addressed before effective e-commerce recommendation applications can be implemented. The first is making recommendations from sparse transaction data. The second is the lack of a unified framework for integrating multiple types of input data and recommendation approaches. This dissertation investigates graph-based algorithms to address these two problems. The proposed approach is centered on consumer-product graphs, which represent sales transactions as links connecting consumer and product nodes. To address the sparsity problem, I investigate network spreading activation algorithms and a newly proposed link analysis algorithm motivated by Web graph analysis techniques. Experimental results with several e-commerce datasets indicate that both classes of algorithms outperform a wide range of existing collaborative filtering algorithms, especially when data are sparse. To provide unified recommendation frameworks, I propose two graph-based models that enhance the simple consumer-product graph. The first, a two-layer graph model, enhances the consumer-product graph by incorporating consumer and product attribute information as consumer and product similarity links. The second is based on probabilistic relational models (PRMs) developed in the relational learning literature.
Experiments with e-commerce datasets demonstrate that the proposed frameworks not only conceptually unify many existing recommendation approaches but also allow a wider range of data patterns to be exploited in an integrated manner, leading to improved recommendation performance. In addition to the research on recommendation algorithm design, this dissertation employs random graph theory to study the topological characteristics of consumer-product graphs and the fundamental mechanisms that generate sales transaction data. This research represents an early step towards a meta-level analysis framework for validating the fundamental assumptions that different recommendation algorithms make about the process generating consumer-product interactions, and thus for supporting systematic selection and evaluation of recommendation models and algorithms.
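
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea behind spreading activation on a consumer-product graph. It assumes unweighted purchase links, a fixed decay factor, and activation split evenly across a node's links; the function names (`build_graph`, `recommend`) and the parameters `rounds` and `decay` are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def build_graph(transactions):
    """Build a bipartite consumer-product graph from (consumer, product) pairs."""
    consumer_to_products = defaultdict(set)
    product_to_consumers = defaultdict(set)
    for consumer, product in transactions:
        consumer_to_products[consumer].add(product)
        product_to_consumers[product].add(consumer)
    return consumer_to_products, product_to_consumers


def recommend(transactions, target, rounds=2, decay=0.5, top_n=3):
    """Rank unpurchased products by activation spread from the target consumer.

    Illustrative assumption: each round pushes activation from consumers to
    products and back, damped by `decay` and divided by the node's degree.
    """
    c2p, p2c = build_graph(transactions)
    consumer_act = {target: 1.0}
    product_scores = defaultdict(float)

    for _ in range(rounds):
        # Spread activation from active consumers to the products they bought.
        product_act = defaultdict(float)
        for consumer, act in consumer_act.items():
            bought = c2p.get(consumer, set())
            for product in bought:
                product_act[product] += decay * act / len(bought)
        for product, act in product_act.items():
            product_scores[product] += act

        # Spread activation from those products back to their buyers.
        next_consumer_act = defaultdict(float)
        for product, act in product_act.items():
            buyers = p2c[product]
            for consumer in buyers:
                next_consumer_act[consumer] += decay * act / len(buyers)
        consumer_act = next_consumer_act

    owned = c2p.get(target, set())
    ranked = sorted(
        ((score, product) for product, score in product_scores.items()
         if product not in owned),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [product for _, product in ranked[:top_n]]
```

With more rounds, activation reaches products bought only by consumers who share no purchase with the target, which is how transitive associations can compensate for sparse transaction data.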
