A Recommendation Model Based on Latent Principal Factors in Web Navigation Data

Discovering the factors that lead to common navigational patterns can help improve online information presentation and provide personalized content to users. It is therefore necessary to develop techniques that can automatically characterize users' underlying navigational objectives and discover the hidden semantic relationships among users, as well as between users and Web objects. Typical approaches to Web usage mining, such as clustering of user sessions, can discover usage patterns directly, but cannot identify the latent factors, intrinsic in users' navigational behavior, that lead to such patterns. In this paper, we propose an approach based on a latent variable model, called Iterative Principal Factor Analysis, to discover such hidden factors in Web usage data. The hidden factors are then used to create aggregate models of common user profiles, which are in turn used to provide dynamic recommendations to users. Our experiments on real Web usage data show that the proposed principal factor approach yields better predictive user models than more traditional approaches such as clustering and principal component analysis.
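The abstract names the technique but does not give its details. As a rough illustration only, the following is a minimal sketch of textbook iterated principal factor analysis (principal axis factoring), assuming the usage data is arranged as a sessions-by-pageviews matrix. The function name, its parameters, the clipping of communalities to [0, 1], and the synthetic data are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def iterative_principal_factors(X, n_factors, max_iter=100, tol=1e-6):
    """Hypothetical helper: iterated principal axis factoring.

    X is assumed to be a (sessions x pageviews) matrix of weights.
    Returns (loadings, communalities), with loadings of shape
    (pageviews x n_factors).
    """
    # Correlation matrix between pageviews (columns of X).
    R = np.corrcoef(X, rowvar=False)
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.pinv(R))
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities on the diagonal.
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # ascending order
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0.0, None)      # guard tiny negatives
        loadings = eigvecs[:, top] * np.sqrt(lam)
        # Re-estimate communalities from the loadings; clipping to [0, 1]
        # is a safeguard added here, not part of the basic method.
        new_h2 = np.clip((loadings ** 2).sum(axis=1), 0.0, 1.0)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Synthetic example (assumed data): 500 sessions, 6 pageviews, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
true_loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                          [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = latent @ true_loadings.T + 0.5 * rng.normal(size=(500, 6))
loadings, communalities = iterative_principal_factors(X, n_factors=2)
print(loadings.shape)
```

In a pipeline like the one the abstract describes, the columns of `loadings` would presumably serve as the starting point for aggregate profiles, for example by keeping the pageviews with the largest absolute loadings on each factor. That step is a guess at the general idea, not a description of the authors' procedure.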
