Semantically Enriched Variable Length Markov Chain Model for Analysis of User Web Navigation Sessions

The rapid growth of the World Wide Web has resulted in intricate Web sites, demanding enhanced user skills to find the required information and more sophisticated tools that are able to generate apt recommendations. Markov chains have been widely used to generate next-page recommendations; however, the accuracy of such models is limited. Herein, we propose the novel Semantic Variable Length Markov Chain Model (SVLMC), which combines the fields of Web Usage Mining and the Semantic Web by enriching the Markov transition probability matrix with rich semantic information extracted from Web pages. We show that the method enhances prediction accuracy relative to usage-based higher order Markov models and to semantic higher order Markov models based on an ontology of concepts. In addition, the proposed model is able to handle the problem of ambiguous predictions. An extensive experimental evaluation was conducted on two real-world data sets and on one partially generated data set. The results show that the proposed model achieves 15–20% better accuracy than the usage-based Markov model, 8–15% better than the semantic ontology Markov model and 7–12% better than the semantic-pruned Selective Markov Model. In summary, SVLMC is the first work to propose the integration of a rich set of detailed semantic information into higher order Web usage Markov models, and the experimental results reveal that the inclusion of detailed semantic data enhances the prediction ability of Markov models.
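To make the idea of enriching a transition probability matrix with semantic information more concrete, the following is a minimal illustrative sketch and not the SVLMC algorithm itself. It assumes a first-order chain (the paper's model is variable length), page keyword sets as the semantic information, Jaccard similarity as the semantic score, and a hypothetical blending weight `alpha`; none of these choices is taken from the abstract. The toy data shows how a tie between two equally frequent next pages (an ambiguous prediction) could be broken by semantic similarity.

```python
from collections import defaultdict


def transition_probs(sessions):
    """Estimate first-order, usage-based transition probabilities from sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (assumed semantic score)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrich(probs, keywords, alpha=0.5):
    """Blend usage probabilities with keyword similarity and renormalize each row.

    `alpha` is a hypothetical weight: 0 keeps pure usage, larger values
    give more influence to semantic similarity.
    """
    enriched = {}
    for src, row in probs.items():
        scores = {
            dst: (1 - alpha) * p
            + alpha * jaccard(keywords.get(src, ()), keywords.get(dst, ()))
            for dst, p in row.items()
        }
        total = sum(scores.values())
        # Fall back to the usage row if every blended score is zero.
        enriched[src] = (
            {dst: s / total for dst, s in scores.items()} if total else dict(row)
        )
    return enriched


def predict(row):
    """Return the next page with the highest probability in a row."""
    return max(row, key=row.get)


# Toy data (invented for illustration): page A is followed once by B and once by C.
sessions = [["A", "B"], ["A", "C"]]
keywords = {
    "A": {"laptop", "price"},
    "B": {"laptop", "review"},
    "C": {"contact"},
}

probs = transition_probs(sessions)   # usage alone gives B and C equal probability
enriched = enrich(probs, keywords)   # keyword overlap with A favours B
print(predict(enriched["A"]))
```

In this toy case usage counts alone cannot choose between B and C, while the blended row prefers B because it shares a keyword with A. A linear blend followed by row renormalization is only one of several plausible ways to combine the two signals.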
