Predicting WWW surfing using multiple evidence combination

Improving many applications, such as web search, latency reduction, and personalization/recommendation systems, depends on surfing prediction. Predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper, we combine the predictions of two classification techniques, namely, the Markov model and Support Vector Machines (SVM), using Dempster’s rule. This fusion overcomes the inability of the Markov model to predict unseen data and also overcomes the problem of multiclass classification in the case of SVM, especially when dealing with a large number of classes. We apply feature extraction to increase the discriminative power of SVM. In addition, during prediction we employ domain knowledge to reduce the number of classifiers, which improves accuracy and reduces prediction time. We demonstrate the effectiveness of our hybrid approach by comparing our results with those of widely used techniques, namely, SVM, the Markov model, and association rule mining.
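As an illustration of the fusion step, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about the next page. It is not the authors' implementation: the page names, the mass values, and the way each classifier's output is turned into a mass function are assumptions made only for this example, and the function name `dempster_combine` is hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses (here, candidate
    next pages) to a belief mass; the masses in each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by 1 - K so the combined masses sum to 1.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical frame of discernment: three candidate next pages.
pages = frozenset({"A", "B", "C"})

# Assumed masses; mass on the full set `pages` models each source's uncertainty.
markov = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.3, pages: 0.1}
svm = {frozenset({"A"}): 0.5, frozenset({"C"}): 0.2, pages: 0.3}

fused = dempster_combine(markov, svm)

# Predict the single page with the highest combined mass.
best = max((h for h in fused if len(h) == 1), key=fused.get)
print(sorted(best), round(fused[best], 3))
```

In this sketch the mass assigned to the whole set of pages expresses how much each source is unsure; a source that cannot rank an unseen path could place most of its mass there, letting the other source dominate the combined result.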
