Detecting Memory and Structure in Human Navigation Patterns Using Markov Chain Models of Varying Order

One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, in which Web pages are represented as states and hyperlinks as the probabilities of navigating from one page to another. Human navigation on the Web has predominantly been thought to satisfy the memoryless Markov property, which states that the next page a user visits depends only on her current page and not on previously visited ones. This idea has found its way into numerous applications, such as Google's PageRank algorithm. Recent studies have suggested that human navigation may be better modeled using higher order Markov chain models, i.e., models in which the next page depends on a longer history of past clicks. Yet this finding is preliminary and does not account for the higher complexity of higher order Markov chain models, which is why the memoryless model is still widely used. In this work, we present a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight the strengths and weaknesses of each method and apply them to investigate memory and structure in human navigation on the Web. Our experiments reveal that the complexity of higher order models grows faster than their utility, and we thus confirm that the memoryless model is a practical model for human navigation on a page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and that specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free-form) and observe structural differences that make a strong argument for more contextual studies of human navigation in future work.
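To make the trade-off between model fit and model complexity concrete, the following is a minimal sketch of order selection for Markov chains. It assumes the Akaike information criterion (AIC) as the selection rule, which is only one possible choice; the abstract does not name the specific methods it evaluates. The function names and the toy click path are hypothetical and serve illustration only.

```python
import math
from collections import Counter, defaultdict

def log_likelihood(paths, order, max_order):
    """Maximum-likelihood log-likelihood of an order-`order` Markov chain.

    Only positions i >= max_order are scored, so that models of different
    order are compared on the same set of observed transitions.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(max_order, len(path)):
            history = tuple(path[i - order:i])  # empty tuple when order == 0
            counts[history][path[i]] += 1
    ll = 0.0
    for next_counts in counts.values():
        total = sum(next_counts.values())
        for c in next_counts.values():
            ll += c * math.log(c / total)
    return ll

def num_parameters(n_states, order):
    """Free parameters: one distribution over states for each history."""
    return (n_states ** order) * (n_states - 1)

def select_order_by_aic(paths, n_states, max_order):
    """Return (best_order, {order: AIC}); AIC is an assumed criterion."""
    scores = {}
    for k in range(max_order + 1):
        ll = log_likelihood(paths, k, max_order)
        scores[k] = 2 * num_parameters(n_states, k) - 2 * ll
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: a single click path that strictly alternates
# between two pages, so one step of memory explains it completely.
toy_paths = [["A", "B"] * 10]
best_order, aic_scores = select_order_by_aic(toy_paths, n_states=2, max_order=2)
```

On this toy path the first order model already fits perfectly, so the second order model only adds parameters and receives a worse score. The parameter count grows exponentially with the order, which is the kind of complexity growth the abstract refers to.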
