A Bayesian Method for Comparing Hypotheses About Human Trails

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can be useful, for example, for improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a method called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our method combines Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to calculate the evidence (marginal likelihood) of the data under each of them. To elicit Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method, and to compare the relative plausibility of hypotheses, we employ Bayes factors. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including website navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails.
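The core computation described above can be illustrated with a minimal sketch. It assumes a first-order Markov chain, for which the evidence under row-wise Dirichlet priors has a closed form in terms of Gamma functions. The function names (`transition_counts`, `elicit_prior`, `log_evidence`), the toy trail, and the two hypotheses are all hypothetical and chosen for illustration. In particular, `elicit_prior` is a simplified stand-in that adds `k` pseudo-counts per row in proportion to the hypothesized beliefs; it is not the trial roulette adaptation itself.

```python
from collections import Counter
from math import lgamma


def transition_counts(trails, states):
    """Count observed transitions n[i][j] between consecutive states."""
    counts = {s: Counter() for s in states}
    for trail in trails:
        for a, b in zip(trail, trail[1:]):
            counts[a][b] += 1
    return counts


def elicit_prior(hypothesis, states, k):
    """Simplified elicitation (an assumption, not the trial roulette method):
    alpha[i][j] = 1 + k * (belief in i -> j, normalized per row)."""
    prior = {}
    for s in states:
        row = hypothesis.get(s, {})
        total = sum(row.get(t, 0.0) for t in states)
        prior[s] = {
            t: 1.0 + (k * row.get(t, 0.0) / total if total > 0 else 0.0)
            for t in states
        }
    return prior


def log_evidence(counts, prior, states):
    """Log marginal likelihood of the transitions under Dirichlet row priors."""
    total = 0.0
    for s in states:
        alphas = [prior[s][t] for t in states]
        ns = [counts[s][t] for t in states]
        total += lgamma(sum(alphas)) - sum(lgamma(a) for a in alphas)
        total += sum(lgamma(n + a) for n, a in zip(ns, alphas))
        total -= lgamma(sum(ns) + sum(alphas))
    return total


# Toy data: one trail that cycles A -> B -> C -> A ...
states = ["A", "B", "C"]
trails = [["A", "B", "C", "A", "B", "C", "A", "B"]]
counts = transition_counts(trails, states)

# Two hypothetical hypotheses: uniform transitions vs. a strict cycle.
uniform = {s: {t: 1.0 for t in states} for s in states}
cycle = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}

k = 5  # strength of belief: pseudo-counts added per row
log_ev_uniform = log_evidence(counts, elicit_prior(uniform, states, k), states)
log_ev_cycle = log_evidence(counts, elicit_prior(cycle, states, k), states)

# Log Bayes factor; a positive value favors the cycle hypothesis.
log_bf = log_ev_cycle - log_ev_uniform
print(f"log Bayes factor (cycle vs. uniform): {log_bf:.3f}")
```

In this sketch, `k` controls how strongly each hypothesis is believed. Repeating the comparison for several values of `k` shows whether the ranking of hypotheses is stable as the priors become more informative.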
