Modeling Web Browsing Behavior across Tabs and Websites with Tracking and Prediction on the Client Side

Clickstreams on individual websites have been studied for decades to gain insights into user interests and to improve website experiences. This paper proposes and examines a novel sequence modeling approach for web clickstreams that also considers multi-tab branching and backtracking actions across websites, capturing the full action sequence of a user while browsing. All of this is done using machine learning on the client side, which yields a more comprehensive view while preserving privacy. We evaluate our formalism with a model trained on data collected in a user study with three different browsing tasks based on different human information-seeking strategies from the psychological literature. Our results show that the model can successfully distinguish between browsing behaviors and correctly predict future actions. A subsequent qualitative analysis identified five common web browsing patterns in our collected behavior data, which help to interpret the model. More generally, this illustrates the power of overparameterization in ML and offers a new way of modeling, reasoning about, and predicting observable sequential human interaction behaviors.
