Link prediction and path analysis using Markov chains

Abstract: The rapid growth in the number of documents on the World Wide Web increases the need for better link navigation and path analysis models. Link prediction and path analysis are important problems with applications ranging from personalization to Web server request prediction. The sheer size of the Web, coupled with the variation in users' navigation patterns, makes this a very difficult sequence modelling problem. This paper proposes and evaluates probabilistic link prediction and path analysis using Markov chains. A Markov chain lets the system dynamically model the URL access patterns observed in navigation logs, conditioning each transition on the previous state. The model can also be used in a generative mode to produce tours automatically, and its transition matrix can be analysed further using eigenvector decomposition to obtain 'personalized hubs/authorities'. The utility of the approach is demonstrated in four domains: HTTP request prediction, system-driven adaptive Web navigation, tour generation, and detection of 'personalized hubs/authorities' from user navigation profiles. The generality and power of Markov chains make this work a first step towards applying powerful probabilistic models to Web path analysis and link prediction.
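The three uses of the chain named in the abstract (next-request prediction, tour generation, and eigenvector analysis of the transition matrix) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names, the toy `sessions` log, the first-order (one previous state) model, and the use of power iteration on the stationary distribution as a simple stand-in for the eigenvector decomposition the abstract mentions.

```python
from collections import defaultdict
import random


def fit_transition_probs(sessions):
    """Estimate first-order transition probabilities from URL sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count each observed (previous URL -> next URL) pair.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def predict_next(probs, current, k=1):
    """Return the k most probable next URLs given the current URL."""
    row = probs.get(current, {})
    # Sort by descending probability, breaking ties alphabetically.
    return sorted(row, key=lambda url: (-row[url], url))[:k]


def generate_tour(probs, start, max_len=5, rng=None):
    """Sample a tour by walking the chain from `start` (generative mode)."""
    rng = rng or random.Random(0)
    tour = [start]
    # Stop at max_len or when the current URL has no observed successor.
    while len(tour) < max_len and tour[-1] in probs:
        row = probs[tour[-1]]
        urls = sorted(row)
        weights = [row[url] for url in urls]
        tour.append(rng.choices(urls, weights=weights)[0])
    return tour


def stationary_distribution(probs, iters=200):
    """Approximate the leading left eigenvector by power iteration.

    Assumption: URLs with no outgoing transitions spread their mass
    uniformly over all states so that total probability is preserved.
    Convergence is only expected for an aperiodic, irreducible chain.
    """
    states = sorted(set(probs) | {d for row in probs.values() for d in row})
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.0 for s in states}
        for src in states:
            row = probs.get(src)
            if row:
                for dst, p in row.items():
                    nxt[dst] += pi[src] * p
            else:
                for s in states:
                    nxt[s] += pi[src] / len(states)
        pi = nxt
    return pi


# Hypothetical navigation log: one list of requested URLs per session.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/about"],
    ["/products", "/cart", "/checkout"],
]
probs = fit_transition_probs(sessions)
next_urls = predict_next(probs, "/home")      # ['/products']
tour = generate_tour(probs, "/home", max_len=4)
pi = stationary_distribution(probs)
```

In this toy log two of the three transitions out of `/home` lead to `/products`, so that page is the predicted next request. The values in `pi` give a rough ranking of pages by long-run visit probability under the stated assumptions; they are not the hub and authority scores of the paper.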
