Identification of Navigational Paths of Users Routed through Proxy Servers for Web Usage Mining

A web log file gives a detailed account of who accessed a website, which pages were requested, in what order, and how long each page was viewed. However, log files are not only unstructured but in many cases also distorted. In particular, they can be seriously distorted when pages are requested by users routed through proxy servers. Preprocessing is therefore necessary before the logs can be analyzed and meaningful information discovered. In this article, an algorithm is developed to identify users and their navigational paths when the users are routed through proxy servers. The proposed algorithm is evaluated experimentally using a real website and ten groups of users, each consisting of two or three people. The experimental results show that the average ratios of correct and incorrect page restoration are 78% and 4.1%, respectively, which indicates that the proposed algorithm can serve as a reasonable tool for identifying the navigational paths of users routed through proxy servers.
