Experimental Verification of the Dependence Between the Expected and Observed Visit Rate of Web Pages

This paper applies web usage mining and web structure mining methods to examine whether the expected visit rate of individual web pages correlates with their observed visit rate. We used web server log files as the data source and applied several log file pre-processing methods to identify user sessions at different levels of granularity. We found that the quality of the knowledge acquired about users’ behaviour depends on the session identification method. Our experiments showed a higher dependence between the observed and expected visit rates of the examined web pages in well-prepared files with identified user sessions. We also found statistically significant differences between PageRank and the real visit rate in the files prepared with more advanced session identification methods.
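
The comparison described above can be illustrated with a minimal sketch. It assumes that the expected visit rate is a PageRank score computed by power iteration with a damping factor of 0.85, that the observed visit rate is the number of page views counted across identified sessions, and that dependence is measured with Spearman's rank correlation. These choices, the toy link graph, the session data, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [linked pages]}."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new[t] += damping * rank[p] / len(targets)
        rank = new
    return rank


def average_ranks(values):
    """Ranks 1..n; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical site structure and identified sessions (illustration only).
links = {
    "home": ["courses", "news"],
    "courses": ["home", "exam"],
    "news": ["home"],
    "exam": ["home"],
}
sessions = [
    ["home", "courses", "exam"],
    ["home", "news"],
    ["home", "courses"],
    ["courses", "exam"],
]

expected = pagerank(links)
observed = Counter(page for session in sessions for page in session)

pages = sorted(expected)
rho = spearman([expected[p] for p in pages], [observed[p] for p in pages])
print(f"Spearman correlation between expected and observed visits: {rho:.3f}")
```

Re-running the same comparison on session files produced by different identification methods would show how the pre-processing step changes the measured dependence.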
