Semantic Preprocessing of Web Request Streams for Web Usage Mining

Efficient data preparation is needed to discover the underlying knowledge in complex Web usage data. In this paper, we focus on two main tasks: semantic outlier detection in online Web request streams, and segmentation (or sessionization) of those streams. To this end, we exploit semantic technologies to infer the relationships among Web requests. Web ontologies such as taxonomies and directories can label each Web request with all of its corresponding hierarchical topic paths. Our algorithm consists of two steps. The first step is a nested repetition of top-down partitioning, which establishes a set of candidate session boundaries. The second step is an evaluation process of bottom-up merging, which reconstructs the segmented sequences. In addition, we propose a hybrid approach that combines this method with existing heuristics. Using a synthesized dataset and a real-world dataset of IRCache access log files, we conducted experiments and showed that the semantic preprocessing method improves the performance of rule discovery algorithms. This means that we can conceptually track the behavior of users who tend to change their intentions and interests easily, or who simultaneously try to search for various kinds of information on the Web.
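To make the idea of semantic segmentation concrete, the following is a minimal sketch and not the two-step partition-and-merge algorithm of the paper. It assumes each request carries a single topic path (the abstract allows several), uses a hypothetical `TOPIC_PATHS` lookup with made-up URLs, scores two requests by their shared path prefix, and opens a new session whenever that score falls below an assumed threshold of 0.5.

```python
from typing import Dict, List, Tuple

# Hypothetical taxonomy labels: URL -> topic path, root category first.
# Both the URLs and the categories are invented for illustration.
TOPIC_PATHS: Dict[str, Tuple[str, ...]] = {
    "a.example/laptops": ("Computers", "Hardware", "Laptops"),
    "b.example/gpus": ("Computers", "Hardware", "Graphics"),
    "c.example/recipes": ("Home", "Cooking", "Recipes"),
    "d.example/baking": ("Home", "Cooking", "Baking"),
}


def path_similarity(p: Tuple[str, ...], q: Tuple[str, ...]) -> float:
    """Length of the shared prefix divided by the longer path length."""
    if not p or not q:
        return 0.0
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return common / max(len(p), len(q))


def segment(requests: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Split a request stream where consecutive requests are semantically distant.

    This is a single left-to-right pass, a simplification of the
    approach described in the abstract.
    """
    sessions: List[List[str]] = []
    for url in requests:
        path = TOPIC_PATHS.get(url, ())
        if sessions:
            previous = TOPIC_PATHS.get(sessions[-1][-1], ())
            if path_similarity(previous, path) >= threshold:
                sessions[-1].append(url)
                continue
        # Unlabeled or semantically distant request: start a new session.
        sessions.append([url])
    return sessions


stream = [
    "a.example/laptops",
    "b.example/gpus",
    "c.example/recipes",
    "d.example/baking",
]
sessions = segment(stream)
```

With the assumed labels above, the two hardware requests share two of three path levels and stay together, while the jump to the cooking pages shares no prefix and starts a second session. A purely time-based heuristic would not see this boundary if all four requests arrived in quick succession.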
