Web usage mining with intentional browsing data

Many researchers have developed Web usage mining (WUM) algorithms that use Web log records to discover knowledge useful for supporting business applications and decision making. The quality of the knowledge discovered by WUM, however, depends on the data as well as on the algorithm. This research explores a new data source, intentional browsing data (IBD), for potentially improving the effectiveness of WUM applications. IBD is a category of online browsing actions, such as "copy", "scroll", or "save as", that are not recorded in Web log files. The research therefore aims to build a basic understanding of IBD that will ease its adoption in WUM research and practice. Specifically, this paper formally defines IBD and clarifies its relationships with other browsing data via a proposed taxonomy. To make IBD as readily available as Web log files, an online data collection mechanism for capturing IBD is also proposed and discussed. The potential benefits of IBD are justified in terms of its enhancing and complementary effects, which are illustrated by the rule implications of a Web transaction mining algorithm for an electronic commerce (EC) application. Introducing IBD broadens the scope of WUM research and applications in knowledge discovery.
