Online and incremental mining of separately-grouped Web access logs

The rising popularity of electronic commerce makes data mining an indispensable technology for business competitiveness. The World Wide Web provides abundant raw data in the form of Web access logs, Web transaction logs and Web user profiles. Without data mining tools, it is impossible to make sense of such massive data. We focus on Web usage mining because it deals most directly with understanding user behavioral patterns, which are the key to successful customer relationship management. Previous work addressed specific issues of Web usage mining in isolation and made assumptions without taking a holistic view; as a result, it had limited practical applicability. We formulate a novel and more holistic version of Web usage mining, termed transactionized logfile mining (TRALOM), to effectively and correctly identify transactions and to mine useful knowledge from Web access logs. We also introduce a new data structure, called the WebTrie, that efficiently holds useful preprocessed data so that TRALOM can be performed in an online and incremental fashion. Experiments conducted on real Web server logs verify the usefulness and practicality of the proposed techniques.
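The abstract does not specify the WebTrie layout or the rules TRALOM uses to identify transactions. As a rough illustration of the general idea only, the sketch below pairs two generic techniques: a time-gap heuristic that cuts each client's requests into transactions, and a plain prefix trie that is updated one transaction at a time, so prefix counts stay current without rescanning the log. The 30-minute `SESSION_TIMEOUT`, the `(client, timestamp, page)` record format, and the names `split_transactions` and `PathTrie` are all hypothetical; this is not the authors' WebTrie or TRALOM.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical timeout: a common time-gap heuristic, not a value from the paper.
SESSION_TIMEOUT = 30 * 60  # seconds


@dataclass
class TrieNode:
    count: int = 0  # transactions whose page sequence passes through this node
    children: dict = field(default_factory=dict)


class PathTrie:
    """Hypothetical prefix trie over page sequences, updated incrementally."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, pages):
        # Add one transaction; only the nodes on its path are touched.
        node = self.root
        for page in pages:
            node = node.children.setdefault(page, TrieNode())
            node.count += 1

    def support(self, prefix):
        # Number of inserted transactions that begin with `prefix`.
        node = self.root
        for page in prefix:
            node = node.children.get(page)
            if node is None:
                return 0
        return node.count


def split_transactions(records, timeout=SESSION_TIMEOUT):
    """Group (client, timestamp, page) records, assumed to arrive in
    timestamp order, into per-client transactions. A new transaction starts
    when the gap since the client's previous request exceeds `timeout`."""
    last_seen = {}
    current = defaultdict(list)
    for client, ts, page in records:
        if client in last_seen and ts - last_seen[client] > timeout:
            yield current.pop(client)  # close the client's open transaction
        current[client].append(page)
        last_seen[client] = ts
    # End of input: flush every transaction that is still open.
    for pages in current.values():
        yield pages


# Usage on a tiny, made-up log.
log = [
    ("10.0.0.1", 0, "/index"),
    ("10.0.0.1", 60, "/products"),
    ("10.0.0.2", 90, "/index"),
    ("10.0.0.2", 150, "/cart"),
    ("10.0.0.1", 4000, "/index"),  # gap > 30 min: starts a new transaction
]

trie = PathTrie()
transactions = []
for t in split_transactions(log):
    transactions.append(t)
    trie.insert(t)  # online update: no rescan of earlier transactions

print(transactions)
print(trie.support(["/index"]), trie.support(["/index", "/products"]))
```

Because `split_transactions` is a generator, each transaction can be inserted into the trie as soon as it closes, which is the sense of "online and incremental" the sketch aims to convey. A structure that also supports mining non-prefix patterns or removing expired transactions would need more bookkeeping than this plain trie holds.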
