Log mining to improve the performance of site search

Despite the popularity of global search engines, people still suffer from the low accuracy of site search. The primary reason lies in the differences in link structure and data scale between the global Web and a single website, which cause traditional re-ranking methods such as HITS, PageRank and DirectHit to fail. This paper proposes a novel re-ranking method based on user logs within websites. With the help of the website taxonomy, we mine generalized association rules and abstract access patterns at different levels. The mining results are subsequently used to re-rank the retrieved pages. One advantage of our mining algorithm is that it resolves the diversity problem of users' access behavior and discovers general patterns. Experiments show that the proposed method outperforms the keyword-based method by 15% and DirectHit by 13%.
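To make the idea of taxonomy-based generalization concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes a toy taxonomy, toy sessions, pairwise rules only, and a simple additive boost for re-ranking; the names `TAXONOMY`, `mine_rules`, `rerank` and the `weight` parameter are all hypothetical. Each session is extended with the ancestor categories of its pages, so that visits to different pages in the same category can still support a common category-level rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy: each page or category maps to its parent category.
TAXONOMY = {
    "/products/laptop-a": "products/laptops",
    "/products/laptop-b": "products/laptops",
    "/support/driver-a": "support/drivers",
    "/support/driver-b": "support/drivers",
    "products/laptops": "products",
    "support/drivers": "support",
}


def ancestors(item):
    """Return all ancestors of a page or category, nearest first."""
    out = []
    while item in TAXONOMY:
        item = TAXONOMY[item]
        out.append(item)
    return out


def extend(session):
    """Add every ancestor category of every visited page to the session."""
    extended = set(session)
    for page in session:
        extended.update(ancestors(page))
    return extended


def mine_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine pairwise rules lhs -> rhs over taxonomy-extended sessions."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(extend(session))
        item_counts.update(items)
        for a, b in combinations(items, 2):
            # A rule between an item and its own ancestor is trivially true.
            if a in ancestors(b) or b in ancestors(a):
                continue
            pair_counts[(a, b)] += 1
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules


def rerank(results, context_page, rules, weight=0.5):
    """Boost each (page, keyword_score) result by the best matching rule.

    The additive boost is an assumption made for this sketch only.
    """
    lhs_items = [context_page] + ancestors(context_page)

    def boosted(entry):
        page, score = entry
        rhs_items = [page] + ancestors(page)
        boost = max(
            (rules.get((l, r), 0.0) for l in lhs_items for r in rhs_items),
            default=0.0,
        )
        return score + weight * boost

    return sorted(results, key=boosted, reverse=True)


# Toy sessions: no single (laptop, driver) page pair repeats often enough
# to be frequent, yet the category-level pattern does.
SESSIONS = [
    {"/products/laptop-a", "/support/driver-a"},
    {"/products/laptop-b", "/support/driver-b"},
    {"/products/laptop-a", "/support/driver-b"},
    {"/products/laptop-b"},
]

rules = mine_rules(SESSIONS)
ranked = rerank(
    [("/about", 0.6), ("/support/driver-a", 0.5)],
    context_page="/products/laptop-b",
    rules=rules,
)
```

In this toy data no individual laptop page co-occurs with any individual driver page often enough to pass the support threshold, while the category-level rule from `products/laptops` to `support/drivers` does. This illustrates, under the stated assumptions, how generalization can absorb diverse page-level behavior into a more general pattern that is then usable for re-ranking.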
