Investigating the performance of automatic new topic identification across multiple datasets

Recent studies on automatic new topic identification in Web search engine user sessions have demonstrated that neural networks can identify new topics successfully. However, most of this work applied its new topic identification algorithms to data logs from a single search engine. In this study, we investigate whether the application of neural networks to automatic new topic identification is more successful on some search engines than on others. Sample data logs from two search engines, the Norwegian search engine FAST (currently owned by Overture) and Excite, are used in this study. The findings suggest that query logs with more topic shifts tend to yield better results on shift-based performance measures, whereas logs with more topic continuations tend to yield better results on continuation-based performance measures.
