Adaptive Filtering of Newswire Stories using Two-Level Clustering

Adaptive filtering of news is an area of information retrieval that is gaining substantial interest as such services become more widely available on the Internet. This paper reports on a number of experiments with a two-level clustering approach that uses a variety of techniques, including threshold adaptation, topic vocabulary adaptation, and both noun phrase and named entity adaptation. Our goal in this exploratory research is to empirically compare alternative configurations of our filtering approach so that we can better understand the relative value of the component subsystems.
