Information Filtering: Overview of Issues, Research and Systems

Vast amounts of information are created and delivered over electronic media. Users risk being overwhelmed by this flow of information, and they lack adequate tools to help them manage the situation. Information filtering (IF) is one of the rapidly evolving methods for managing large information flows. The aim of IF is to expose users only to information that is relevant to them. Many IF systems have been developed in recent years for various application domains. Examples of filtering applications include: filters for Internet search results built into Internet software, personal e-mail filters based on personal profiles, filters for listservers or newsgroups serving groups or individuals, browser filters that block non-valuable information, filters designed to give children access only to suitable pages, filters for e-commerce applications that direct products and promotions only to potential customers, and many more. The different systems use various methods, concepts, and techniques from diverse research areas such as Information Retrieval, Artificial Intelligence, and Behavioral Science. The systems differ in scope, functionality, and platform. Although their philosophies vary widely, all share the goal of automatically directing the most valuable information to users in accordance with their User Model, and of helping them use their limited reading time as effectively as possible. This paper clarifies the difference between IF systems and related systems, such as information retrieval (IR) systems and information extraction systems. The paper defines a framework for classifying IF systems according to several parameters, and illustrates the approach with commercial and academic systems. It also describes the underlying concepts of IF systems and the techniques used to implement them.
It discusses the methods and measures used to evaluate IF systems, as well as the limitations of current systems. In the conclusion we present research issues in the information filtering research arena, such as user modeling, standardization of evaluation, and integration with digital libraries and Web repositories.
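The abstract describes IF as directing information to users according to their User Model, and it mentions evaluation measures without naming them. The following is a minimal sketch under stated assumptions. It uses a simple vector-space profile, which is one common content-based technique and not necessarily the one the paper emphasizes. It also uses the standard precision and recall measures. The code compares a term-frequency profile against incoming documents using cosine similarity, delivers documents that score above a threshold, and then scores the delivered set against relevance judgments. The function names, the threshold, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector built from lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_stream(profile: Counter, docs: dict, threshold: float) -> set:
    """Hypothetical filter: deliver ids of documents whose similarity to the profile meets the threshold."""
    return {doc_id for doc_id, text in docs.items()
            if cosine(profile, term_vector(text)) >= threshold}


def precision_recall(delivered: set, relevant: set) -> tuple:
    """Precision = hits / delivered; recall = hits / relevant (0.0 when a denominator is empty)."""
    hits = len(delivered & relevant)
    precision = hits / len(delivered) if delivered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative profile and document stream (made-up data).
    profile = term_vector("information filtering user profile")
    docs = {
        "d1": "A user profile drives information filtering",
        "d2": "Recipes for chocolate cake",
        "d3": "Filtering e-mail with a personal profile",
        "d4": "Information overload in business organisations",
    }
    delivered = filter_stream(profile, docs, threshold=0.3)
    print(sorted(delivered))
    print(precision_recall(delivered, relevant={"d1", "d3", "d4"}))
```

With these sample inputs, the filter delivers `d1` and `d3`. The document `d4` is judged relevant but is missed, so precision is 1.0 and recall is 2/3. Raising or lowering the threshold trades one measure against the other.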
