Intelligent information filtering via hybrid techniques: hill climbing, case-based reasoning, index patterns, and genetic algorithms

As the Internet grows, the amount of data available to users has risen dramatically, resulting in information overload. This work shows that information overload is a problem and that existing browsers organize data poorly. To address these problems, an intelligent news filtering system named INFOS (Intelligent News Filtering Organizational System) was created to reduce the user’s search burden by automatically eliminating Usenet news articles predicted to be irrelevant. These predictions are learned automatically by adapting an internal user model based on features taken from articles and on collaborative features derived from other users. The features are manipulated through keyword-based techniques, knowledge-based techniques, and genetic algorithms to build the user model that performs the actual filtering. The integration of knowledge-based techniques for in-depth analysis, statistical and keyword approaches for scalability, and genetic algorithms for exploration allows INFOS to achieve better filtering performance than any one of these techniques alone. Experimental results collected from the INFOS prototype validate the gain in performance within the domain of news articles posted to electronic bulletin boards.
