Towards Nootropia: a non-linear approach to adaptive document filtering

In recent years, it has become increasingly difficult for users to find relevant information within the accessible glut. Research in Information Filtering (IF) tackles this problem through a tailored representation of the user's interests, a user profile. Traditionally, IF inherits techniques from the related and better-established domains of Information Retrieval and Text Categorisation. These include linear profile representations, which exclude term dependencies and may only effectively represent a single topic of interest, and linear learning algorithms, which achieve only a steady pace of profile adaptation. We argue that these practices are not attuned to the dynamic nature of user interests. A user may be interested in more than one topic in parallel, and both frequent variations and occasional radical changes of interest are inevitable over time. With our experimental system "Nootropia", we achieve adaptive document filtering with a single, multi-topic user profile. A hierarchical term network, which takes into account topical and lexical correlations between terms and identifies topic-subtopic relations between them, is used to represent a user's multiple topics of interest and to distinguish between them. A series of non-linear document evaluation functions is then established on the hierarchical network. Experiments using a variation of TREC's routing subtask, which test the ability of a single profile to represent two and three topics of interest, reveal the approach's superiority over a linear profile representation. Adaptation of this single, multi-topic profile to a variety of changes in the user's interests is achieved through a process of self-organisation that constantly readjusts the profile structurally in response to user feedback. We used virtual users and another variation of TREC's routing subtask to test the profile on two learning and two forgetting tasks. The results clearly indicate the profile's ability to adapt to both frequent variations and radical changes in user interests.
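
The contrast between a linear profile and a profile that accounts for term dependencies can be illustrated with a small sketch. This is not Nootropia's actual evaluation function, which the abstract does not specify; the term weights, the link strengths, and the names `linear_score` and `network_score` are all hypothetical, chosen only to show why a linear sum cannot tell a topically coherent document from one that mixes terms of two unrelated topics.

```python
from itertools import combinations


def linear_score(doc_terms, weights):
    """Linear profile: sum the weights of matching terms, ignoring dependencies."""
    return sum(weights.get(term, 0.0) for term in set(doc_terms))


def network_score(doc_terms, weights, links):
    """Illustrative non-linear score (an assumption, not the paper's function).

    Adds, on top of the linear sum, a bonus for every pair of linked
    profile terms that co-occur in the document. `links` maps an
    alphabetically ordered term pair to a hypothetical link strength.
    """
    present = set(doc_terms) & set(weights)
    score = sum(weights[term] for term in present)
    for a, b in combinations(sorted(present), 2):
        score += links.get((a, b), 0.0) * weights[a] * weights[b]
    return score


# Hypothetical single profile covering two topics of interest.
weights = {"gene": 1.0, "protein": 0.8, "market": 1.0, "stock": 0.8}
links = {("gene", "protein"): 0.9, ("market", "stock"): 0.9}

coherent_doc = ["gene", "protein"]  # both terms belong to one topic
mixed_doc = ["gene", "stock"]       # one term from each topic

print(linear_score(coherent_doc, weights), linear_score(mixed_doc, weights))
print(network_score(coherent_doc, weights, links),
      network_score(mixed_doc, weights, links))
```

Under these assumed values the linear score is identical for both documents, whereas the link-aware score ranks the coherent document higher, which is the kind of distinction a multi-topic profile needs to make.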
