An Integrated System for Filtering News and Managing Distributed Data

With the development and diffusion of worldwide Internet connectivity, a large amount of information can be delivered to users. To keep them from being overwhelmed by incoming data, methods of information filtering are required. The problem is therefore to determine what information is relevant to the user and how a supporting system can make this decision. Parametric and qualitative descriptors of the user's interests must be generated. This paper presents two approaches. The first concerns an information filtering system based on an adaptation of the generalized probabilistic model of information retrieval. The user profile is a vector of weighted terms, which are learned from the relevance assessment values given by the user on the training set. Positive terms are considered relevant to the user's information need, negative ones irrelevant. The relevance values are interpreted as subjective probabilities and hence are mapped into the real interval [0, 1]. ProFile is a filtering system for netnews that uses this model with a scale of 11 predefined relevance values. ProFile allows the user to update the profile on-line and to check the discrepancy between the user's assessment and the system's prediction of relevance. The second concerns the InfoAgent, a system that supports users in retrieving data from distributed and heterogeneous archives and repositories. Its architecture is based on the metaphor of software agents and incorporates ideas from other fields: distributed architectures, relevance feedback and active interfaces. The system has a cooperative and supportive role: it understands the user's needs and learns from the user's behavior. Its aim is to free the user from learning complex tools and from performing tedious and repetitive actions.
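To make the profile idea concrete, here is a minimal sketch of a term-weight profile learned from graded relevance assessments. It is not ProFile's actual model: the abstract does not give the weighting formula, so the averaging rule, the neutral point of 0.5, and the names `learn_profile`, `predict` and `discrepancy` are all assumptions chosen for illustration. Only the 11-value scale on [0, 1] and the positive/negative reading of term weights come from the text.

```python
from collections import defaultdict

# The 11 predefined relevance values on [0, 1] mentioned in the abstract.
SCALE = [i / 10 for i in range(11)]


def learn_profile(training):
    """Build a term -> weight profile from (terms, relevance) pairs.

    Assumed rule (not from the paper): a term's weight is the mean of
    (relevance - 0.5) over the training documents that contain it, so
    positive weights mark relevant terms and negative ones irrelevant terms.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for terms, relevance in training:
        for term in set(terms):
            sums[term] += relevance - 0.5
            counts[term] += 1
    return {term: sums[term] / counts[term] for term in sums}


def predict(profile, terms):
    """Predict relevance for a document and snap it to the 11-value scale."""
    known = [profile[t] for t in set(terms) if t in profile]
    if not known:
        return 0.5  # no evidence either way: assumed neutral value
    raw = 0.5 + sum(known) / len(known)
    raw = min(1.0, max(0.0, raw))
    return min(SCALE, key=lambda v: abs(v - raw))


def discrepancy(profile, terms, assessment):
    """Absolute gap between the user's assessment and the prediction."""
    return abs(assessment - predict(profile, terms))


# Hypothetical training set: term lists with user-assigned relevance values.
training = [
    (["linux", "kernel"], 1.0),
    (["football", "score"], 0.0),
    (["linux", "football"], 0.5),
]
profile = learn_profile(training)
```

Updating the profile on-line would, under the same assumptions, amount to appending a newly assessed article to `training` and calling `learn_profile` again; `discrepancy` gives the kind of assessment-versus-prediction check the abstract describes.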
