Combining Information Retrieval with Information Extraction for Efficient Retrieval of Calls for Papers

In many domains, certain attributes of a document carry more weight than its general words. This paper proposes using information extraction techniques to identify these attributes in the domain of calls for papers. Incorporating attributes into queries imposes new requirements on the retrieval method of conventional information retrieval systems. A new model for estimating the relevance of documents to user requests is also presented. The effectiveness of this model and the benefits of integrating information extraction with information retrieval are shown by comparing our system with a typical information retrieval system. The results show a precision increase of between 45% and 60% at all recall points.
