An automated system that assists in the generation of document indexes

In this article, we describe AIMS (Assisted Indexing at Mississippi State), a system intended to aid human document analysts in assigning indexes to physical chemistry journal articles. AIMS has two major components: a natural language processing (NLP) component and an index generation (IG) component. We provide an overview of what each component does and how it works. We also present the results of a recent evaluation of the system in terms of recall and precision. The recall rate is the proportion of the ‘correct’ indexes (i.e. those produced by human document analysts) that AIMS also generates. The precision rate is the proportion of the indexes generated by AIMS that are correct. Finally, we describe some of the future work planned for this project.
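The two evaluation measures defined above can be illustrated with a minimal sketch. It assumes that indexes are compared as exact string matches, which may be stricter or looser than the matching used in the actual evaluation. The function name `recall_precision` and the sample index terms are hypothetical and are not taken from AIMS.

```python
def recall_precision(generated, reference):
    """Return (recall, precision) for system-generated indexes.

    generated: indexes produced by the system (hypothetical input).
    reference: 'correct' indexes produced by human document analysts.
    Assumption: two indexes match only if their strings are identical.
    """
    generated, reference = set(generated), set(reference)
    matched = generated & reference
    # Recall: share of the analysts' indexes that the system also generated.
    recall = len(matched) / len(reference) if reference else 0.0
    # Precision: share of the system's indexes that the analysts also chose.
    precision = len(matched) / len(generated) if generated else 0.0
    return recall, precision


# Made-up example data, for illustration only.
analyst_indexes = {"enthalpy", "heat capacity", "phase transition", "water"}
system_indexes = {"enthalpy", "water", "viscosity"}

recall, precision = recall_precision(system_indexes, analyst_indexes)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this made-up example, two of the four analyst indexes are recovered, and two of the three generated indexes are correct, so recall is lower than precision.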
