Automatic Indexing of Documents from Journal Descriptors: A Preliminary Investigation

A new, fully automated approach to indexing documents is presented. It associates text words in a training set of bibliographic citations with the indexing of the journals in which those citations appear. This journal-level indexing takes the form of a consistent, timely set of journal descriptors (JDs) assigned to the individual journals themselves and maintained in journal records in a serials authority database. The advantage of this approach is that the training set does not depend on previous manual indexing of hundreds of thousands of documents (any such indexing already in the training set is not used). It relies instead on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals, for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set (journal articles, monographs, Web documents, reports from the grey literature, etc.) and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
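
The idea of associating text words with journal-level descriptors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy journals, their JDs, the training citations, the whitespace tokenizer, and the scoring rule (the fraction of training citations containing a word whose journal carries a given JD, averaged over the words of a new document).

```python
from collections import Counter, defaultdict

# Hypothetical journal records: each journal carries a few journal descriptors (JDs).
JOURNAL_JDS = {
    "J Cardiol": ["Cardiology"],
    "Heart Lung": ["Cardiology", "Pulmonary Medicine"],
    "Chest Med": ["Pulmonary Medicine"],
}

# Hypothetical training citations as (journal, text). No article-level indexing is used.
TRAINING = [
    ("J Cardiol", "heart failure outcomes"),
    ("J Cardiol", "heart valve surgery"),
    ("Heart Lung", "heart and lung transplant"),
    ("Chest Med", "lung function in asthma"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(citations, journal_jds):
    """Return word -> {JD: fraction of citations containing the word whose journal has that JD}."""
    word_docs = Counter()                # number of citations containing each word
    word_jd_docs = defaultdict(Counter)  # the same count, split by the JDs of the journal
    for journal, text in citations:
        for word in set(tokenize(text)):
            word_docs[word] += 1
            for jd in journal_jds[journal]:
                word_jd_docs[word][jd] += 1
    return {
        word: {jd: n / word_docs[word] for jd, n in jd_counts.items()}
        for word, jd_counts in word_jd_docs.items()
    }


def rank_jds(text, model):
    """Rank JDs for a new document by averaging word-JD scores over its known words."""
    words = [w for w in tokenize(text) if w in model]
    scores = Counter()
    for word in words:
        for jd, score in model[word].items():
            scores[jd] += score / len(words)
    return scores.most_common()


model = train(TRAINING, JOURNAL_JDS)
print(rank_jds("heart surgery", model))
```

Because the only labels come from `JOURNAL_JDS`, re-indexing a journal changes the training signal for all of its citations at once, which is the economy the abstract points to. A document whose words never occur in the training set receives an empty ranking in this sketch.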
