A Natural Language Thesaurus for the Humanities: The Need for a Database Search Aid

Database searching presents special difficulties for humanists because many subjects may be covered, many synonyms may be used to describe a single concept, and terms may vary in precision. Databases may be searched by using controlled vocabularies, free-text (natural language) terms, or a combination of both. A significant cause of recall failure in a free-text search is the inability of the searcher to think of all the terms an author may have used. The current study was undertaken to determine the potential value to humanists of a thesaurus integrating free-text terms from the humanities and social sciences. In the first part of the study, a sample of common-noun subject headings from the "Humanities Index" was analyzed to determine how many have at least quasi-synonymous terms. The subject headings were compared to terms in "The Contemporary Thesaurus of Social Science Terms and Synonyms: A Guide for Natural Language Computer Searching" to determine the overlap of terminology between the humanities and social sciences. The results indicate a high degree of overlap, suggesting that a thesaurus integrating terms from the humanities and the social sciences would be of value to scholars in both disciplines. Results also demonstrate that a high proportion of common-noun subject headings have at least quasi-synonymous terms useful for searching. In the second part of the study, searches for humanities scholars were conducted on controlled-vocabulary databases, using both controlled vocabulary and free-text terms to determine whether the latter retrieved additional relevant records not retrieved by the controlled vocabulary. The results indicate that combining both approaches yields more relevant items and higher recall than either method alone. Searchers need tools to identify both controlled-vocabulary terms and free-text terms. The proposed free-text thesaurus will complement controlled-vocabulary thesauri.
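The two ideas in the abstract can be made concrete with a small sketch. The first is a search-aid thesaurus that expands a free-text term into its quasi-synonyms. The second is the reason a combined controlled-vocabulary and free-text search cannot lose recall: its result set is the union of both searches.

Everything in the sketch is hypothetical and invented for illustration, not taken from the study. That includes the `THESAURUS` mapping, the `expand_query` and `recall` helpers, and the record IDs and relevance judgments.

```python
# Hypothetical illustration only: the synonym map, record IDs, and
# relevance judgments are invented and are not data from the study.

# A toy free-text thesaurus: each entry term maps to quasi-synonyms
# an author might have used instead.
THESAURUS = {
    "death": ["mortality", "dying", "bereavement"],
}


def expand_query(term: str) -> str:
    """Build a Boolean OR statement from a term and its thesaurus entries."""
    variants = [term] + THESAURUS.get(term, [])
    return " OR ".join(variants)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant records that a search retrieved.

    Assumes `relevant` is non-empty.
    """
    return len(retrieved & relevant) / len(relevant)


relevant = {"r1", "r2", "r3", "r4", "r5"}   # records judged relevant
controlled = {"r1", "r2", "r6"}             # hits from the subject-heading search
free_text = {"r2", "r3", "r4", "r7"}        # hits from the expanded free-text search
combined = controlled | free_text           # union of both result sets

print(expand_query("death"))  # death OR mortality OR dying OR bereavement
print(recall(controlled, relevant), recall(free_text, relevant), recall(combined, relevant))
# 0.4 0.6 0.8
```

The combined set is a superset of each individual set, so its recall is always at least as high as either search alone. Its precision may fall, however, because it also keeps the non-relevant hits from both searches.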

[1] Jerry R. Byrne. Relative effectiveness of titles, abstracts, and subject headings for machine retrieval from the COMPENDEX services. J. Am. Soc. Inf. Sci., 1975.

[2] Robert N. Broadus. The Humanities: A Selective Guide to Information Sources. 1975.

[3] R. E. Chesley et al. The Educational Resources Information Center. Exceptional Children, 1979.

[4] Pauline Atherton et al. An Analysis of Controlled Vocabulary and Free Text Search Statements in Online Searches. 1980.

[5] William H. Mischo. Library of Congress Subject Headings: A review of the problems, and prospects for improved subject access. 1982.

[6] Ernest Perez. Text Enhancement: Controlled Vocabulary vs. Free Text. 1982.

[7] Stephen E. Wiberley. Subject Access in the Humanities and the Precision of the Humanist's Vocabulary. The Library Quarterly, 1983.

[8] C. P. R. Dubois et al. Free text vs. controlled vocabulary: a reassessment. 1987.

[9] Rudolf Stephan et al. The New Harvard Dictionary of Music. 1988.

[10] Stephen E. Wiberley. Names in Space and Time: The Indexing Vocabulary of the Humanities. The Library Quarterly, 1988.

[11] J. Kristensen et al. The effectiveness of a searching thesaurus in free-text searching in a full-text database. 1990.

[12] Geraldene Walker. Searching the Humanities: Subject Overlap and Search Vocabulary. 1990.

[13] Raya Fidel. Searchers' selection of search keys: I. The selection routine. 1991.

[14] Pm Warren et al. The Dictionary of Art. 1992.

[15] Sarah D. Knapp. The Contemporary Thesaurus of Social Science Terms and Synonyms: A Guide for Natural Language Computer Searching. 1992.

[16] Susan Siegfried et al. An Analysis of Search Terminology Used by Humanities Scholars: The Getty Online Searching Project Report Number 1. The Library Quarterly, 1993.

[17] Jaana Kristensen et al. Expanding End-Users' Query Statements for Free Text Searching with a Search-Aid Thesaurus. Inf. Process. Manag., 1993.

[18] Helen R. Tibbo. Indexing for the Humanities. J. Am. Soc. Inf. Sci., 1994.

[19] Jennifer E. Rowley et al. The controlled versus natural indexing languages debate revisited: a perspective on information retrieval practice and research. J. Inf. Sci., 1994.

[20] Stephen E. Wiberley et al. Humanists Revisited: A Longitudinal Look at the Adoption of Information Technology. 1994.

[21] Joy Tillotson. Is Keyword Searching the Answer? 1995.