The controlled versus natural indexing languages debate revisited: a perspective on information retrieval practice and research

This article revisits the debate concerning controlled and natural indexing languages, as used in searching the databases of the online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM. The debate was first formulated in the early days of information retrieval more than a century ago but, despite significant advances in technology, remains unresolved. The article divides the history of the debate into four eras. Era one was characterised by the introduction of controlled vocabulary. Era two focused on comparisons between different indexing languages in order to assess which was best. Era three saw a number of case studies of limited generalisability and a general recognition that the best search performance can be achieved by the parallel use of the two types of indexing languages. The emphasis in Era four has been on the development of end-user-based systems, including online public access catalogues and databases on CD-ROM. Recent developments in the use of expert systems techniques to support the representation of meaning may lead to systems which offer significant support to the user in end-user searching. In the meantime, however, information retrieval in practice involves a mixture of natural and controlled indexing languages used to search a wide variety of different kinds of databases.
