Semantic similarities between a keyword database and a controlled vocabulary database: an investigation in the antibiotic resistance literature

KeyWords Plus in the Science Citation Index (SCI) database combines citation indexing and semantic indexing to describe document content. This paper explores the similarities and dissimilarities between citation‐semantic indexing and analytic indexing. The dataset consisted of over 400 records on antibiotic resistance in pneumonia that appear in both SCI and MEDLINE. The similarity between the indexing terms assigned to a record in the two databases ranged from completely different to completely identical, with various levels in between. Within‐document similarity was measured with the Inclusion Index, a variation on the Jaccard coefficient. The average inclusion coefficient was 0.4134 for SCI and 0.3371 for MEDLINE. The 20 most frequently occurring terms in each database were identified, and the two groups shared the terms that make up the “intellectual base” of the subject. Conceptual similarity was analyzed through scatterplots of matching and nonmatching terms vs. partially identical and broader/narrower terms. The study also found that the two databases differed in how they assigned terms across semantic categories. Implications of this research are discussed, and further studies are suggested.
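The abstract reports a separate average inclusion value for each database, which suggests a directional measure. The sketch below assumes that, for one record, the inclusion of database A's terms in database B's terms is |A ∩ B| / |A| (the share of A's terms also found in B), and contrasts it with the standard Jaccard coefficient |A ∩ B| / |A ∪ B|. It also counts the most frequent terms per database. The helper names (`normalize`, `inclusion`, `mean_inclusion`, `top_terms`), the case-folding step, and the sample terms are hypothetical illustrations, not the study's actual procedure or data.

```python
from collections import Counter
from typing import Iterable


def normalize(terms: Iterable[str]) -> set[str]:
    """Case-fold and trim index terms so 'PENICILLIN' and 'Penicillin' match.

    Assumption: the study's exact normalization rules are not given here.
    """
    return {t.strip().lower() for t in terms if t.strip()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inclusion(a: set[str], b: set[str]) -> float:
    """Assumed directional inclusion of A in B: |A ∩ B| / |A|.

    Unlike Jaccard, the result depends on which set is A, which would
    yield a separate average for each database.
    """
    return len(a & b) / len(a) if a else 0.0


def mean_inclusion(pairs: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Average inclusion over matched (sci_terms, medline_terms) record pairs."""
    if not pairs:
        raise ValueError("need at least one matched record pair")
    n = len(pairs)
    sci = sum(inclusion(s, m) for s, m in pairs) / n
    med = sum(inclusion(m, s) for s, m in pairs) / n
    return sci, med


def top_terms(records: Iterable[Iterable[str]], k: int = 20) -> list[str]:
    """The k most frequent terms across one database's records."""
    counts = Counter(term for terms in records for term in terms)
    return [term for term, _ in counts.most_common(k)]


if __name__ == "__main__":
    # Hypothetical toy record, not data from the study.
    sci_terms = normalize(["Drug Resistance", "Streptococcus pneumoniae", "Penicillin"])
    medline_terms = normalize([
        "Drug Resistance, Microbial",
        "Streptococcus pneumoniae",
        "Penicillin Resistance",
        "Pneumonia, Pneumococcal",
    ])
    sci, med = mean_inclusion([(sci_terms, medline_terms)])
    print(f"SCI inclusion:     {sci:.4f}")  # 0.3333 (1 of 3 terms shared)
    print(f"MEDLINE inclusion: {med:.4f}")  # 0.2500 (1 of 4 terms shared)
    print(f"Jaccard:           {jaccard(sci_terms, medline_terms):.4f}")  # 0.1667
```

Exact string matching is deliberately simple: in the toy record, "penicillin" and "penicillin resistance" count as nonmatching even though they are partially identical. The partially identical and broader/narrower relations the study analyzes would need a separate, vocabulary-aware comparison.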
