Identifying a better measure of relatedness for mapping science

Measuring the relatedness between bibliometric units (journals, documents, authors, or words) is a central task in bibliometric analysis. Relatedness measures are used for many tasks, among them generating maps, or visual pictures, that show the relationships among all items in a data set. Despite the importance of these tasks, little has been written on how to quantitatively evaluate the accuracy of relatedness measures or of the resulting maps. We propose a new framework for assessing the performance of relatedness measures and visualization algorithms that comprises four factors: accuracy, coverage, scalability, and robustness. We applied this framework to ten measures of journal-journal relatedness to determine the best measure. The ten relatedness measures were then used as inputs to a visualization algorithm, yielding ten additional measures of journal-journal relatedness based on the distances between pairs of journals in two-dimensional space. This second step allows us to determine robustness (i.e., which measure remains best after dimension reduction). Results show that, at low coverage (under 50%), the Pearson correlation is the most accurate raw relatedness measure. However, the best overall measure, both at high coverage and after dimension reduction, is the cosine index or a modified cosine index. Results also show that the visualization algorithm increased local accuracy for most measures. Possible reasons for this counterintuitive finding are discussed.
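As a rough illustration of the two raw measures named above, the following minimal sketch computes a cosine index and a Pearson correlation between journal citation profiles. The journal names, the count vectors, and the function names are hypothetical and chosen only for illustration; the sketch uses the standard textbook definitions of both measures and does not reproduce the specific measures, the modified cosine index, or the data used in the study.

```python
import math


def cosine(x, y):
    """Cosine index between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Return 0.0 when either vector is all zeros (relatedness undefined).
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine(centered_x, centered_y)


# Hypothetical citation-count profiles: each entry is the number of
# citations a journal gives to one of four reference journals.
profiles = {
    "Journal A": [10, 8, 0, 1],
    "Journal B": [9, 7, 1, 0],
    "Journal C": [0, 1, 12, 9],
}

if __name__ == "__main__":
    names = sorted(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            c = cosine(profiles[first], profiles[second])
            r = pearson(profiles[first], profiles[second])
            print(f"{first} vs {second}: cosine={c:.3f}, pearson={r:.3f}")
```

In this toy data, Journal A and Journal B have similar profiles and score high on both measures, while Journal C scores low against both. The two measures differ only in the mean-centering step, which is one reason they can rank journal pairs differently.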
