Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic

We present a large-scale comparison of five multidisciplinary bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. The comparison considers all scientific documents from the period 2008–2017 covered by these data sources. Scopus is compared in a pairwise manner with each of the other data sources. We first analyze differences between the data sources in their coverage of documents, focusing, for instance, on differences over time, per document type, and per discipline. We then study differences in the completeness and accuracy of citation links. Based on our analysis, we discuss the strengths and weaknesses of the different data sources. We emphasize the importance of combining comprehensive coverage of the scientific literature with a flexible set of filters for making selections of the literature.
