Microsoft Academic Automatic Document Searches: Accuracy for Journal Articles and Suitability for Citation Analysis

Microsoft Academic is a free academic search engine and citation index that is similar to Google Scholar but can be automatically queried. Its data are potentially useful for bibliometric analysis if it is possible to search effectively for individual journal articles. This article compares different methods of finding journal articles in its index by searching for a combination of title, authors, publication year and journal name, and uses the results for the largest published correlation analysis of Microsoft Academic citation counts for journal articles to date. Based on 126,312 articles published in 2012 from 323 Scopus subfields, the optimal strategy for articles with DOIs is to search for them by title and filter out matches with incorrect DOIs. This finds 90% of journal articles. For articles without DOIs, the optimal strategy is to search by title and then filter out matches with dissimilar metadata. This finds 89% of journal articles, with an additional 1% incorrect matches. The remaining articles seem mainly to be either not indexed by Microsoft Academic or indexed under a different-language version of their title. Across the matches, Scopus and Microsoft Academic citation counts have an average Spearman correlation of 0.95, with the lowest for any single field being 0.63. Thus, for articles that are not recent, Microsoft Academic citation counts are almost universally equivalent to Scopus citation counts, although there are national biases in the results.
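
The matching logic described in the abstract translates naturally into a short script. The sketch below is illustrative only: the evaluate-style REST endpoint, the attribute names (Ti, Y, DOI, CC), the similarity threshold and the Scopus record fields are all assumptions for demonstration, not details taken from the study.

```python
# Minimal sketch of the title-search-and-filter matching strategy, assuming a
# hypothetical evaluate-style API; endpoint, field names and the 0.9 threshold
# are illustrative, not taken from the paper.
import requests
from difflib import SequenceMatcher
from scipy.stats import spearmanr

API_URL = "https://api.example.org/academic/evaluate"  # hypothetical endpoint


def search_by_title(title):
    """Query the index for records matching the Scopus article title."""
    params = {"expr": f"Ti='{title.lower()}'", "attributes": "Ti,Y,DOI,CC"}
    return requests.get(API_URL, params=params).json().get("entities", [])


def similar(a, b, threshold=0.9):
    """Crude string similarity used to reject matches with dissimilar metadata."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_article(scopus_record):
    """Return the matched record, or None if no candidate survives filtering."""
    for candidate in search_by_title(scopus_record["title"]):
        if scopus_record.get("doi"):
            # Articles with DOIs: keep only candidates whose DOI agrees.
            if candidate.get("DOI", "").lower() == scopus_record["doi"].lower():
                return candidate
        else:
            # Articles without DOIs: filter out candidates with dissimilar
            # metadata (here, title similarity and publication year).
            if (similar(candidate.get("Ti", ""), scopus_record["title"])
                    and candidate.get("Y") == scopus_record["year"]):
                return candidate
    return None


def citation_correlation(scopus_records):
    """Spearman correlation between Scopus and matched citation counts."""
    pairs = [(r["citations"], m["CC"])
             for r in scopus_records if (m := match_article(r))]
    if not pairs:
        return float("nan")
    scopus_counts, ma_counts = zip(*pairs)
    return spearmanr(scopus_counts, ma_counts).correlation
```

Under these assumptions, filtering by DOI where one exists trades a little recall (the reported 90%) for high precision, while the metadata-similarity fallback recovers most articles without DOIs at the cost of roughly 1% incorrect matches.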
