The coverage of Microsoft Academic: analyzing the publication output of a university

This is the first detailed study of the coverage of Microsoft Academic (MA). Based on the complete and verified publication list of a university, the coverage of MA was assessed and compared with two benchmark databases, Scopus and Web of Science (WoS), at the level of individual publications. Citation counts were analyzed, and issues related to data retrieval and data quality were examined. A Perl script was written to retrieve metadata from MA based on publication titles; the script is freely available on GitHub. We find that MA covers journal articles, working papers, and conference items to a substantial extent and indexes more document types than the benchmark databases (e.g., working papers and dissertations). MA clearly surpasses Scopus and WoS in its coverage of book-related document types and conference items but falls slightly behind Scopus in journal articles. The coverage of MA is favorable for evaluative bibliometrics in most research fields, including economics/business, computer/information sciences, and mathematics. However, MA shows biases similar to those of Scopus and WoS with regard to the coverage of the humanities, non-English publications, and open-access publications. Rank correlations of citation counts between MA and the benchmark databases are high. We find that the publication year is correct for 89.5% of all publications and that the number of authors is correct for 95.1% of journal articles. Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA metadata are still lacking.
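To illustrate the kind of title-based metadata retrieval the abstract describes, the sketch below shows a minimal Python equivalent; the authors' actual tool is a Perl script available on GitHub, and the endpoint URL, attribute codes, and subscription key used here are assumptions based on the Microsoft Academic Knowledge API "evaluate" interface as it was documented at the time, not the authors' code.

```python
# Illustrative sketch only; not the authors' Perl script.
# Assumes the (now retired) Academic Knowledge API "evaluate" endpoint;
# URL, attribute codes, and key are placeholders to verify against documentation.
import re
import requests

API_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"  # assumed endpoint
API_KEY = "YOUR-SUBSCRIPTION-KEY"  # placeholder


def normalize_title(title: str) -> str:
    """Lowercase the title and strip punctuation, since MA queries expect normalized titles."""
    return re.sub(r"[^a-z0-9 ]+", " ", title.lower()).strip()


def lookup_by_title(title: str) -> dict:
    """Query MA for one publication title and return the raw JSON response."""
    params = {
        "expr": f"Ti='{normalize_title(title)}'",
        # Ti = title, Y = year, CC = citation count, AA.AuN = author names (assumed attribute codes)
        "attributes": "Ti,Y,CC,AA.AuN",
        "count": 1,
    }
    headers = {"Ocp-Apim-Subscription-Key": API_KEY}
    resp = requests.get(API_URL, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    result = lookup_by_title("Microsoft Academic is on the verge of becoming a bibliometric superpower")
    for entity in result.get("entities", []):
        print(entity.get("Ti"), entity.get("Y"), entity.get("CC"))
```

In a coverage study such as this one, a loop over the verified publication list would call a lookup like the one above for each title and record whether a matching MA entity was returned, along with its publication year, author count, and citation count for comparison against Scopus and WoS.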
