A novel approach for estimating the omitted-citation rate of bibliometric databases with an application to the field of bibliometrics

One of the most significant inaccuracies of bibliometric databases is omitted citations, namely, missing electronic links between a paper of interest and some of its citing papers, even though those citing papers are (or should be) indexed in the database. This paper proposes a novel approach for estimating a database's omitted-citation rate, based on the combined use of two or more bibliometric databases. A statistical model is also presented for (a) estimating the "true" number of citations received by individual papers or sets of papers, and (b) defining an appropriate confidence interval. The proposed approach could be a first step towards a standard for evaluating the accuracy of bibliometric databases.
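
The abstract does not spell out the estimator. The Python sketch below therefore only illustrates the general idea under stated assumptions. It uses a generic binomial model with a delta-method interval, and it is not the statistical model presented in the paper. It assumes, hypothetically, that database A's omitted-citation rate is estimated from a sample of cross-checked citations. A citation counts as omitted by A when another database records it, the citing paper is indexed in A, and A lacks the link. The function names `omitted_citation_rate` and `true_citation_estimate` are illustrative only.

```python
import math


def omitted_citation_rate(recorded_in_a: int, omitted_by_a: int) -> float:
    """Hypothetical estimate of database A's omitted-citation rate.

    recorded_in_a: citations to the sample papers that A records.
    omitted_by_a: citations found in at least one other database whose citing
        paper IS indexed in A, but whose link to the cited paper is missing in A.
    Assumes the other databases contribute no false (phantom) citations.
    """
    known = recorded_in_a + omitted_by_a
    if known == 0:
        raise ValueError("sample contains no citations")
    return omitted_by_a / known


def true_citation_estimate(c_a: int, p_hat: float, n_sample: int, z: float = 1.96):
    """Correct an observed count c_a for omissions and attach an approximate interval.

    Point estimate: c_a / (1 - p_hat).
    Variance (assumed model): binomial variation of the paper's own count, plus
    the uncertainty of p_hat estimated from n_sample citations (delta method).
    Returns (point, lower, upper).
    """
    if not 0 <= p_hat < 1:
        raise ValueError("p_hat must lie in [0, 1)")
    q = 1 - p_hat
    c_hat = c_a / q
    var_own = c_hat * p_hat / q                      # c_a ~ Binomial(n, q), n estimated by c_hat
    var_rate = (c_a / q**2) ** 2 * p_hat * q / n_sample  # (d c_hat / d p)^2 * Var(p_hat)
    half = z * math.sqrt(var_own + var_rate)
    # The true count cannot be smaller than what A already records.
    return c_hat, max(c_a, c_hat - half), c_hat + half
```

For example, if A records 950 of a sample's citations and other databases reveal 50 more that A omits, `omitted_citation_rate(950, 50)` gives 0.05, and a paper with 38 citations in A gets a corrected point estimate of about 40. Citations missed by every database consulted remain invisible, so a rate estimated this way is likely a lower bound.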
