On the meaningful and non-meaningful use of reference sets in bibliometrics

In a paper published recently, Kaur, Radicchi, and Menczer (2013) used data from the Scholarometer (http://scholarometer.indiana.edu/) to examine the effectiveness of various metrics, such as the h index (Hirsch, 2005) and the new crown indicator (Lundberg, 2007; Opthof & Leydesdorff, 2010; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011a; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011b), in generating field-normalized citation scores. While the subject of field-normalization in bibliometrics has up to now been discussed primarily at the paper level (Bornmann & Leydesdorff, 2013), Kaur et al. (2013) have looked at the author level: which metric provides an effective method of field-normalization with which scientists can be compared across different fields? There have already been numerous calls for benchmarks for comparative assessment in order to evaluate individual scientists (Garfield, 1979; Kreiman & Maunsell, 2011). Something that is very difficult to implement for individuals (Coleman, Bolumole, & Frankel, 2012; El Emam, Arbuckle, Jonker, & Anderson, 2012) can be achieved at the institutional level with data from the rankings, as Bornmann and de Moya Anegón (2014) have shown in their study (provided that one accepts the reasoned decisions which are made in constructing the rankings).

We would like to use the paper by Kaur et al. (2013) as a starting point for an investigation into an appropriate method of field-normalization (at the level of the individual scientist). In bibliometrics, in order to allow cross-disciplinary comparisons of citation impact at the level of individual papers, a reference set made up of all the papers from the same field (and the same publication year) is compiled for each paper (formalized in the sketch below). One can expect that the Web of Science (WoS, Thomson Reuters) and Scopus (Elsevier) have good coverage of the literature in the natural and life sciences (Mahdi, d’Este, & Neely, 2008). Only by taking all the comparable papers into account is it possible to ensure that the measurement of the impact of the paper in question is valid in comparison with similar papers (Bornmann & Marx, 2013a). If only a part of the total number of papers is used in the reference sets, there is no comparison with the relevant reference sets.

In their study, Kaur et al. (2013) use bibliometric data for scientists who have used the Scholarometer tool to normalize metrics at the level of individual scientists. As we can assume that not all scientists worldwide use this tool (nor do its users constitute a sample which can be interpreted as meaningful, such as all scientists with at least one paper in the WoS), it is not possible to generate meaningful and valid reference data at the level of scientists on this basis. In order to test the various metrics in their study, the authors would have had to normalize each scientist recorded in Scholarometer against an appropriate valid reference set (including all scientists). Only then would they have been able to verify the effectiveness of the metrics using the example of scientists from Scholarometer.
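To make the role of the reference set explicit, the paper-level normalization can be sketched as follows (our own notation, intended only as an illustration of the standard procedure, not as a formula taken from Kaur et al., 2013): a paper $i$ with $c_i$ citations, published in field $f$ and publication year $y$, is normalized against the complete reference set $R_{f,y}$ of all papers from that field and year,

\[
\mathrm{NCS}_i = \frac{c_i}{e_{f,y}}, \qquad e_{f,y} = \frac{1}{\lvert R_{f,y} \rvert} \sum_{j \in R_{f,y}} c_j .
\]

If $R_{f,y}$ is replaced by an arbitrary subset of the field, for example the publications that happen to have been entered by the users of a tool, the denominator $e_{f,y}$ no longer represents the field, and the normalized score loses its meaning as a comparison with the relevant reference set.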
An additional problem arises with the new crown indicator, which Kaur et al. (2013) included in their comparative study along with others: it was obviously also calculated with Scholarometer data. The expected citation rate for a publication is therefore not, as is currently standard in bibliometrics, calculated over the impact of all the publications in a subject category of the Web of Science (WoS, Thomson Reuters) or in a Scopus (Elsevier) subject area and publication year, but over the impact of a selection of publications which users of the Scholarometer have entered quite arbitrarily and assigned to certain disciplines (the standard calculation is illustrated in the sketch below). Can we call this a valid reference set? We would say not.

Bibliometric data used to evaluate research at the level of individual scientists are highly critical data and should therefore be compiled very carefully (Bornmann & Marx, 2013b; Marx & Bornmann, 2014). In the Scholarometer, users can enter the names of scientists as they wish, assign these scientists to certain disciplines, and compile their publication sets. Even though processes have been implemented in the Scholarometer which are supposed to prevent serious misclassifications (Kaur et al., 2012), one can assume that these assignments are not of high quality. However, we need high-quality data when we put reference sets together and use them for the evaluation of research. Indeed, the database operator Chemical Abstracts Service, for example, employs a number of highly specialised people to assign individual publications in chemistry and its related fields to specific subject categories (Bornmann, Marx, & Barth, 2013; Bornmann, Mutz, Marx, Schier, & Daniel, 2011). Thomson Reuters (WoS) and Elsevier (Scopus) have addressed the problem of subject-matter classification by assigning journals (and not individual papers) to subject categories. Despite much criticism (Bornmann, Mutz, Neuhaus, & Daniel, 2008b; Rafols & Leydesdorff, 2009), this classification is currently used as a standard in bibliometrics.
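The standard calculation can also be illustrated with a minimal Python sketch (toy data and names of our own invention; this is not code from Kaur et al., 2013, and it does not access WoS, Scopus, or Scholarometer). The expected citation rate of a publication is the mean number of citations of all publications from the same subject category and publication year, and the new crown indicator is the mean of the resulting normalized scores.

# Illustrative sketch only: toy values, not real bibliometric data.
from collections import defaultdict
from statistics import mean


def expected_citation_rates(reference_papers):
    """Mean citations per (subject category, publication year) over the full reference set."""
    groups = defaultdict(list)
    for paper in reference_papers:
        groups[(paper["category"], paper["year"])].append(paper["citations"])
    return {key: mean(values) for key, values in groups.items()}


def new_crown_indicator(author_papers, expected):
    """Average of observed citations divided by the expected citation rate."""
    return mean(
        paper["citations"] / expected[(paper["category"], paper["year"])]
        for paper in author_papers
    )


# Toy reference sets; in a real analysis these would be the complete subject
# categories of the database, not a sample contributed by the users of a tool.
reference = [
    {"category": "Chemistry", "year": 2010, "citations": c} for c in (0, 2, 4, 10, 24)
] + [
    {"category": "Sociology", "year": 2010, "citations": c} for c in (0, 1, 1, 2, 6)
]

author = [
    {"category": "Chemistry", "year": 2010, "citations": 16},
    {"category": "Sociology", "year": 2010, "citations": 4},
]

rates = expected_citation_rates(reference)
print(new_crown_indicator(author, rates))  # 2.0: twice the average citedness of the two fields

The decisive point in this sketch is the provenance of the reference data: if they contain only a self-selected subset of the publications of a field, the expected citation rates, and with them the indicator, can no longer be interpreted as a comparison with the relevant reference set.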

References

[1] Luk Arbuckle, et al. Two h-Index Benchmarks for Evaluating the Publication Performance of Medical Informatics Researchers, 2012, Journal of Medical Internet Research.

[2] Lutz Bornmann, et al. A multilevel modelling approach to investigating the predictive validity of editorial decisions: do the editors of a high profile journal select manuscripts that are highly cited after publication?, 2011.

[3] Lutz Bornmann, et al. How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations, 2013, Scientometrics.

[4] Jonas Lundberg, et al. Lifting the crown - citation z-score, 2007, J. Informetrics.

[5] Péter Jacsó, et al. Google Scholar's Ghost Authors, 2009.

[6] Gabriel Kreiman, et al. Nine Criteria for a Measure of Scientific Output, 2011, Front. Comput. Neurosci.

[7] B. Jay Coleman, et al. Benchmarking Individual Publication Productivity in Logistics, 2012.

[8] A. Neely, et al. Citation Counts: Are They Good Predictors of RAE Scores? A Bibliometric Analysis of RAE 2001, 2008.

[9] Filippo Menczer, et al. Universality of scholarly impact metrics, 2013, J. Informetrics.

[10] Lutz Bornmann, et al. On the problems of dealing with bibliometric data, 2014, J. Assoc. Inf. Sci. Technol.

[11] Eugene Garfield. Citation Indexing: Its Theory and Application in Science, Technology, and Humanities, 1979.

[12] Lutz Bornmann, et al. The normalization of citation counts based on classification systems, 2013, Publ.

[13] Loet Leydesdorff, et al. Caveats for the journal and field normalizations in the CWTS ("Leiden") evaluations of research performance, 2010, J. Informetrics.

[14] Lutz Bornmann, et al. Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine, 2008, J. Assoc. Inf. Sci. Technol.

[15] Lutz Bornmann, et al. What proportion of excellent papers makes an institution one of the best worldwide? Specifying thresholds for the interpretation of the results of the SCImago Institutions Ranking and the Leiden Ranking, 2014, J. Assoc. Inf. Sci. Technol.

[16] Ismael Rafols, et al. Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects, 2008, J. Assoc. Inf. Sci. Technol.

[17] Thed N. van Leeuwen, et al. Towards a new crown indicator: an empirical analysis, 2010, Scientometrics.

[18] Péter Jacsó, et al. Metadata mega mess in Google Scholar, 2010, Online Inf. Rev.

[19] Andreas Thor, et al. Convergent validity of bibliometric Google Scholar data in the field of chemistry - Citation counts for papers that were accepted by Angewandte Chemie International Edition or rejected but published elsewhere, using Google Scholar, Science Citation Index, Scopus, and Chemical Abstracts, 2009, J. Informetrics.

[20] F. Menczer, et al. Scholarometer: A Social Framework for Analyzing Impact across Disciplines, 2012, PLoS ONE.

[21] Loet Leydesdorff, et al. The validation of (advanced) bibliometric indicators through peer assessments: A comparative study using data from InCites and F1000, 2012, J. Informetrics.

[22] Lutz Bornmann, et al. Do we need the h index and its variants in addition to standard bibliometric measures?, 2009, J. Assoc. Inf. Sci. Technol.

[23] Miguel A. García-Pérez, et al. Accuracy and completeness of publication and citation records in the Web of Science, PsycINFO, and Google Scholar: A case study for the computation of h indices in Psychology, 2010, J. Assoc. Inf. Sci. Technol.

[24] Lutz Bornmann, et al. Are there really two types of h index variants? A validation study by using molecular life sciences data, 2009.

[25] L. Bornmann, et al. How good is research really?, 2013, EMBO Reports.

[26] J. E. Hirsch. An index to quantify an individual's scientific research output, 2005, Proc. Natl. Acad. Sci. USA.

[27] Thed N. van Leeuwen, et al. Towards a new crown indicator: Some theoretical considerations, 2010, J. Informetrics.