The Poverty of Citation Databases: Data Mining is Crucial for Fair Metrical Evaluation of Research Performance

For a long time, the journal impact factor has been used to evaluate the scientific performance of authors. It is increasingly recognized, however, that judging an author’s scientific performance should take into account that author’s own scientific output, and not the output of other authors publishing in the same journal: the citation rates of papers within a single journal can vary enormously, and the journal impact factor fails to consider that variance. The number of citations an author attracts is a reliable measure of the attention the author receives from the scientific community, or, in other words, of the scientific impact of an author. (Attention is a lame arbiter of scientific quality, but that is a problem that no simple metric can solve.)

In 2005, Jorge E. Hirsch proposed a simple, elegant measure of an author’s impact: the h index, the largest number h such that h of the author’s papers have each received at least h citations. Other author-based indexes have been proposed since, such as the g index: given a set of papers ranked in decreasing order of the number of citations received, the g index is the largest number such that the top g articles together received at least g² citations (Egghe 2006). The g index better takes into account the citation scores of an author’s top articles.
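
To make the two definitions concrete, here is a minimal Python sketch, not drawn from either paper, that computes both indexes from a list of per-paper citation counts; the function names and the sample counts are purely illustrative.

```python
def h_index(citations):
    # Largest h such that h papers have at least h citations each (Hirsch 2005).
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    # Largest g such that the top g papers together have at least g² citations (Egghe 2006).
    counts = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(counts, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g

papers = [10, 8, 5, 4, 3]   # illustrative citation counts for five papers
print(h_index(papers))      # 4: four papers have at least 4 citations each
print(g_index(papers))      # 5: the top 5 papers total 30 >= 25 citations

```

On this toy example the g index (5) exceeds the h index (4): the two highly cited papers contribute surplus citations that the h index cannot register, which is exactly the sense in which the g index better reflects top articles.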