Mining typical features for highly cited papers

In this paper, we apply data mining tools to identify typical features of highly cited papers (HCPs). The feature space used to model HCPs was established by integrating papers' external features and quality features. A series of predictor teams was then extracted from this feature space with a rough set reduction framework, and each predictor team was used to construct a base classifier. The five base classifiers that together offered the highest classification performance and the greatest overall diversity were selected to construct a multi-classifier system (MCS) for HCPs. The combined prediction model outperformed the models built on a single predictor team. Eleven typical prediction features for HCPs were extracted on the basis of the MCS. The findings show that both a paper's inner quality and its external features, mainly represented by the reputation of the authors and journals, contribute to its later becoming highly cited.
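The selection and combination steps described above can be illustrated with a minimal sketch. The abstract does not state which diversity measure, selection procedure, or combination rule was used, so the pairwise disagreement measure, the greedy selection, the equal weighting of accuracy and diversity, and the majority vote below are all assumptions made for illustration. The team names and prediction vectors are hypothetical toy data, not results from the paper.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a, b):
    """Fraction of samples on which two classifiers give different outputs."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_ensemble(candidates, labels, k, weight=0.5):
    """Greedily pick k base classifiers, trading off accuracy and diversity.

    candidates maps a (hypothetical) predictor-team name to that team's
    0/1 predictions on a validation set. weight is an assumed trade-off
    parameter between accuracy and mean disagreement with the chosen set.
    """
    # Start from the single most accurate base classifier.
    chosen = [max(candidates, key=lambda n: accuracy(candidates[n], labels))]
    while len(chosen) < min(k, len(candidates)):
        def score(name):
            diversity = sum(
                disagreement(candidates[name], candidates[c]) for c in chosen
            ) / len(chosen)
            return (1 - weight) * accuracy(candidates[name], labels) + weight * diversity

        remaining = [n for n in candidates if n not in chosen]
        chosen.append(max(remaining, key=score))
    return chosen


def majority_vote(candidates, chosen):
    """Combine the chosen classifiers' 0/1 predictions by simple majority."""
    n_samples = len(candidates[chosen[0]])
    return [
        int(2 * sum(candidates[c][i] for c in chosen) > len(chosen))
        for i in range(n_samples)
    ]


# Toy validation data: 1 = highly cited, 0 = not highly cited (hypothetical).
labels = [1, 1, 0, 0, 1, 0]
candidates = {
    "team_a": [1, 1, 0, 0, 0, 0],
    "team_b": [1, 0, 0, 0, 1, 0],
    "team_c": [1, 1, 1, 0, 1, 0],
    "team_d": [1, 1, 0, 0, 0, 0],  # identical to team_a, so it adds no diversity
}

chosen = select_ensemble(candidates, labels, k=3)
combined = majority_vote(candidates, chosen)
print(chosen, combined, accuracy(combined, labels))
```

In this toy case each base classifier makes one error on a different sample, so the vote of three diverse classifiers corrects all of them, while the redundant `team_d` is passed over. This is the intuition behind preferring diverse base classifiers, not a reproduction of the paper's experiment.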
