Preliminary investigation on quantitative evaluation method of scientific papers based on text analysis

In recent years, a large amount of scientific research has become available on the Web. Because the gap between the volume of academic information on the Web and human processing capacity has grown large, several problems have arisen: (1) lost opportunities to present research, (2) lost opportunities to gather research information, (3) an increasing burden of peer review, and (4) difficulty in selecting papers to read. To address these problems, a quantitative evaluation index for papers is needed as a selection criterion. This paper proposes quantitative evaluation methods for scientific papers based on text analysis. The similarity of a target journal to an authoritative journal is defined using distributed representations of papers; when this similarity is high, the target journal's quality in terms of writing and organization is expected to be high. This paper also proposes an evaluation method using ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The proposed methods are evaluated experimentally. The results show that the journal similarity corresponds roughly to the SCImago Journal Rank (SJR), and they also suggest the possibility of evaluating journals that have not yet been indexed in authoritative journal indices. The ROUGE-based method is shown to be potentially useful for evaluating the consistency of papers.
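The two ingredients described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: it assumes papers have already been mapped to fixed-length vectors (e.g., by paragraph vectors), so journal similarity reduces to cosine similarity between document vectors, and it implements ROUGE-N recall directly from its definition (n-gram overlap of a candidate text against a reference text). The function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two document vectors (assumed pre-computed)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: fraction of reference n-grams covered by the candidate."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference)
    cand_counts = ngrams(candidate)
    # Clipped overlap: each reference n-gram is matched at most as often
    # as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0
```

Under this sketch, a journal-level similarity score could be obtained by averaging `cosine_similarity` over pairs of paper vectors drawn from the target journal and an authoritative journal, while `rouge_n_recall` could compare, say, a paper's abstract against its body to probe consistency.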
