Assessing the Quality of Scientific Papers

A multitude of factors contribute to the overall quality of scientific papers, including readability, linguistic quality, fluency, semantic complexity, and, of course, domain-specific technical factors. These factors vary from one field of study to another. In this paper, we propose a measure and method for assessing the overall quality of scientific papers in a particular field of study. We evaluate our method in the computer science domain, but it can be applied to other technical and scientific fields. Our method is based on corpus linguistics techniques, which enable the extraction of information and knowledge associated with a specific domain. For this purpose, we created a large corpus consisting of papers from very high impact conferences. We first analyze this corpus to extract rich domain-specific terminology and knowledge, and then use the acquired knowledge to estimate the quality of scientific papers by applying our proposed measure. We evaluate our approach in two ways. First, we apply the measure to high- and low-impact test corpora; the results show a significant difference between the measure scores of the two corpora. Second, we develop a classifier based on the proposed measure and compare it to a baseline classifier; our results show that the measure-based classifier outperforms the baseline. Based on these results, the proposed measure and technique can be used for the automated assessment of scientific papers.
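The abstract does not specify the exact formula of the proposed measure, but the general pipeline it describes — build a reference corpus of high-impact papers, extract domain terminology frequencies, then score a new paper against that knowledge — can be illustrated with a minimal, hypothetical sketch. Here the score is simply the mean log-frequency of a paper's tokens in the reference corpus; the function names, the tokenizer, and the scoring formula are all illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of a corpus-based quality score; the actual measure
# in the paper is not specified in the abstract.
from collections import Counter
import math
import re

def tokenize(text):
    # Lowercase word tokens; a crude stand-in for real term extraction.
    return re.findall(r"[a-z]+", text.lower())

def build_reference_counts(corpus_docs):
    # Aggregate term frequencies over the high-impact reference corpus.
    counts = Counter()
    for doc in corpus_docs:
        counts.update(tokenize(doc))
    return counts

def quality_score(paper_text, ref_counts):
    # Mean log-frequency of the paper's terms in the reference corpus;
    # higher means vocabulary closer to the high-impact writing.
    tokens = tokenize(paper_text)
    if not tokens:
        return 0.0
    return sum(math.log1p(ref_counts[t]) for t in tokens) / len(tokens)

# Toy usage with a two-document "reference corpus".
ref = build_reference_counts([
    "neural network training converges with gradient descent",
    "gradient descent optimizes the neural network loss",
])
in_domain = quality_score("gradient descent training", ref)
off_domain = quality_score("zzz qqq xyzzy", ref)
print(in_domain > off_domain)
```

Under this toy scoring rule, text using in-domain terminology receives a higher score than out-of-domain text, mirroring the intuition that papers sharing vocabulary with high-impact work score higher on the proposed measure.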