In this paper we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50% of the variation in reviewer quality scores was subjective in origin. Further, with seven years having passed since the experiment, we find that for accepted papers there is no correlation between quality scores and the impact of the paper as measured by citation count. We trace the fate of rejected papers, recovering where these papers were eventually published. For these papers we do find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good at identifying poor papers but poor at identifying good papers. We give some suggestions for improving the reviewing process, but also warn against removing the subjective element. Finally, we suggest that the real conclusion of the experiment is that the community should place less onus on the notion of ‘top-tier conference publications’ when assessing the quality of individual researchers.
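To make the impact analysis concrete, the following is a minimal sketch (not the authors' code) of how one might estimate the association between reviewer quality scores and later impact when impact is taken as a function of citation count. The log-scaling of citations, the choice of Spearman rank correlation, and all variable names and data values are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: correlation between review scores and an impact proxy.
# Data values are made up for illustration only.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-paper data: mean reviewer quality score and citation count.
quality_scores = np.array([5.2, 6.8, 4.1, 7.5, 6.0, 5.9])
citation_counts = np.array([12, 85, 3, 40, 150, 22])

# Rank correlation between quality scores and log-citations (impact proxy).
rho, p_value = spearmanr(quality_scores, np.log1p(citation_counts))
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A rank correlation is used here because citation counts are heavy-tailed; any other association measure (e.g. Pearson on log-citations) could be substituted without changing the overall approach.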