How to Host An Effective Data Competition: Statistical Advice for Competition Design and Analysis

Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become popular and prevalent, particularly in supervised learning formats, hosts implement them in highly variable ways. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Drawing on our experience, this paper outlines important considerations for strategically designing relevant and informative data sets that maximize what the host learns from a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as a better understanding of which regions of the input space are well solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore but also provide more detailed analysis of individual sub-questions, including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries that improve the interpretation of results beyond the leaderboard rankings alone. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.

Lu Lu | Christine M. Anderson-Cook | Michael L. Fugate | Kary L. Myers | Kevin R. Quinlan | Norma Pawley
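
The abstract says the post-competition analysis fits GLMs to competitor results to compare algorithms and locate easy or hard regions of the input space. A minimal sketch of that idea, assuming a binary correct/incorrect outcome per test run summarized into cells by a hypothetical algorithm label ("A", "B") and a hypothetical scenario type ("easy", "hard"), is a binomial GLM with a logit link (logistic regression) fit by Newton-Raphson in plain Python. The counts, labels, and helper names (`CELLS`, `design_row`, `fit_logistic_glm`) are invented for illustration; the paper's actual models, covariates, and data are not reproduced here.

```python
import math

# Hypothetical, illustrative counts (NOT data from the competition described above):
# (algorithm, scenario type, number of test runs, number answered correctly).
CELLS = [
    ("A", "easy", 100, 90),
    ("A", "hard", 100, 60),
    ("B", "easy", 100, 80),
    ("B", "hard", 100, 40),
]
TERMS = ["intercept", "algorithm B vs A", "hard vs easy scenario"]


def design_row(alg, scen):
    """Dummy (treatment) coding with algorithm A and the easy scenario as baselines."""
    return [1.0, 1.0 if alg == "B" else 0.0, 1.0 if scen == "hard" else 0.0]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def score_and_information(beta, X, cells):
    """Binomial log-likelihood gradient X^T (y - n p) and Fisher information X^T W X."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, (_, _, n, y) in zip(X, cells):
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        w = n * p * (1.0 - p)  # binomial variance weight for this cell
        for i in range(k):
            score[i] += x[i] * (y - n * p)
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return score, info


def fit_logistic_glm(cells, max_iter=50, tol=1e-10):
    """Logit-link binomial GLM fit by Newton-Raphson (equivalent to IRLS for the canonical link)."""
    X = [design_row(alg, scen) for alg, scen, _, _ in cells]
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(beta, X, cells)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(beta, X, cells)
    k = len(beta)
    # Standard errors: square roots of the diagonal of the inverse Fisher information.
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se


if __name__ == "__main__":
    beta, se = fit_logistic_glm(CELLS)
    for name, b, s in zip(TERMS, beta, se):
        print(f"{name:>22}: coef={b:+.3f}  se={s:.3f}  odds ratio={math.exp(b):.3f}")
```

With these made-up counts, the algorithm-B coefficient comes out negative (lower odds of a correct answer than algorithm A for the same scenario type) and the hard-scenario coefficient is negative (a region that is harder for both algorithms). Adding an algorithm-by-scenario interaction column to `design_row` would let the model ask whether the gap between algorithms changes across scenario types. In practice a statistics package such as statsmodels or R's `glm` would usually be used; the hand-written solver only keeps the sketch dependency-free.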