Test data reuse for evaluation of adaptive machine learning algorithms: over-fitting to a fixed 'test' dataset and a potential solution
[1] Toniann Pitassi, et al. Preserving Statistical Validity in Adaptive Data Analysis, 2014, STOC.
[2] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[3] Aaron Roth, et al. Adaptive Learning with Robust Generalization Guarantees, 2016, COLT.
[4] Kyle J Myers, et al. Evaluating imaging and computer-aided detection and diagnosis devices at the FDA, 2012, Academic radiology.
[5] Peter F. Neher, et al. Tractography-based connectomes are dominated by false-positive connections, 2016, bioRxiv.
[6] Gaël Varoquaux, et al. Cross-validation failure: Small sample sizes lead to large error bars, 2017, NeuroImage.
[7] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[8] Avrim Blum, et al. The Ladder: A Reliable Leaderboard for Machine Learning Competitions, 2015, ICML.
[9] James Zou, et al. Controlling Bias in Adaptive Data Analysis Using Information Theory, 2015, AISTATS.
[10] Raef Bassily, et al. Algorithmic stability for adaptive data analysis, 2015, STOC.
[11] Huey-miin Hsueh, et al. Comparison of Methods for Estimating the Number of True Null Hypotheses in Multiplicity Testing, 2003, Journal of biopharmaceutical statistics.
[12] Juha Reunanen, et al. Overfitting in Making Comparisons Between Variable Selection Methods, 2003, J. Mach. Learn. Res.
[13] Toniann Pitassi, et al. The reusable holdout: Preserving validity in adaptive data analysis, 2015, Science.
[14] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.
[15] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.
[16] Andrew Gelman, et al. Data-dependent analysis—a "garden of forking paths"—explains why many statistically significant comparisons don't hold up, 2014.
[17] Max Kuhn, et al. Applied Predictive Modeling, 2013.
[18] J. Brooks. Why most published research findings are false: Ioannidis JP, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, 2008.
[19] Gavin C. Cawley, et al. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation, 2010, J. Mach. Learn. Res.
[20] Howard Bowman, et al. I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification, 2016, bioRxiv.
[21] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.
[22] H. Pashler, et al. Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition, 2009, Perspectives on psychological science: a journal of the Association for Psychological Science.
[23] Toniann Pitassi, et al. Generalization in Adaptive Data Analysis and Holdout Reuse, 2015, NIPS.
[24] J. Ioannidis, et al. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature, 2017, PLoS biology.
[25] Andrew P. Bradley, et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognit.
[26] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.
[27] Adam D. Smith, et al. Information, Privacy and Stability in Adaptive Data Analysis, 2017, ArXiv.
[28] Hans Knutsson, et al. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates, 2016, Proceedings of the National Academy of Sciences.
[29] Leo Breiman, et al. Random Forests, 2001, Machine Learning.
[30] Glenn Fung, et al. On the Dangers of Cross-Validation. An Experimental Evaluation, 2008, SDM.
[31] R. F. Wagner, et al. Classifier design for computer-aided diagnosis: effects of finite sample size on the mean performance of classical and neural network classifiers, 1999, Medical physics.
[32] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of statistical software.
[33] W. K. Simmons, et al. Circular analysis in systems neuroscience: the dangers of double dipping, 2009, Nature Neuroscience.
[34] Toniann Pitassi, et al. Guilt-free data reuse, 2017, Commun. ACM.
[35] J. Hanley, et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve, 1982, Radiology.
[36] Thomas Steinke, et al. Generalization for Adaptively-chosen Estimators via Stable Median, 2017, COLT.
[37] Max Kuhn, et al. Building Predictive Models in R Using the caret Package, 2008.