[1] Suchi Saria,et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist , 2020, Nature Medicine.
[2] Kaylie A. Carbine,et al. Sample size calculations in human electrophysiology (EEG and ERP) studies: A systematic review and recommendations for increased rigor. , 2017, International journal of psychophysiology : official journal of the International Organization of Psychophysiology.
[3] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[4] Michael Gao,et al. A Path for Translation of Machine Learning Products into Healthcare Delivery , 2020, EMJ Innovations.
[5] Gael Varoquaux,et al. Establishment of Best Practices for Evidence for Prediction: A Review. , 2019, JAMA psychiatry.
[6] Alejandro F. Frangi,et al. Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions , 2018, ArXiv.
[7] P. Anderberg,et al. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review , 2017, PloS one.
[8] Jake VanderPlas,et al. A Practical Taxonomy of Reproducibility for Machine Learning Research , 2018 .
[9] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[10] R. D'Agostino,et al. Non‐inferiority trials: design concepts and issues – the encounters of academic consultants in statistics , 2002, Statistics in medicine.
[11] O. Colliot,et al. Predicting the Progression of Mild Cognitive Impairment Using Machine Learning: A Systematic, Quantitative and Critical Review , 2020, medRxiv.
[12] Harini Suresh,et al. A Framework for Understanding Unintended Consequences of Machine Learning , 2019, ArXiv.
[13] R. Hofmann-Wellenhof,et al. Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition. , 2019, JAMA dermatology.
[14] Michela Paganini,et al. The Scientific Method in the Science of Machine Learning , 2019, ArXiv.
[15] Senén Barro,et al. Do we need hundreds of classifiers to solve real world classification problems? , 2014, J. Mach. Learn. Res..
[16] Gary Marcus,et al. Deep Learning: A Critical Appraisal , 2018, ArXiv.
[17] Dietmar Jannach,et al. Are we really making much progress? A worrying analysis of recent neural recommendation approaches , 2019, RecSys.
[18] X Yu,et al. Classify epithelium‐stroma in histopathological images based on deep transferable network , 2018, Journal of microscopy.
[19] R. Rosenthal. The file drawer problem and tolerance for null results , 1979, Psychological Bulletin.
[20] J. Ioannidis. Why Most Published Research Findings Are False , 2005, PLoS medicine.
[21] Mohak Shah,et al. Performance Evaluation in Machine Learning , 2015 .
[22] Colin Raffel,et al. Realistic Evaluation of Semi-Supervised Learning Algorithms , 2018, ICLR.
[23] Zachary C. Lipton,et al. Troubling Trends in Machine Learning Scholarship , 2018, ACM Queue.
[24] Tolga Tasdizen,et al. Improving the robustness of convolutional networks to appearance variability in biomedical images , 2018, 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018).
[25] Odd Erik Gundersen,et al. State of the Art: Reproducibility in Artificial Intelligence , 2018, AAAI.
[26] Oliver Zendel,et al. How Good Is My Test Data? Introducing Safety Analysis for Computer Vision , 2017, International Journal of Computer Vision.
[27] Vince D. Calhoun,et al. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls , 2017, NeuroImage.
[28] Moritz Hardt,et al. A Meta-Analysis of Overfitting in Machine Learning , 2019, NeurIPS.
[29] Benjamin Recht,et al. Evaluating Machine Accuracy on ImageNet , 2020, ICML.
[30] Inioluwa Deborah Raji,et al. Model Cards for Model Reporting , 2018, FAT.
[31] Bokai WANG,et al. Comparisons of Superiority, Non-inferiority, and Equivalence Trials , 2017, Shanghai archives of psychiatry.
[32] D. Sculley,et al. Hidden Technical Debt in Machine Learning Systems , 2015, NIPS.
[33] C. Jack,et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI) , 2005, Alzheimer's & Dementia.
[34] Christian Wachinger,et al. Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12, 207 Individuals , 2018, ArXiv.
[35] Erik Christensen,et al. Methodology of superiority vs. equivalence trials and non-inferiority trials. , 2007, Journal of hepatology.
[36] Ser-Nam Lim,et al. A Metric Learning Reality Check , 2020, ECCV.
[37] Frank E. Harrell,et al. Prediction models need appropriate internal, internal-external, and external validation. , 2016, Journal of clinical epidemiology.
[38] Andrew Doyle,et al. A Survey of Crowdsourcing in Medical Image Analysis , 2019, Hum. Comput..
[39] Samaneh Abbasi-Sureshjani,et al. Risk of Training Diagnostic Algorithms on Data with Demographic Bias , 2020, iMIMIC/MIL3iD/LABELS@MICCAI.
[40] Kei Yamada,et al. Machine learning studies on major brain diseases: 5-year trends of 2014–2018 , 2018, Japanese Journal of Radiology.
[41] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR 2011.
[42] Taghi M. Khoshgoftaar,et al. Sample size determination for biomedical big data with limited labels , 2020, Network Modeling Analysis in Health Informatics and Bioinformatics.
[43] Philipp Kellmeyer,et al. Ethical and Legal Implications of the Methodological Crisis in Neuroimaging , 2017, Cambridge Quarterly of Healthcare Ethics.
[44] Carl Gutwin,et al. Threats of a replication crisis in empirical computer science , 2020, Commun. ACM.
[45] Anton van den Hengel,et al. On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law , 2020, NeurIPS.
[46] Ronald M. Summers,et al. A Review of Deep Learning in Medical Imaging: Imaging Traits, Technology Trends, Case Studies With Progress Highlights, and Future Promises , 2020, Proceedings of the IEEE.
[47] Kiri Wagstaff,et al. Machine Learning that Matters , 2012, ICML.
[48] John P. A. Ioannidis,et al. Sample size evolution in neuroimaging research: an evaluation of highly-cited studies (1990-2012) and of latest practices (2017-2018) in high-impact journals , 2019, NeuroImage.
[49] S. Park,et al. Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction. , 2018, Radiology.
[50] Luca Foschini,et al. Reproducibility in Machine Learning for Health , 2019, RML@ICLR.
[51] Janez Demsar,et al. Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..
[52] Eric J Topol,et al. High-performance medicine: the convergence of human and artificial intelligence , 2019, Nature Medicine.
[53] Torsten Rohlfing,et al. Image Similarity and Tissue Overlaps as Surrogates for Image Registration Accuracy: Widely Used but Unreliable , 2012, IEEE Transactions on Medical Imaging.
[54] Agatha Lenartowicz,et al. Classification Accuracy of Neuroimaging Biomarkers in Attention-Deficit/Hyperactivity Disorder: Effects of Sample Size and Circular Analysis. , 2019, Biological psychiatry. Cognitive neuroscience and neuroimaging.
[55] Rodrigo C. Barros,et al. Can we trust deep learning models diagnosis? The impact of domain shift in chest radiograph classification , 2019, TIA@MICCAI.
[56] Ali Sunyaev,et al. What Your Radiologist Might be Missing: Using Machine Learning to Identify Mislabeled Instances of X-ray Images , 2021, HICSS.
[57] Timnit Gebru,et al. Datasheets for datasets , 2018, Commun. ACM.
[58] Leo Celi,et al. Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data , 2020, ArXiv.
[59] J. Popp,et al. Sample size planning for classification models. , 2012, Analytica chimica acta.
[60] N. L. Kerr. HARKing: Hypothesizing After the Results are Known , 1998, Personality and Social Psychology Review.
[61] Howard Bowman,et al. I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data , 2020, Neuroscience and Biobehavioral Reviews.
[62] Gustavo Carneiro,et al. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging , 2019, CHIL.
[63] Daniel Berrar,et al. Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers , 2017, Machine Learning.
[64] Ali Borji,et al. Negative results in computer vision: A perspective , 2017, Image Vis. Comput..
[65] Tal Arbel,et al. Accounting for Variance in Machine Learning Benchmarks , 2021, MLSys.
[66] Arturo Casadevall,et al. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research , 2015, Proceedings of the National Academy of Sciences.
[67] Gerd Gigerenzer,et al. Statistical Rituals: The Replication Delusion and How We Got There , 2018, Advances in Methods and Practices in Psychological Science.
[68] Tapio Salakoski,et al. A comparison of AUC estimators in small-sample studies , 2009, MLSB.
[69] M. Lungren,et al. Preparing Medical Imaging Data for Machine Learning. , 2020, Radiology.
[70] Gaël Varoquaux,et al. Cross-validation failure: Small sample sizes lead to large error bars , 2017, NeuroImage.
[71] Marcus A. Badgeley,et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study , 2018, PLoS medicine.
[72] David Uminsky,et al. Reliance on metrics is a fundamental challenge for AI , 2020, Patterns.
[73] Francesca Mangili,et al. Should We Really Use Post-Hoc Tests Based on Mean-Ranks? , 2015, J. Mach. Learn. Res..
[74] Pascal Vincent,et al. Unreproducible Research is Reproducible , 2019, ICML.
[75] Raghavendra Selvan,et al. Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models , 2020, ArXiv.
[76] Byoung Wook Choi,et al. How to Develop, Validate, and Compare Clinical Prediction Models Involving Radiological Parameters: Study Design and Statistical Methods , 2016, Korean journal of radiology.
[77] Stephan Günnemann,et al. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift , 2018, NeurIPS.
[78] J. Schumi,et al. Through the looking glass: understanding non-inferiority , 2011, Trials.
[79] Peter Henderson,et al. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning , 2020, ArXiv.
[80] Heikki Huttunen,et al. HARK Side of Deep Learning - From Grad Student Descent to Automated Machine Learning , 2019, ArXiv.
[81] Diego H. Milone,et al. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis , 2020, Proceedings of the National Academy of Sciences.
[82] Iñaki Inza,et al. Dealing with the evaluation of supervised classification algorithms , 2015, Artificial Intelligence Review.
[83] Shehroz S. Khan,et al. Learning to Unlearn: Building Immunity to Dataset Bias in Medical Imaging Studies , 2018, ArXiv.
[84] Matthias Bethge,et al. Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet , 2019, ICLR.
[85] Phillip M Cheng,et al. Artificial Intelligence for Medical Image Analysis: A Guide for Authors and Reviewers. , 2019, AJR. American journal of roentgenology.
[86] Ninon Burgos,et al. Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation , 2019, Medical Image Anal..
[87] Kellyn F Arnold,et al. Time to reality check the promises of machine learning-powered precision medicine , 2020, The Lancet. Digital health.
[88] Bram van Ginneken,et al. A survey on deep learning in medical image analysis , 2017, Medical Image Anal..
[89] Luke Oakden-Rayner,et al. Exploring large scale public medical image datasets , 2019, Academic radiology.
[90] L. Joskowicz,et al. Inter-observer variability of manual contour delineation of structures in CT , 2018, European Radiology.
[91] Aaron Carass,et al. Why rankings of biomedical image analysis competitions should be interpreted with care , 2018, Nature Communications.