Paper Information - Information Visualizations Used to Avoid the Problem of Overfitting in Supervised Machine Learning

This paper examines which types of information graphics and visualizations can support supervised machine learning tasks, specifically model validation and the detection of model overfitting. In particular, I plot model performance as a function of model complexity. With an appropriate information graphic, we can see the point at which the model becomes too complex and its performance starts to deteriorate because of overfitting. I present two case studies: the first is a regression task using polynomial regression, and the second is a classification problem using neural networks. For each, I create information graphics, in particular fitting graphs, that help the end user see which model is the best choice.
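To make the idea of a fitting graph concrete, here is a minimal sketch in Python of the numbers such a graph would plot for polynomial regression. It is an illustration only, not the paper's own code or data: the synthetic sine-shaped dataset, the 30/30 train/validation split, the maximum degree of 10, and the helper name `fitting_curve` are all assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fitting_curve(x_train, y_train, x_val, y_val, max_degree):
    """Return (degree, train_mse, val_mse) for polynomial degrees 1..max_degree.

    Hypothetical helper for illustration; model complexity is the degree.
    """
    rows = []
    for degree in range(1, max_degree + 1):
        # Least-squares fit on the training data only.
        coeffs = P.polyfit(x_train, y_train, degree)
        train_mse = float(np.mean((y_train - P.polyval(x_train, coeffs)) ** 2))
        val_mse = float(np.mean((y_val - P.polyval(x_val, coeffs)) ** 2))
        rows.append((degree, train_mse, val_mse))
    return rows


# Synthetic data (an assumption, not one of the paper's case studies):
# a noisy sine curve sampled on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, 60)

# Hold out half of the points for validation.
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

rows = fitting_curve(x_train, y_train, x_val, y_val, max_degree=10)

# The degree with the lowest validation error is the candidate "best" model.
best_degree = min(rows, key=lambda r: r[2])[0]

print(f"{'degree':>6} {'train MSE':>10} {'val MSE':>10}")
for degree, train_mse, val_mse in rows:
    marker = "  <- lowest validation error" if degree == best_degree else ""
    print(f"{degree:>6d} {train_mse:>10.4f} {val_mse:>10.4f}{marker}")
```

Plotting the two error columns against degree with any plotting library gives the fitting graph. Training error can only go down as the degree grows, so the held-out validation error is the curve to watch: a rise in validation error while training error keeps falling is the usual sign of overfitting.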

Robbie T. Nakatsu
