Parameter Tuning Using Gaussian Processes

Most machine learning algorithms require their parameter values to be set before they can be applied to a problem. Appropriate parameter settings yield good performance, while inappropriate settings generally result in poor models. Hence, it is necessary to identify the "best" parameter values for a particular algorithm before building the model. The "best" model not only reflects the "real" function and fits the existing points well, but also gives good performance when making predictions for new, previously unseen points. A number of methods have been proposed to optimize parameter values, and the basic idea underlying all of them is a trial-and-error process. The work presented in this thesis instead employs Gaussian process (GP) regression to optimize the parameter values of a given machine learning algorithm. We consider only learning algorithms with two parameters, whose candidate values are specified on a two-dimensional grid. To avoid brute-force search, Gaussian Process Optimization (GPO) uses "expected improvement" to pick promising points rather than validating every point of the grid in turn. The point with the highest expected improvement is evaluated using cross-validation, and the resulting data point is added to the training set of the Gaussian process model. This process is repeated until a stopping criterion is met. The final model is then built by running the learning algorithm with the best parameter values identified in this process. To test the effectiveness of this optimization method on regression and classification problems, we use it to optimize the parameters of several well-known machine learning algorithms, such as decision tree learning, support vector machines and boosting with trees. Analysing experimental results obtained on datasets from the UCI repository, we find that the GPO algorithm yields performance competitive with a brute-force approach while offering a distinct advantage in training time and in the number of cross-validation runs required. Overall, GPO is a promising approach to the optimization of parameter values in machine learning.
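
The optimization loop described above can be illustrated with a short Python sketch. Everything concrete in it is an assumption made for illustration rather than the thesis's exact configuration: the learner (an SVC with parameters C and gamma), the dataset, the grid ranges, the GP kernel, and the fixed iteration budget that stands in for the stopping criterion; scikit-learn's GaussianProcessRegressor is used in place of whichever GP implementation the thesis relies on.

```python
# Minimal sketch of grid-based Gaussian process optimization with expected
# improvement. Learner, dataset, grid, kernel and budget are illustrative
# assumptions, not the thesis's configuration.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Two-dimensional grid of candidate parameter settings (log2 scale).
log_C, log_gamma = np.meshgrid(np.arange(-5, 11, 2), np.arange(-13, 2, 2))
grid = np.column_stack([log_C.ravel(), log_gamma.ravel()]).astype(float)

def cv_accuracy(point):
    """Evaluate one grid point with 5-fold cross-validation."""
    clf = SVC(C=2.0 ** point[0], gamma=2.0 ** point[1])
    return cross_val_score(clf, X, y, cv=5).mean()

def expected_improvement(mu, sigma, best):
    """Expected improvement over the current best, for maximization."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# Seed the GP with a few randomly chosen evaluated points, then iterate.
rng = np.random.default_rng(0)
evaluated = list(rng.choice(len(grid), size=3, replace=False))
scores = [cv_accuracy(grid[i]) for i in evaluated]

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=2.0),
                              normalize_y=True)
for _ in range(15):                      # fixed budget stands in for the stopping criterion
    gp.fit(grid[evaluated], scores)
    mu, sigma = gp.predict(grid, return_std=True)
    ei = expected_improvement(mu, sigma, max(scores))
    ei[evaluated] = -np.inf              # never re-evaluate a grid point
    nxt = int(np.argmax(ei))             # candidate with the highest expected improvement
    evaluated.append(nxt)
    scores.append(cv_accuracy(grid[nxt]))

best = grid[evaluated[int(np.argmax(scores))]]
print("best log2(C), log2(gamma):", best, "cross-validated accuracy:", max(scores))
```

Masking already-evaluated points keeps the acquisition step from re-running cross-validation on the same grid cell; in practice the fixed budget would be replaced by whatever stopping criterion is adopted, and the final model would be trained on all of the data with the selected parameter values.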
