Particle Swarm Model Selection

This paper proposes the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) for classification tasks. FMS is defined as follows: given a pool of preprocessing methods, feature selection methods, and learning algorithms, select the combination of these that obtains the lowest classification error for a given data set; the task also includes selecting the hyperparameters of the considered methods. This problem generates a vast search space, which makes it well suited to stochastic optimization techniques. FMS can be applied to any classification domain because it does not require domain knowledge, and different model types and a variety of algorithms can be considered under this formulation. Furthermore, competitive yet simple models can be obtained with FMS. We adopt PSO for the search because of its proven performance on different problems and because of its simplicity: it requires neither expensive computations nor complicated operations. Interestingly, the way the search is guided allows PSO to avoid overfitting to some extent. Experimental results on benchmark data sets give evidence that the proposed approach is very effective, despite its simplicity. Furthermore, results obtained in the framework of a model selection challenge show that the models selected with PSO are competitive with models selected by other techniques that focus on a single algorithm and that use domain knowledge.
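As a rough illustration of the kind of search described above, the following is a minimal sketch of a standard inertia-weight PSO minimizing a black-box objective. It is not the paper's implementation: the function names (`pso_minimize`, `toy_error`), the parameter values (`w`, `c1`, `c2`, swarm size), and the toy objective standing in for a classification error estimate are all assumptions made for the example. In an FMS setting, each position vector would instead encode a choice of preprocessing, feature selection, classifier, and hyperparameters, and the objective would be an error estimate such as cross-validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


def toy_error(x):
    # Hypothetical stand-in for a model's estimated classification error.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, err = pso_minimize(toy_error, bounds)
print(best, err)
```

The update rule uses only additions, multiplications, and comparisons, which reflects the simplicity the abstract points to. The sketch handles only continuous variables; a full model selection search also needs some encoding for the discrete choice of methods.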
