Prediction based on averages over automatically induced learners: ensemble methods and Bayesian techniques

Ensemble methods and Bayesian techniques are two learning paradigms that can alleviate the difficulties associated with automatic induction from a limited amount of data in the presence of noise. Instead of relying on a single hypothesis for prediction, these methods take into account the outputs of a collection of hypotheses compatible with the observed data. Averaging the predictions of different learners provides a mechanism for producing more accurate and robust decisions. However, the practical use of ensembles and Bayesian techniques in machine learning presents some complications. Specifically, ensemble methods have large storage requirements: the predictors of the ensemble need to be kept in memory so that they can be readily accessed. Furthermore, computing the final ensemble decision requires querying every predictor in the ensemble, so the prediction cost increases linearly with the ensemble size. In general, it is also difficult to estimate an appropriate size for the ensemble. Bayesian approaches, on the other hand, require the evaluation of multi-dimensional integrals, or of summations with an exponentially large number of terms, which are often intractable. In practice, these calculations are carried out with approximate algorithms that can be computationally expensive. This thesis addresses some of these shortcomings and proposes novel applications of ensemble methods and Bayesian techniques in supervised learning tasks of practical interest.

In the first part of this thesis we analyze different pruning methods that reduce the memory requirements and prediction times of ensembles. These methods replace the original ensemble with a subensemble that has good generalization properties. We show that identifying the subensemble that is optimal in terms of the training error is feasible only in regression ensembles of intermediate size. For larger ensembles two approximate methods are analyzed: ordered aggregation and SDP-pruning. Both methods select subensembles that outperform the original ensemble. In classification ensembles it is possible to make inference about the final ensemble prediction by querying only a fraction of the classifiers in the ensemble. This observation is the basis of a novel ensemble pruning method: instance-based (IB) pruning. IB-pruning produces a large speed-up of the classification process without significantly deteriorating the generalization performance of the ensemble. This part of the thesis also describes a statistical procedure for determining an adequate ensemble size. The probabilistic framework introduced for IB-pruning can be used to infer the size of a classification ensemble so that the resulting ensemble predicts the same class label as an ensemble of infinite size with a specified confidence level.
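
To make the pruning setting concrete, the sketch below implements a minimal form of greedy subensemble selection in the spirit of ordered aggregation: predictors are incorporated one at a time, each time adding the one that most reduces the training error of the growing subensemble. It assumes a scikit-learn-style bagging ensemble of regression trees; the function name `ordered_aggregation`, the dataset, and the chosen sizes are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

def ordered_aggregation(predictions, y, subensemble_size):
    """Greedily order a regression ensemble by training error.

    predictions : array of shape (n_predictors, n_samples), the output of
                  each predictor on the training set.
    y           : array of shape (n_samples,), the training targets.
    Returns the indices of the selected predictors in order of incorporation.
    """
    n_predictors, n_samples = predictions.shape
    selected = []                       # members already in the subensemble
    running_sum = np.zeros(n_samples)   # sum of their predictions

    while len(selected) < subensemble_size:
        best_idx, best_mse = None, np.inf
        for i in range(n_predictors):
            if i in selected:
                continue
            # Training MSE of the subensemble if predictor i were added.
            avg = (running_sum + predictions[i]) / (len(selected) + 1)
            mse = np.mean((avg - y) ** 2)
            if mse < best_mse:
                best_idx, best_mse = i, mse
        selected.append(best_idx)
        running_sum += predictions[best_idx]
    return selected

# Illustrative usage with a bagging ensemble of regression trees.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=300, random_state=0)
ensemble = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                            random_state=0).fit(X, y)
preds = np.array([tree.predict(X) for tree in ensemble.estimators_])
subset = ordered_aggregation(preds, y, subensemble_size=20)
```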
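
The idea behind instance-based pruning can be illustrated with a conservative simplification: query the classifiers one at a time and stop as soon as the votes still to be cast can no longer change the majority decision. The IB-pruning method described in the thesis replaces this worst-case test with a probabilistic stopping rule that halts when the partial vote count already determines the full-ensemble prediction with a specified confidence; the sketch below captures only the deterministic variant and assumes scikit-learn-style classifiers.

```python
from collections import Counter

def early_stopped_vote(classifiers, x):
    """Majority vote over an ensemble, querying members only while the
    outcome is still undecided.

    Stops as soon as the leader's margin over the runner-up exceeds the
    number of classifiers not yet queried, so the returned label always
    coincides with the full-ensemble majority vote.
    """
    votes = Counter()
    remaining = len(classifiers)
    label = None
    for clf in classifiers:
        votes[clf.predict(x.reshape(1, -1))[0]] += 1
        remaining -= 1
        ranked = votes.most_common(2)
        label = ranked[0][0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up > remaining:  # decision can no longer change
            break
    return label, len(classifiers) - remaining    # label and number of queries made
```

With the probabilistic criterion of IB-pruning, querying can stop much earlier than under this worst-case test, at the price of a small, user-controlled probability of disagreeing with the complete ensemble.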

The second part of this thesis proposes novel applications of Bayesian techniques with a focus on computational efficiency. Specifically, the expectation propagation (EP) algorithm is used as an alternative to more computationally expensive methods such as Markov chain Monte Carlo sampling or type-II maximum likelihood estimation. In this part of the thesis we introduce the Bayes machine for binary classification. In this Bayesian classifier, the posterior distribution of a parameter that quantifies the level of noise in the class labels is inferred from the data. This posterior distribution can be efficiently approximated using the EP algorithm. When EP is used to compute the approximation, the Bayes machine does not require any re-training to estimate this parameter. The cost of training the Bayes machine can be further reduced using a sparse representation, which is found by a greedy algorithm whose performance is improved by additional refining iterations. Finally, we show that EP can be used to approximate the posterior distribution of a Bayesian model for the classification of microarray data. The EP algorithm significantly reduces the training cost of this model and is useful for identifying relevant genes for subsequent analysis.
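
In generic notation (not tied to any particular model in the thesis), the averaging performed by Bayesian classifiers of this kind, and the form of the approximation computed by EP, can be summarized as

```latex
% Posterior predictive average and the generic form of the EP approximation.
p(y_\star \mid \mathbf{x}_\star, \mathcal{D})
  = \int p(y_\star \mid \mathbf{x}_\star, \boldsymbol{\theta})\,
         p(\boldsymbol{\theta} \mid \mathcal{D})\, d\boldsymbol{\theta},
\qquad
p(\boldsymbol{\theta} \mid \mathcal{D})
  \propto p(\boldsymbol{\theta}) \prod_{n=1}^{N} p(y_n \mid \mathbf{x}_n, \boldsymbol{\theta})
  \;\approx\; q(\boldsymbol{\theta})
  \propto p(\boldsymbol{\theta}) \prod_{n=1}^{N} \tilde{t}_n(\boldsymbol{\theta}),
```

where each exact likelihood factor is replaced by an approximate site term \tilde{t}_n chosen in a tractable family (typically Gaussian), so that q(\boldsymbol{\theta}) and the predictive integral can be evaluated efficiently.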
