Feature selection and ensemble of regression models for predicting the protein macromolecule dissolution profile

Predicting the dissolution rate of proteins plays a significant role in pharmaceutical and medical applications. The dissolution of proteins from Poly Lactic-co-Glycolic Acid (PLGA) micro- and nanoparticles is influenced by many factors. Accounting for all of them yields a dataset with three hundred features, which makes prediction difficult and inaccurate. The present study consists of three phases. First, dimensionality reduction techniques are applied to simplify the task and to eliminate irrelevant and redundant attributes. Second, a heterogeneous pool of classical regression algorithms is created and evaluated. Each algorithm in the pool is trained independently to model the problem at hand. Finally, several ensemble methods are tested to improve prediction accuracy. The Evolutionary Weighted Ensemble method proposed in this paper achieved the lowest RMSE (root mean squared error) and significantly outperformed the competing classical algorithms and the other ensemble techniques.
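The abstract does not describe the operators behind the Evolutionary Weighted Ensemble, so the following is only a rough Python sketch of the general idea under stated assumptions. It assumes the ensemble output is a convex combination (non-negative weights summing to 1) of the base regressors' predictions. It also assumes the weights are searched with a simple (mu + lambda) evolution strategy using Gaussian mutation, with validation RMSE as the fitness. The function names `rmse`, `normalize`, `ensemble_predict` and `evolve_weights`, and all parameter values, are hypothetical and not taken from the paper.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        # Degenerate case: fall back to a plain average.
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def ensemble_predict(weights, pool_preds):
    """Weighted average of base-model predictions.

    pool_preds[m][i] is the prediction of base model m for sample i.
    """
    n_samples = len(pool_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, pool_preds))
        for i in range(n_samples)
    ]


def evolve_weights(pool_preds, y_true, pop_size=30, generations=200,
                   sigma=0.1, seed=0):
    """Search ensemble weights minimising RMSE (hypothetical sketch).

    Assumed scheme: a (mu + lambda) evolution strategy in which every
    parent produces one Gaussian-mutated child and the best pop_size
    individuals of parents + children survive (elitist selection).
    """
    rng = random.Random(seed)
    n_models = len(pool_preds)

    def fitness(w):
        return rmse(y_true, ensemble_predict(w, pool_preds))

    # Random starting points on the simplex, plus the uniform average.
    population = [normalize([rng.random() for _ in range(n_models)])
                  for _ in range(pop_size - 1)]
    population.append([1.0 / n_models] * n_models)

    for _ in range(generations):
        children = [normalize([w + rng.gauss(0.0, sigma) for w in parent])
                    for parent in population]
        population = sorted(population + children, key=fitness)[:pop_size]

    best = population[0]
    return best, fitness(best)
```

Because selection is elitist and the uniform average is seeded into the initial population, the best RMSE returned can never be worse than simple averaging of the pool. In practice the weights would be fitted on held-out (for example, out-of-fold) predictions rather than on training-set predictions, so that overfitted base models are not rewarded.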
