exprso: an R-package for the rapid implementation of machine learning algorithms

Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce exprso, a new R package that is an intuitive machine learning suite designed specifically for non-expert programmers. Built initially for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso also supports multi-class classification (through the 1-vs-all generalization of binary classifiers) and the prediction of continuous outcomes.

[2]  Kurt Hornik,et al.  Open-source machine learning: R meets Weka , 2009, Comput. Stat..

[3]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[4]  R. Dennis Cook,et al.  Cross-Validation of Regression Models , 1984 .

[5]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[6]  Benjamin Haibe-Kains,et al.  mRMRe: an R package for parallelized mRMR ensemble feature selection , 2013, Bioinform..

[7]  M. Radmacher,et al.  Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. , 2003, Journal of the National Cancer Institute.

[8]  Matthew E. Ritchie,et al.  limma powers differential expression analyses for RNA-sequencing and microarray studies , 2015, Nucleic acids research.

[9]  K. Deb,et al.  Reliable classification of two-class cancer data using evolutionary algorithms. , 2003, Bio Systems.

[10]  Max Kuhn,et al.  Building Predictive Models in R Using the caret Package , 2008 .

[11]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[12]  Raphael Gottardo,et al.  Orchestrating high-throughput genomic analysis with Bioconductor , 2015, Nature Methods.

[13]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[14]  Sean R. Davis,et al.  GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor , 2007, Bioinform..

[15]  Thomas Lengauer,et al.  ROCR: visualizing classifier performance in R , 2005, Bioinform..

[16]  Richard Simon,et al.  Bias in error estimation when using cross-validation for model selection , 2006, BMC Bioinformatics.

[17]  William N. Venables,et al.  Modern Applied Statistics with S , 2010 .

[18]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .