A Review of Cancer Classification Software for Gene Expression Data

Microarray technology lets researchers measure the expression levels of thousands of genes simultaneously in a single experiment. As the volume of microarray data has grown, its analysis has become a major research topic. One such analysis task is classification, the process of assigning samples to classes. Its goal is to identify differentially expressed genes that can then be used to predict the classes of new samples. Classifying and analysing microarray data effectively at this scale requires dedicated classification software. This paper reviews a range of classification software applications for gene expression data and groups them by six supervised classification methods: Support Vector Machine, K-Nearest Neighbour, Neural Network, Linear Discriminant Analysis, Bayesian Classifier, and Random Forest.
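The two-step idea described above (pick out differentially expressed genes, then use them to classify new samples) can be illustrated with a minimal sketch. It is not taken from any of the reviewed software: the function names `select_genes` and `knn_predict`, the signal-to-noise style score, and the toy expression values are all assumptions made for illustration, and K-Nearest Neighbour is used only because it is the simplest of the six methods to write out in full.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def select_genes(samples, labels, top_n):
    """Rank genes by a simple signal-to-noise score between two classes.

    Hypothetical helper: real tools use more careful statistics.
    """
    class_a, class_b = sorted(set(labels))  # assumes exactly two classes
    scores = []
    for g in range(len(samples[0])):
        xa = [s[g] for s, y in zip(samples, labels) if y == class_a]
        xb = [s[g] for s, y in zip(samples, labels) if y == class_b]
        spread = pstdev(xa) + pstdev(xb)
        if spread == 0:
            spread = 1e-9  # avoid division by zero for constant genes
        scores.append((abs(mean(xa) - mean(xb)) / spread, g))
    # Highest-scoring genes first; return their column indices.
    return [g for _, g in sorted(scores, reverse=True)[:top_n]]


def knn_predict(train, labels, query, genes, k=3):
    """Predict the class of `query` by majority vote of its k nearest
    training samples, using Euclidean distance over the selected genes."""
    dists = []
    for s, y in zip(train, labels):
        d = sqrt(sum((s[g] - query[g]) ** 2 for g in genes))
        dists.append((d, y))
    dists.sort(key=lambda t: t[0])
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): rows are samples, columns are genes.
# Only gene 0 differs clearly between the two classes.
train = [
    [5.0, 1.0, 3.0],
    [5.2, 0.9, 2.0],
    [4.8, 1.1, 4.0],
    [1.0, 1.0, 3.1],
    [1.2, 0.8, 2.2],
    [0.9, 1.2, 3.9],
]
labels = ["tumour"] * 3 + ["normal"] * 3

genes = select_genes(train, labels, top_n=1)
prediction = knn_predict(train, labels, [4.9, 1.0, 3.0], genes, k=3)
print(genes, prediction)
```

In practice the gene selection step would need to be run inside each cross-validation fold rather than on the full data set, otherwise the estimated accuracy is optimistically biased.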
