A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data

The hybrid Huberized support vector machine (HHSVM) with an elastic-net penalty was developed for cancer tumor classification based on thousands of gene expression measurements. In this paper, we develop a Bayesian formulation of the HHSVM for binary classification. For the coefficients of the linear classification boundary, we propose a new type of prior that selects variables and groups them together simultaneously. The proposed prior is a scale mixture of normal distributions, with independent gamma priors placed on a transformation of the variances of the normal distributions. We establish a direct connection between the Bayesian HHSVM model under this prior and the standard HHSVM solution with the elastic-net penalty. To select the penalty parameter, we propose both a hierarchical Bayes technique and an empirical Bayes technique. In the hierarchical Bayes model, the penalty parameter is assigned a beta prior; in the empirical Bayes model, it is estimated by maximizing the marginal likelihood. The proposed model is applied to two simulated data sets and three real gene expression microarray data sets. The results suggest that our Bayesian models succeed both in selecting groups of similarly behaving important genes and in predicting the cancer class. Most of the genes selected by our models are strongly associated with well-studied genetic pathways, which further supports these findings.
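The abstract does not state the HHSVM objective, so the following is only a minimal sketch of the penalized criterion it refers to, assuming the commonly used Huberized hinge loss with an elastic-net penalty. The function names, the threshold `delta`, and the penalty weights `lam1` and `lam2` are illustrative assumptions, not notation taken from the paper.

```python
def huberized_hinge(t, delta=2.0):
    """Huberized hinge loss at margin t = y * f(x) (assumed standard form).

    Zero beyond the margin, quadratic near it, linear for badly
    misclassified points; the pieces join continuously at t = 1 - delta.
    """
    if t > 1.0:
        return 0.0
    if t > 1.0 - delta:
        return (1.0 - t) ** 2 / (2.0 * delta)
    return 1.0 - t - delta / 2.0


def elastic_net_penalty(beta, lam1, lam2):
    """lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2 (assumed parameterization)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2


def hhsvm_objective(beta0, beta, X, y, lam1, lam2, delta=2.0):
    """Total loss plus penalty for a linear boundary f(x) = beta0 + x . beta.

    X is a list of feature rows and y a list of labels in {-1, +1}.
    """
    loss = 0.0
    for row, label in zip(X, y):
        f = beta0 + sum(x * b for x, b in zip(row, beta))
        loss += huberized_hinge(label * f, delta)
    return loss + elastic_net_penalty(beta, lam1, lam2)
```

Under this reading, one plausible interpretation of the stated connection is that minimizing `hhsvm_objective` corresponds to finding a posterior mode when the pseudo-likelihood is proportional to exp(-loss) and the prior on the coefficients is proportional to exp(-penalty). The paper's own construction, with the scale mixture of normals and gamma priors, may differ in its details.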
