A neural network-based biomarker association information extraction approach for cancer classification

A number of approaches based on high-throughput data have been developed for cancer classification. However, these methods often ignore the underlying correlations between the expression levels of different cancer-related biomarkers. From a biological viewpoint, modeling these abnormal associations between biomarkers can play an important role in cancer classification. In this paper, we propose an approach to cancer classification based on the concept of Biomarker Association Networks (BANs). A BAN is modeled as a neural network that captures the associations between biomarkers by minimizing an energy function, and a new cancer classification approach is built on it. We validate the proposed approach on four publicly available biomarker expression datasets. The derived BANs are observed to differ significantly across cancer classes, which helps reveal the underlying deviant biomarker association patterns responsible for different cancer types. Extensive comparisons show that the BAN-based classification approach outperforms several conventional classification methods.
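
The abstract does not specify the network architecture or the form of the energy function, so the following is only a minimal sketch of the general idea under stated assumptions, not the method of the paper. It assumes one linear association network per class, in which each biomarker is predicted from all the others, a squared reconstruction error as the energy, and classification by the class whose network gives the lowest energy. All function names and the `ridge` parameter are hypothetical.

```python
import numpy as np

def fit_association_network(X, ridge=1e-2):
    """Fit a weight matrix W with zero diagonal so that each biomarker is
    predicted from the others. Assumed energy: sum ||x - W x||^2 over the
    samples, plus a ridge penalty on the weights (an illustrative choice)."""
    n_samples, n_markers = X.shape
    W = np.zeros((n_markers, n_markers))
    for j in range(n_markers):
        others = np.delete(np.arange(n_markers), j)
        A = X[:, others]
        # Closed-form ridge solution for biomarker j given the other biomarkers
        gram = A.T @ A + ridge * np.eye(len(others))
        W[j, others] = np.linalg.solve(gram, A.T @ X[:, j])
    return W

def energy(W, x):
    """Squared reconstruction error of sample x under association network W."""
    residual = x - W @ x
    return float(residual @ residual)

def fit_class_networks(X, y):
    """Fit one association network per class label found in y."""
    return {c: fit_association_network(X[y == c]) for c in np.unique(y)}

def predict(networks, x):
    """Assign x to the class whose network yields the lowest energy."""
    return min(networks, key=lambda c: energy(networks[c], x))
```

In this sketch the per-class weight matrices play the role of the class-specific association patterns: comparing the matrices of two classes shows which biomarker pairs are associated differently, and a new sample is assigned to the class whose associations it violates least.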
