Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index

Informative genes from microarray data can be used to construct prediction models and to investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient at identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot easily identify multiple-class specific signature genes. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and uses a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small and when the class sizes differ significantly. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distributions. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports the finding that the proposed method identifies unique multiple-class specific marker genes (not previously reported to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate pathway information with the multiple-class specific signature genes and cross-reference it with published studies. We find that, in the leukemia data, the identified genes participate in pathways directly involved in cancer development.
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases, and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases.
