Biomarker identification by knowledge-driven multilevel ICA and motif analysis

Traditional statistical methods often fail to identify biologically meaningful biomarkers from expression data alone. In this paper, we develop a novel strategy, knowledge-driven multilevel Independent Component Analysis (ICA), to infer regulatory signals and identify biomarkers based on clustering results and partial prior knowledge. A statistical test is designed to evaluate the significance of transcription factor enrichment in an extracted gene set based on motif information. Experimental results on an Rsf-1 (HBXAP) induced microarray data set show that, compared with other gene selection methods with or without prior knowledge, our method can successfully extract biologically meaningful biomarkers related to ovarian cancer.
