Intrinsic-overlapping co-expression module detection with application to Alzheimer's Disease

Genes interact with one another, and perturbations of these interactions in molecular pathways can lead to complex diseases. Often a subset of genes, rather than any single gene, interacts to form a network that shares a common biological function. Such a subnetwork is called a functional module or motif. Identifying these modules, and the central genes within them that may be responsible for a disease, may help in designing patient-specific drugs. In this study, we consider Alzheimer's Disease (AD), a neurodegenerative disorder, and identify potentially responsible genes through functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to the disease under investigation, and we identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive, that is, overlapping in nature. Moreover, they sometimes show intrinsic (hierarchical) distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network), that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed from microarray expression profiles. We compare CluViaN with existing methods to evaluate the quality of the extracted modules. CluViaN reports intrinsic and overlapping motifs in different species that have not been reported by any other study. We then use CluViaN to extract significant AD-specific modules and rank them by the number of genes in each module that are involved in the disease pathways. Finally, we identify the top central genes through topological analysis of the modules. We use two different AD phenotype datasets for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play significant roles in AD. Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins that can cause neurodegenerative disorders. MUC4, another hub gene that we find experimentally, is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaN Software.rar.
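
The last two analysis steps described above, ranking modules by their number of disease-pathway genes and then picking central genes within a module, can be illustrated with a minimal sketch. This is not the CluViaN implementation. The function names, the placeholder gene labels and the toy network are hypothetical, and plain degree within the module is assumed as the centrality measure, since the abstract says only "topological analysis".

```python
from collections import defaultdict

def rank_modules(modules, pathway_genes):
    """Order module names by how many of their genes lie in pathway_genes (most first)."""
    return sorted(modules, key=lambda name: len(modules[name] & pathway_genes), reverse=True)

def hub_genes(edges, module_genes, top_k=2):
    """Return the top_k genes of a module ranked by degree inside that module.

    Degree is an assumed stand-in for the topological measure; ties are
    broken alphabetically so the result is deterministic.
    """
    degree = defaultdict(int)
    for u, v in edges:
        # Count only edges whose two endpoints both belong to the module.
        if u in module_genes and v in module_genes:
            degree[u] += 1
            degree[v] += 1
    return sorted(module_genes, key=lambda g: (-degree[g], g))[:top_k]

# Toy data with placeholder labels (not real genes or real results).
# The two modules overlap in gC, as non-exclusive modules may.
modules = {"M1": {"gA", "gB", "gC"}, "M2": {"gC", "gD", "gE", "gF"}}
pathway_genes = {"gC", "gD", "gE"}
edges = [("gC", "gD"), ("gC", "gE"), ("gC", "gF"),
         ("gD", "gE"), ("gA", "gB"), ("gB", "gC")]

ranking = rank_modules(modules, pathway_genes)
top_hubs = hub_genes(edges, modules[ranking[0]], top_k=2)
```

With this toy input, `ranking` places M2 ahead of M1 because three of its genes are in the pathway set, and `top_hubs` lists the two best-connected genes of M2. Other centrality measures could be substituted in `hub_genes` without changing the ranking step.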
