Qualitative assessment of functional module detectors on microarray and RNASeq data

A set of correlated, co-expressed genes, often referred to as a functional module, plays a synergistic role in a disease or other biological activity. Genes participating in a common module may cause clinically similar diseases and share a common genetic origin of their associated disease phenotypes. Identifying such modules may help in the system-level understanding of biological and cellular processes or of the pathophysiologic basis of associated diseases. As a result, detecting functional modules is an active research topic in computational biology. Several techniques have been proposed to find functional modules from gene co-regulation or co-expression data. These methods fall broadly into two categories: non-network-based techniques that cluster gene expression data directly, and network-based methods that extract modules from gene co-expression networks built from expression data. We survey the main approaches for obtaining modules and evaluate how well they find biologically significant gene modules in both microarray and RNASeq data. Apart from independent assessments, no prior effort has been made to evaluate their performance in an integrated way on both microarray and RNASeq data. We assess the significance of the modules through gene ontology and pathway analysis. We then select a few of the best performers to assess their capability in finding disease-specific modules. Our comparison reveals that no single algorithm is the winner in all respects. Moreover, performance varies widely between microarray and RNASeq data. Biclustering performs relatively better on microarray expression data but fails to perform well on RNASeq data, where network-based techniques work better.
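To make the network-based category concrete, the following is a minimal sketch and not any of the surveyed algorithms. It assumes a hard-thresholded absolute Pearson correlation as the co-expression measure and treats connected components of the resulting graph as modules. The function names, the threshold of 0.9, and the toy expression values are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_modules(expr, threshold=0.9):
    """Group genes into modules.

    expr maps gene name -> list of expression values across samples.
    Two genes are linked when |r| >= threshold (an assumed cutoff);
    each connected component with more than one gene is a module.
    """
    genes = list(expr)
    adjacency = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adjacency[g].add(h)
                adjacency[h].add(g)

    seen, modules = set(), []
    for g in genes:
        if g in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [g]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > 1:
            modules.append(sorted(component))
    return modules


# Hypothetical toy data: five genes measured in four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
    "geneD": [8.0, 2.1, 6.1, 3.9],
    "geneE": [5.0, 5.0, 5.0, 5.0],
}
modules = coexpression_modules(toy)
print(modules)
```

On the toy data, geneA and geneB rise together, geneC and geneD share a second pattern, and the constant geneE joins no module. Real module detectors replace both the hard threshold and the connected-components step with more robust choices, so this sketch only illustrates the general shape of the pipeline.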
