Disease-related gene module detection based on a multi-label propagation clustering algorithm

Detecting disease-related gene modules from gene expression data is of great significance, because it supports exploratory analysis of how genes interact under complex disease phenotypes. The multi-label propagation algorithm (MLPA) has been widely used for module detection because it is fast and easy to implement. The accuracy of MLPA depends heavily on the connections between nodes, and most existing research focuses on measuring the similarity between nodes. However, MLPA performs poorly when disease-related genes are only loosely connected. Moreover, the biological significance of the modules obtained by MLPA has not been demonstrated. To address these problems, we designed a double label propagation clustering algorithm (DLPCA), based on MLPA, to study Huntington's disease. In DLPCA, we introduced pathogenic labels, in addition to category labels, to supervise the multi-label propagation clustering process. The pathogenic labels contain pathogenic information about disease genes as well as the hierarchical structure of the gene expression data. Experimental results demonstrated that DLPCA outperforms other conventional gene-clustering algorithms.
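The approach above builds on label propagation. In this procedure, each gene repeatedly adopts the label that carries the most edge weight among its neighbors, and this continues until the labels stop changing. The Python below is a minimal sketch of that generic step only. It is not MLPA or DLPCA. It builds a weighted co-expression graph from Pearson correlations and runs asynchronous single-label propagation. The 0.8 correlation cutoff, the function names (pearson, build_graph, label_propagation, modules) and the toy profiles are all illustrative assumptions. The sketch does not model MLPA's multi-label (overlapping) extension or DLPCA's pathogenic-label supervision.

```python
import math
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def build_graph(profiles, threshold=0.8):
    """Weighted co-expression graph: link genes whose |r| >= threshold (assumed cutoff)."""
    genes = list(profiles)
    graph = {g: {} for g in genes}  # isolated genes stay in the graph
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            w = abs(pearson(profiles[g], profiles[h]))
            if w >= threshold:
                graph[g][h] = w
                graph[h][g] = w
    return graph


def label_propagation(graph, max_iter=100, seed=0):
    """Asynchronous single-label propagation; returns {gene: module label}."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # each gene starts as its own module
    order = list(graph)
    for _ in range(max_iter):
        rng.shuffle(order)  # visit genes in a fresh random order each pass
        changed = False
        for node in order:
            if not graph[node]:
                continue  # isolated gene keeps its own label
            # Sum edge weights per label among the node's neighbors.
            scores = defaultdict(float)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            winners = sorted(lab for lab, s in scores.items() if math.isclose(s, best))
            # Keep the current label when it ties for the maximum, so the process can settle.
            if labels[node] not in winners:
                labels[node] = rng.choice(winners)
                changed = True
        if not changed:
            break  # every gene already holds a maximum-weight label among its neighbors
    return labels


def modules(labels):
    """Group genes by final label into sorted module lists."""
    groups = defaultdict(list)
    for gene, lab in labels.items():
        groups[lab].append(gene)
    return sorted((sorted(m) for m in groups.values()), key=lambda m: m[0])


if __name__ == "__main__":
    # Toy profiles: two co-expressed groups (A*, B*) and one unrelated gene (C1).
    profiles = {
        "A1": [1, 2, 3, 4], "A2": [2, 4, 6, 8], "A3": [1, 2, 3, 5],
        "B1": [2, 0, 0, 2], "B2": [4, 0, 0, 4], "B3": [3, 1, 0, 3],
        "C1": [1, 3, 1, 3],
    }
    print(modules(label_propagation(build_graph(profiles))))
    # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1']]
```

Keeping a node's current label when it ties for the maximum is a common way to help the updates converge. The sketch also shows the weakness the abstract describes. A gene whose correlations all fall below the cutoff, such as C1 here, never joins any module, because propagation can only follow the edges that the similarity measure creates.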
