Label Propagation Based Semi-supervised Feature Selection to Decode Clinical Phenotype of Huntington's Disease

Huntington’s disease is a neurodegenerative disease caused by a mutation in the HTT gene. To date, its molecular pathogenesis remains unclear. Clinically, behavioral, cognitive, and mental functions are progressively affected. With the rapid development of sequencing technologies, it has become possible to explore the molecular mechanisms at the genome-wide transcriptomic level using computational methods. Our previous studies have shown that it is difficult to distinguish disease genes from non-disease genes. To understand the molecular pathogenesis underlying the complex clinical phenotypes that arise during disease progression, it is preferable to identify biomarkers corresponding to the different disease stages. Therefore, in this study, we designed a label propagation based semi-supervised feature selection approach (LPFS) to identify disease-associated genes corresponding to different clinical phenotypes. LPFS selects the disease-associated genes for each disease stage by alternating between label propagation clustering and feature selection. We then conducted an enrichment analysis to understand the gene functions and pathways affected during disease progression, and thereby to decode, at the gene expression level, the changes in individual behavioral and mental characteristics during neurodegenerative disease progression. Our results show that LPFS performs better than state-of-the-art methods. We found that the TGF-beta signaling pathway, olfactory transduction, cytokine-cytokine receptor interaction, immune response, and inflammatory response were gradually affected during disease progression. In addition, we found that the expression of Ccdc33, Capsl, Al662270, and Dlgap5 changed markedly as the disease developed.
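The alternation between label propagation and feature selection described above can be illustrated with a minimal sketch. This is not the LPFS implementation: the abstract does not specify the similarity graph, the feature scoring criterion, or the stopping rule, so the RBF affinity, the Fisher-style score, the fixed number of selected features `k`, and all function names below are assumptions chosen only to make the loop concrete.

```python
import math

def propagate_labels(X, y, feats, n_classes, sigma=1.0, n_iter=50):
    """Spread known labels over an RBF similarity graph (assumed graph type)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Squared distance restricted to the currently selected features.
                d2 = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
        s = sum(W[i]) or 1.0
        W[i] = [w / s for w in W[i]]  # row-normalise

    def one_hot(c):
        return [1.0 if c == k else 0.0 for k in range(n_classes)]

    # Labeled samples start one-hot; unlabeled samples (None) start uniform.
    F = [one_hot(y[i]) if y[i] is not None else [1.0 / n_classes] * n_classes
         for i in range(n)]
    for _ in range(n_iter):
        F = [one_hot(y[i]) if y[i] is not None  # clamp the known labels
             else [sum(W[i][j] * F[j][c] for j in range(n))
                   for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

def fisher_scores(X, labels, n_classes):
    """Between-class over within-class scatter per feature (assumed criterion)."""
    n, d = len(X), len(X[0])
    scores = []
    for f in range(d):
        col = [row[f] for row in X]
        mean = sum(col) / n
        between = within = 0.0
        for c in range(n_classes):
            vals = [col[i] for i in range(n) if labels[i] == c]
            if not vals:
                continue
            mc = sum(vals) / len(vals)
            between += len(vals) * (mc - mean) ** 2
            within += sum((v - mc) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores

def alternate_lp_fs(X, y, n_classes, k, max_rounds=10):
    """Alternate propagation and selection until the feature set is stable."""
    feats = list(range(len(X[0])))  # start from all features
    labels = None
    for _ in range(max_rounds):
        labels = propagate_labels(X, y, feats, n_classes)
        scores = fisher_scores(X, labels, n_classes)
        ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
        new_feats = sorted(ranked[:k])
        if new_feats == feats:
            break
        feats = new_feats
    return feats, labels

# Toy data (hypothetical): rows are samples, columns are genes; None = unlabeled.
X_toy = [[0.0, 0.5, 0.3], [0.1, 0.1, 0.9], [0.2, 0.8, 0.4],
         [2.0, 0.4, 0.5], [2.1, 0.9, 0.2], [1.9, 0.2, 0.7]]
y_toy = [0, None, None, 1, None, None]
selected, stages = alternate_lp_fs(X_toy, y_toy, n_classes=2, k=1)
print(selected, stages)
```

In this toy setting only the first column separates the two groups, so it is the one the loop should keep. On real expression data the classes would correspond to disease stages, and the graph construction and scoring rule would need to follow the method as actually published.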
