Manifold Learning Analysis for Allele-Skewed DNA Modification SNPs for Psychiatric Disorders

Bipolar disorder (BPD) and schizophrenia (SCZ) are two severe psychiatric disorders that occur worldwide. Identifying the genetic components that contribute to both disorders would provide meaningful insight into their pathogenesis and into the widespread misdiagnosis between them. In this study, we use allele-skewed DNA modification SNP (ASM-SNP) data to investigate the two disorders through state-of-the-art manifold learning, data-driven feature selection, and pathway analysis. We propose a novel manifold learning analysis of ASM-SNP data for BPD and SCZ that is built on a data-driven feature selection algorithm, nonnegative singular value approximation (NSVA). Our results indicate that t-distributed stochastic neighbor embedding (t-SNE) outperforms its peers at distinguishing psychiatric disorder samples from normal ones in both visualization and phenotype classification. It achieves the best phenotype diagnosis results, with an average AUC of 0.95, using only about the top-ranked 20% of SNPs. Furthermore, our results from manifold learning combined with support vector machine analysis suggest that SCZ and BPD may not be genetically separable. Pathway analysis also confirms that SCZ and BPD share the same or similar genetic variations. Taken together, these findings indicate, from a machine learning and systems biology perspective, that misdiagnosis between BPD and SCZ is inevitable. The results invite psychiatry research to reexamine the existing behavior-based classification of BPD and SCZ. To the best of our knowledge, this study is the first comprehensive investigation of BPD and SCZ in bioinformatics.
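The workflow summarized above (rank features, keep roughly the top 20%, embed with t-SNE, classify with a support vector machine, report AUC) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than ASM-SNP measurements, the ranking function `rank_features_by_svd` is a simple SVD-loading stand-in and not the NSVA algorithm, and the t-SNE and SVM settings are arbitrary scikit-learn choices rather than the study's configuration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a nonnegative SNP-level matrix: 80 samples x 200 features.
n_samples, n_features = 80, 200
labels = np.repeat([0, 1], n_samples // 2)
data = rng.random((n_samples, n_features))
data[labels == 1, :20] += 0.8  # only the first 20 features carry class signal


def rank_features_by_svd(x):
    """Hypothetical stand-in for NSVA: order features by absolute loading
    on the first right singular vector of the column-centered data."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.argsort(-np.abs(vt[0]))


# Keep the top-ranked 20% of features.
top_k = int(0.2 * n_features)
selected = rank_features_by_svd(data)[:top_k]

# Embed the reduced data in two dimensions with t-SNE.
embedding = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(data[:, selected])

# Score an RBF-kernel SVM on the embedding with 5-fold cross-validated AUC.
auc_scores = cross_val_score(
    SVC(kernel="rbf"), embedding, labels, cv=5, scoring="roc_auc"
)
mean_auc = float(auc_scores.mean())
print(f"selected features: {len(selected)}, mean AUC: {mean_auc:.2f}")
```

Because scikit-learn's t-SNE has no out-of-sample transform, this sketch embeds all samples before cross-validation, so the reported AUC is optimistic; a stricter evaluation would need an embedding procedure that handles held-out samples separately.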
