ManiNetCluster: A Manifold Learning Approach to Reveal the Functional Linkages Across Multiple Gene Networks

The coordination of genome-encoded function is a critical and complex process in biological systems, especially across phenotypes or states (e.g., time, disease, organism). Understanding how the complexity of genome-encoded function relates to these states remains a challenge. To address this, we have developed ManiNetCluster, a novel computational method based on manifold learning and comparative analysis that simultaneously aligns and clusters multiple molecular networks to systematically reveal functional links across multiple datasets. Specifically, ManiNetCluster employs manifold learning to match local and non-linear structures among the networks of different states and thereby identify cross-network linkages. Applying ManiNetCluster to developmental gene expression datasets across model organisms (e.g., worm, fruit fly), we found that our tool aligns orthologous genes significantly better than existing state-of-the-art methods, indicating non-linear interactions between evolutionary functions in development. Moreover, we applied ManiNetCluster to a series of transcriptomes measured in the green alga Chlamydomonas reinhardtii to determine the functional links between various metabolic processes across the light and dark periods of a diurnally cycling culture. For example, we identify a number of genes putatively regulating processes across each lighting regime, and we show how comparative analyses between ManiNetCluster and other clustering tools can provide additional insights. ManiNetCluster is available as an R package together with a tutorial at https://github.com/namtk/ManiNetCluster.
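
The align-then-cluster idea described above can be illustrated with a minimal sketch. It is written in Python with NumPy and is not the API of the ManiNetCluster R package. It uses a generic joint graph-Laplacian formulation of manifold alignment, which may differ from what the package actually implements. The function names (`knn_affinity`, `align`, `kmeans`), the parameters (`k`, `mu`, `dim`), and the synthetic data are all assumptions made for illustration. The correspondence matrix `C` stands in for known gene pairings between two networks, such as orthologs.

```python
import numpy as np


def knn_affinity(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix for the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]     # k closest rows for each row
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)               # symmetrise the graph


def align(X, Y, C, dim=2, k=5, mu=1.0):
    """Embed two datasets in one space using a joint graph Laplacian.

    C[i, j] > 0 marks row i of X and row j of Y as corresponding
    (hypothetical stand-in for, e.g., an ortholog pairing).
    """
    n1 = len(X)
    W = np.block([[knn_affinity(X, k), mu * C],
                  [mu * C.T, knn_affinity(Y, k)]])
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    s = 1.0 / np.sqrt(deg)
    # Solve L v = lambda D v through the symmetric normalised Laplacian.
    _, U = np.linalg.eigh(s[:, None] * L * s[None, :])
    V = (s[:, None] * U)[:, 1:dim + 1]      # skip the trivial constant vector
    return V[:n1], V[n1:]


def kmeans(Z, n_clusters, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(n_clusters - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):         # leave empty clusters where they are
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


# Synthetic example: two non-linear views of the same one-dimensional trajectory.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0, 30)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(30, 2))
Y = np.c_[t, t ** 2] + 0.01 * rng.normal(size=(30, 2))
C = np.eye(30)                              # row i of X corresponds to row i of Y

ex, ey = align(X, Y, C)
labels = kmeans(np.vstack([ex, ey]), n_clusters=3)  # cluster both sets jointly
```

Because both datasets are clustered in the shared embedding, a single cluster can contain points from either dataset. This is the property that lets an approach of this kind relate modules across networks rather than within one network only.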
