GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data

Background: Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species.

Results: We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow several vertices to be associated with the same label. This model, called δ-teams with families, is particularly suitable for our application because it enables the handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse.

Conclusions: By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the subject of further experimental investigations.
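To make the idea of δ-teams on graphs more concrete, the following is a minimal illustrative sketch, not the GraphTeams implementation. It assumes the simple model without families (each gene labels exactly one vertex per graph), two weighted undirected graphs given as adjacency dictionaries, and a naive refinement strategy: a candidate set is split whenever, in one of the graphs, its members cannot all be linked by chains of members whose pairwise shortest-path distance is at most δ. The function names (`undirected`, `split_by_delta`, `delta_teams`) and the `min_size` parameter are hypothetical and chosen for this example only.

```python
import heapq


def undirected(edges):
    """Build an adjacency dict {u: {v: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_distances(graph, source):
    """Dijkstra: shortest-path distance from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def split_by_delta(graph, candidates, delta):
    """Partition candidates into groups linked by chains of candidates whose
    consecutive members lie at distance <= delta (assumption: distances are
    measured in the whole graph, not only among the candidates)."""
    remaining = set(candidates)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, stack = {seed}, [seed]
        while stack:
            dist = shortest_distances(graph, stack.pop())
            near = {v for v in remaining if dist.get(v, float("inf")) <= delta}
            remaining -= near
            group |= near
            stack.extend(near)
        groups.append(group)
    return groups


def delta_teams(graph_a, graph_b, delta, min_size=2):
    """Refine the common vertex set until every part is stable in both graphs."""
    pending = [set(graph_a) & set(graph_b)]
    teams = []
    while pending:
        candidate = pending.pop()
        for graph in (graph_a, graph_b):
            parts = split_by_delta(graph, candidate, delta)
            if len(parts) > 1:
                pending.extend(parts)  # unstable: re-examine each part
                break
        else:
            if len(candidate) >= min_size:
                teams.append(candidate)
    return teams


# Toy example: genes a, b, c, d in two "species" with different spatial distances.
species_1 = undirected([("a", "b", 1), ("b", "c", 1), ("c", "d", 5)])
species_2 = undirected([("a", "b", 1), ("b", "c", 4), ("c", "d", 1)])
print([sorted(team) for team in delta_teams(species_1, species_2, delta=2)])
# prints [['a', 'b']]
```

In the toy example, only genes a and b stay within distance 2 of each other in both graphs, so they form the single reported team. The sketch recomputes shortest paths repeatedly and is meant only to convey the refinement idea; handling gene families and scaling to real Hi-C contact graphs would require a more careful design.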
