rearrvisr: an R package to detect, classify, and visualize genome rearrangements

The identification of genome rearrangements is of direct relevance for understanding their potential impacts on evolution and disease. However, available methods that detect or visualize rearrangements from deviations in gene order do not map them onto a genome of interest, complicating downstream analysis. In this work, we present rearrvisr, an R package that implements a novel algorithm for the identification and classification of rearrangements. In contrast to other software, it projects rearrangements onto a single genome, facilitating the localization of rearranged regions and estimation of their extent. We show that our tool achieves high precision and recall scores on simulated data, and illustrate the utility of our method by applying it to a data set generated from publicly available Drosophila genomes. The package is freely available from GitHub (https://github.com/dorolin/rearrvisr) and can be installed directly from R.
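As a rough illustration of how precision and recall might be scored on simulated data, the sketch below compares a set of markers called as rearranged against the set known to be rearranged in the simulation. It is not taken from rearrvisr: the function name `precision_recall` and the marker IDs are hypothetical, and the evaluation protocol actually used for the package may differ.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of marker IDs.

    predicted: markers a tool flagged as rearranged (hypothetical input)
    truth:     markers that were rearranged in the simulation
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)  # markers correctly flagged
    # Guard against division by zero when either set is empty.
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical marker IDs, for illustration only.
simulated_rearranged = {"g3", "g4", "g5", "g9"}
called_rearranged = {"g3", "g4", "g5", "g7"}

precision, recall = precision_recall(called_rearranged, simulated_rearranged)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four called markers are truly rearranged, and three of the four simulated rearrangements are recovered, so both scores are 0.75.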
