Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements

Background: Genomic DNA frequently undergoes rearrangements of gene order, which can be localized by comparing the DNA sequences before and after the event. In mitochondrial genomes, different mechanisms are likely at work, at least some of which involve the duplication of sequence around the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. To study such effects, it is important to locate the breakpoint positions with precision.

Results: We define a partially local sequence alignment problem that assumes that, following a rearrangement of a sequence F, two fragments L and R are produced that may fit together exactly to match F, leave a gap of deleted DNA between L and R, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications.

Conclusions: The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. Although it is applied here only to mitogenomes, there is no reason why the method could not also be used to study rearrangements in nuclear genomes.
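
The three outcomes described in the Results (exact fit, gap, overlap) can be illustrated with a much simpler stand-in than the cubic three-way dynamic program. The sketch below is not the method of the paper: it runs two independent pairwise alignments and compares where they end in F, whereas a three-way alignment would score the overlapping region jointly. It assumes that L is anchored at the left end of F with its right end free, and that R is anchored at the right end of F with its left end free. The function names `prefix_scores` and `locate_breakpoint` and the scoring values are hypothetical, chosen only for illustration.

```python
def prefix_scores(f, s, match=1, mismatch=-1, gap=-2):
    """Return best, where best[i] is the highest score of globally
    aligning f[:i] with some prefix of s (the right end of s is free)."""
    m = len(s)
    prev = [j * gap for j in range(m + 1)]   # row for the empty prefix of f
    best = [max(prev)]
    for i in range(1, len(f) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if f[i - 1] == s[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align f[i-1] with s[j-1]
                         prev[j] + gap,       # f[i-1] against a gap
                         cur[j - 1] + gap)    # s[j-1] against a gap
        best.append(max(cur))
        prev = cur
    return best


def locate_breakpoint(f, left, right, **scoring):
    """Estimate where the match of `left` ends in f (position i) and where
    the match of `right` starts in f (position k), then classify the event.
    Ties are broken in favour of the smallest index."""
    a = prefix_scores(f, left, **scoring)
    # Reversing both sequences turns "suffix of f vs. suffix of right"
    # into the prefix problem solved above; b[k] then scores f[k:].
    b = prefix_scores(f[::-1], right[::-1], **scoring)[::-1]
    i = max(range(len(a)), key=a.__getitem__)
    k = max(range(len(b)), key=b.__getitem__)
    if i == k:
        kind = "exact"
    elif i < k:
        kind = "gap"        # f[i:k] is covered by neither fragment
    else:
        kind = "overlap"    # f[k:i] is covered by both fragments
    return i, k, kind


if __name__ == "__main__":
    # Toy data: the runs of T mark unrelated flanking sequence.
    print(locate_breakpoint("ACGGACCAGG", "ACGGATTTT", "TTTTCCAGG"))
    print(locate_breakpoint("ACGGAGACCCAGG", "ACGGAGACTTTT", "TTTTGACCCAGG"))
    print(locate_breakpoint("ACGGAAAACCAGG", "ACGGATTTT", "TTTTCCAGG"))
```

An overlap in this sketch corresponds to a stretch of F that is present in both fragments, which is the kind of signal a local duplication around the breakpoint would produce. Because the two alignments are scored independently, the positions should be read as rough estimates rather than as the optimum of the partially local three-way problem.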
