GR-Aligner: an algorithm for aligning pairwise genomic sequences containing rearrangement events

MOTIVATION Homologous genomic sequences from different species usually contain various rearrangement events. Whether specific patterns in the breakpoint regions cause such events is still unclear. To resolve this question, it is necessary to locate breakpoints at the nucleotide level, and having the sequences near the breakpoints available would further facilitate related studies. We thus need a tool that can both identify breakpoints and align the neighboring sequences. Although local alignment tools can detect rearrangement events, they report only a set of discontinuous alignments, in which the detailed alignments of the breakpoint regions are usually missing. Global alignment tools are even less appropriate for these tasks, since most of them are designed to align the conserved regions between sequences in a consistent order, i.e. they do not consider rearrangement events.

RESULTS We propose an effective and efficient pairwise sequence alignment algorithm, called GR-Aligner (Genomic Rearrangement Aligner), which finds the breakpoints of rearrangement events by integrating the forward and reverse alignments of the breakpoint regions flanked by homologously rearranged sequences. GR-Aligner also provides an option to view the alignments of sequences extended to the breakpoints. These outputs provide material for studying the possible evolutionary mechanisms and biological functions of the rearrangements.
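The idea of combining a forward and a reverse alignment to place a breakpoint can be illustrated with a minimal sketch. This is not GR-Aligner's published procedure: it assumes an ungapped comparison, equal-length sequences, and a simple match/mismatch score, and the function name `find_breakpoint` and its parameters are hypothetical. The breakpoint region (`query`) is scored from the left against the sequence that continues the left flank (`left_ref`) and from the right against the sequence that precedes the right flank (`right_ref`); the position where the two cumulative scores sum to a maximum is taken as the breakpoint.

```python
def find_breakpoint(query, left_ref, right_ref, match=1, mismatch=-1):
    """Illustrative only: ungapped forward/reverse scoring (assumed scheme).

    Returns (position, score), where query[:position] is attributed to
    left_ref and query[position:] is attributed to right_ref.
    """
    n = len(query)
    if not (len(left_ref) == n and len(right_ref) == n):
        raise ValueError("this sketch assumes equal-length sequences")

    # forward[i]: score of comparing query[:i] with left_ref[:i], left to right
    forward = [0] * (n + 1)
    for i in range(n):
        step = match if query[i] == left_ref[i] else mismatch
        forward[i + 1] = forward[i] + step

    # reverse[i]: score of comparing query[i:] with right_ref[i:], right to left
    reverse = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        step = match if query[i] == right_ref[i] else mismatch
        reverse[i] = reverse[i + 1] + step

    # Combine both directions and pick the best split (first one on ties)
    combined = [forward[i] + reverse[i] for i in range(n + 1)]
    best = max(range(n + 1), key=lambda i: combined[i])
    return best, combined[best]


if __name__ == "__main__":
    # Toy data: the first half matches left_ref, the second half right_ref
    print(find_breakpoint("AAAACCCC", "AAAAGGGG", "TTTTCCCC"))
```

A realistic implementation would use gapped alignment (for example, dynamic programming in both directions) and handle sequences of unequal length; the sketch only shows how the two directions are combined into a single breakpoint estimate.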
