CoCoNUT: an efficient system for the comparison and analysis of genomes

Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons.

Results: Most of the available software tools are tailored to one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that solves several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences.

Conclusion: CoCoNUT is competitive with other software tools with respect to the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With its improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics.
