c-GAMMA: Comparative Genome Analysis of Molecular Markers

The discovery of molecular markers for the efficient identification of living organisms remains a challenge of high interest. The diversity of species can now be observed in detail thanks to the low-cost genomic sequences produced by a new generation of sequencers. A method called c-GAMMA is proposed that formalizes the design of new markers from such data. It is based on a series of filters on forbidden pairs of words, followed by an optimization step on the discriminative power of candidate markers. First results are presented on a set of microbial genomes. The importance of further developments is stressed, given the huge amounts of data that will soon become available in all kingdoms of life.
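The filter-then-optimize idea can be illustrated with a toy sketch. It is not the c-GAMMA implementation: the abstract does not specify the filters or the objective, so everything below is an assumption made for illustration. Here a candidate marker is a pair of words (k-mers) that each occur exactly once in every genome, a pair is treated as "forbidden" when the region it delimits is in the wrong order or outside a length range in any genome, and discriminative power is taken to be the number of distinct delimited regions. The names `unique_kmer_positions`, `candidate_markers`, `discriminative_power` and `best_marker` are hypothetical.

```python
from itertools import permutations


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    seen = {}
    repeated = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in seen:
            repeated.add(word)
        seen[word] = i
    return {w: p for w, p in seen.items() if w not in repeated}


def candidate_markers(genomes, k, min_len, max_len):
    """Yield (left, right, regions) for word pairs that pass every filter."""
    positions = {name: unique_kmer_positions(seq, k)
                 for name, seq in genomes.items()}
    # Assumed filter 1: keep words found exactly once in every genome.
    shared = set.intersection(*(set(p) for p in positions.values()))
    for left, right in permutations(sorted(shared), 2):
        regions = {}
        for name, seq in genomes.items():
            start = positions[name][left] + k
            end = positions[name][right]
            # Assumed filter 2: the pair is forbidden if, in any genome,
            # the delimited region is reversed, overlapping, or out of range.
            if not (min_len <= end - start <= max_len):
                break
            regions[name] = seq[start:end]
        else:
            yield left, right, regions


def discriminative_power(regions):
    """Assumed score: number of distinct regions across genomes."""
    return len(set(regions.values()))


def best_marker(genomes, k, min_len, max_len):
    """Optimization step: keep the surviving pair with the highest score."""
    return max(candidate_markers(genomes, k, min_len, max_len),
               key=lambda c: discriminative_power(c[2]),
               default=None)


if __name__ == "__main__":
    toy = {"a": "ACGTTGCA", "b": "ACGTAGCA", "c": "ACGAAGCA"}
    left, right, regions = best_marker(toy, k=3, min_len=1, max_len=10)
    print(left, right, regions, discriminative_power(regions))
```

On the three toy sequences, the only surviving pair is `ACG` / `GCA`, and the regions it delimits differ in all three genomes. Enumerating every ordered pair of shared words is quadratic and would not scale to real genomes; it is used here only to keep the sketch short.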
