LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

Background: Physical maps are the substrate of genome sequencing and map-based cloning. Their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA in these genomes requires very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software. This often results in short contigs (3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).

Results: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters that are robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets previously used for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was also compared on simulated BAC libraries based on the known genome sequences of chromosome 1 of rice and chromosome 1 of maize.

Conclusions: The results show that, compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
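Items (ii) and (iii) of the approach can be illustrated with a small sketch. It treats clones as nodes and significant overlaps as edges, then flags clones whose exclusion would split a cluster. This is not the LTC implementation. The overlap scores (p-value-like, lower meaning more significant), the cutoff value, the clone names, and the helper functions `build_network`, `components`, and `weak_clones` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_network(overlaps, cutoff):
    """Build a clone network from (clone_a, clone_b, score) triples.

    Assumption: score behaves like a p-value, so an overlap counts as
    significant when score <= cutoff.
    """
    graph = defaultdict(set)
    for a, b, score in overlaps:
        graph[a]  # register both clones even if the overlap is rejected
        graph[b]
        if score <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def components(graph, skip=None):
    """Return connected components, optionally ignoring one clone."""
    seen = set() if skip is None else {skip}
    found = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        found.append(comp)
    return found


def weak_clones(graph):
    """Clones whose exclusion splits a cluster (brute-force check)."""
    baseline = len(components(graph))
    return {c for c in graph if len(components(graph, skip=c)) > baseline}


# Hypothetical toy data: two tight groups joined only through C and D.
overlaps = [
    ("A", "B", 1e-15), ("B", "C", 1e-14), ("A", "C", 1e-13),
    ("C", "D", 1e-11),
    ("D", "E", 1e-15), ("E", "F", 1e-14), ("D", "F", 1e-12),
    ("F", "G", 1e-3),  # too weak to pass the cutoff
]
graph = build_network(overlaps, cutoff=1e-10)
print(sorted(weak_clones(graph)))
```

In this toy network the cluster containing A to F depends on the single C-D overlap, so C and D are reported as weak points, while G stays outside the cluster because its only overlap fails the cutoff. A cluster with no such clones is robust to the exclusion of any single clone in the sense of item (ii).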
