Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases
A. Bateman | M. Andrade-Navarro | M. Grynberg | A. Gruca | K. Jakobsen | D. Linke | V. Promponas | M. Anisimova | A. Kajava | O. K. Tørresen | B. Star | Pablo Mier | Patryk Jarnot
[1] E. Blackburn, et al. A tandemly repeated sequence at the termini of the extrachromosomal ribosomal RNA genes in Tetrahymena. 1978, Journal of molecular biology.
[2] R. Ferone, et al. Dihydrofolate reductase: thymidylate synthase, a bifunctional polypeptide from Crithidia fasciculata. 1980, Proceedings of the National Academy of Sciences of the United States of America.
[3] P. Ferrara, et al. Nucleotide sequence of the metL gene of Escherichia coli. Its product, the bifunctional aspartokinase II-homoserine dehydrogenase II, and the bifunctional product of the thrA gene, aspartokinase I-homoserine dehydrogenase I, derive from a common ancestor. 1983, The Journal of biological chemistry.
[4] Swee Lay Thein, et al. Hypervariable ‘minisatellite’ regions in human DNA. 1985, Nature.
[5] M. Litt, et al. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. 1989, American journal of human genetics.
[6] R. I. Richards, et al. Simple tandem DNA repeats and human genetic disease. 1995, Proceedings of the National Academy of Sciences of the United States of America.
[7] A. DeVries, et al. Convergent evolution of antifreeze glycoproteins in Antarctic notothenioid fish and Arctic cod. 1997, Proceedings of the National Academy of Sciences of the United States of America.
[8] A. DeVries, et al. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. 1997, Proceedings of the National Academy of Sciences of the United States of America.
[9] J. Heringa, et al. Detection of internal repeats: how common are they? 1998, Current opinion in structural biology.
[10] D. Eisenberg, et al. A census of protein repeats. 1999, Journal of molecular biology.
[11] G. Lindahl, et al. The R28 protein of Streptococcus pyogenes is related to several group B streptococcal surface proteins, confers protective immunity and promotes binding to human epithelial cells. 1999, Molecular microbiology.
[12] D. Eisenberg, et al. A combined algorithm for genome-wide prediction of protein function. 1999, Nature.
[13] G. Benson, et al. Tandem repeats finder: a program to analyze DNA sequences. 1999, Nucleic acids research.
[14] Anton J. Enright, et al. Protein interaction maps for complete genomes based on gene fusion events. 1999, Nature.
[15] Eugene W. Myers, et al. A whole-genome assembly of Drosophila. 2000, Science.
[16] G. Vergnaud, et al. Minisatellites: mutability and genome architecture. 2000, Genome research.
[17] C. Ponting, et al. Homology-based method for identification of protein repeats using statistical significance estimates. 2000, Journal of molecular biology.
[18] J. Jurka. Repbase update: a database and an electronic journal of repetitive elements. 2000, Trends in genetics: TIG.
[19] John M. Butler, et al. STRBase: a short tandem repeat DNA database for the human identity testing community. 2001, Nucleic Acids Res.
[20] Tom H. Pringle, et al. The human genome browser at UCSC. 2002, Genome research.
[21] P. Tompa. Intrinsically unstructured proteins evolve by repeat expansion. 2003, BioEssays: news and reviews in molecular, cellular and developmental biology.
[22] Livia Visai, et al. Characterization of novel LPXTG-containing proteins of Staphylococcus aureus identified from genome sequences. 2003, Microbiology.
[23] Matthew Hurles, et al. Gene Duplication: The Genomic Trade in Spare Parts. 2004, PLoS biology.
[24] Aleksandar Milosavljevic, et al. Prototypic sequences for human repetitive DNA. 1992, Journal of Molecular Evolution.
[25] M. G. Kidwell, et al. Transposable elements and the evolution of genome size in eukaryotes. 2002, Genetica.
[26] Fran Lewitter, et al. Intragenic tandem repeats generate functional variability. 2005, Nature Genetics.
[27] H. Riethman, et al. Human subtelomere structure and variation. 2005, Chromosome Research.
[28] M. Borodovsky, et al. Gene identification in novel eukaryotic genomes by self-training algorithm. 2005, Nucleic acids research.
[29] Y. Kashi, et al. Simple sequence repeats as advantageous mutators in evolution. 2006, Trends in genetics: TIG.
[30] Gary Benson, et al. TRDB—The Tandem Repeats Database. 2006, Nucleic Acids Res.
[31] Casey M. Bergman, et al. Discovering and detecting transposable elements in genome sequences. 2007, Briefings Bioinform.
[32] M. Cáccamo, et al. Conservation and divergence of gene families encoding components of innate immune response systems in zebrafish. 2007, Genome Biology.
[33] Jonathan E. Allen, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. 2007, Genome Biology.
[34] E. Birney, et al. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. 2008, Genome research.
[35] Nancy F. Hansen, et al. Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry. 2008, Nature.
[36] M. Anisimova, et al. Origin and Evolution of GALA-LRR, a New Member of the CC-LRR Subfamily: From Plants to Bacteria? 2008, PloS one.
[37] David Haussler, et al. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. 2008, Bioinform.
[38] Christoph Mayer, et al. Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. 2010, BMC Genomics.
[39] D. S. Reiner, et al. Draft Genome Sequencing of Giardia intestinalis Assemblage B Isolate GS: Is Human Giardiasis Caused by Two Different Species? 2009, PLoS pathogens.
[40] John M. Hancock, et al. Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins. 2009, Genome Biology.
[41] S. Turner, et al. Real-Time DNA Sequencing from Single Polymerase Molecules. 2009, Science.
[42] A. Futschik, et al. The Next Generation of Molecular Markers From Massively Parallel Sequencing of Pooled DNA Samples. 2010, Genetics.
[43] Inge Jonassen, et al. Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim. 2010, Bioinform.
[44] A. Gnirke, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. 2010, Proceedings of the National Academy of Sciences.
[45] Loris Mularoni, et al. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. 2010, Genome research.
[46] Mark Akeson, et al. Replication of Individual DNA Molecules under Electronic Control Using a Protein Nanopore. 2010, Nature nanotechnology.
[47] Seth DeBolt, et al. Copy Number Variation Shapes Genome Diversity in Arabidopsis Over Immediate Family Generational Scales. 2010, Genome biology and evolution.
[48] Andrey V. Kajava, et al. Protein homorepeats sequences, structures, evolution, and functions. 2010, Advances in Protein Chemistry and Structural Biology.
[49] Bin Xue, et al. Protein tandem repeats – the more perfect, the less structured. 2010, The FEBS journal.
[50] Meenakshi Agarwal, et al. Centromere identity: a challenge to be faced. 2010, Molecular Genetics and Genomics.
[51] S. Koren, et al. Assembly algorithms for next-generation sequencing data. 2010, Genomics.
[52] E. Szarka, et al. Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors. 2011, Genes.
[53] Xiaomin Zhao, et al. ALS51, a newly discovered gene in the Candida albicans ALS family, created by intergenic recombination: analysis of the gene and protein, and implications for evolution of microbial gene families. 2011, FEMS immunology and medical microbiology.
[54] Mark Yandell, et al. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. 2011, BMC Bioinformatics.
[55] N. Friedman, et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. 2011, Nature Biotechnology.
[56] Inge Jonassen, et al. The genome sequence of Atlantic cod reveals a unique immune system. 2011, Nature.
[57] T. Glenn. Field guide to next-generation DNA sequencers. 2011, Molecular ecology resources.
[58] Xuan Zhuang, et al. Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome. 2012, BMC Genomics.
[59] S. Salzberg, et al. Repetitive DNA and next-generation sequencing: computational challenges and solutions. 2011, Nature Reviews Genetics.
[60] M. Yandell, et al. A beginner's guide to eukaryotic genome annotation. 2012, Nature Reviews Genetics.
[61] Sergey I. Nikolenko, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. 2012, J. Comput. Biol.
[62] Alain Hauser, et al. Repeat or not repeat?—Statistical validation of tandem repeat prediction in genomic sequences. 2012, Nucleic acids research.
[63] N. Kyrpides, et al. Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample. 2012, PloS one.
[64] M. Albà, et al. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. 2012, BMC Evolutionary Biology.
[65] J. Whitney, et al. Re-Evaluation of a Bacterial Antifreeze Protein as an Adhesin with Ice-Binding Activity. 2012, PloS one.
[66] Andrey V. Kajava, et al. Tandem repeats in proteins: from sequence to structure. 2012, Journal of structural biology.
[67] M. Kasahara, et al. VLR-based adaptive immunity. 2012, Annual review of immunology.
[68] R. Hardison. Evolution of hemoglobin and its genes. 2012, Cold Spring Harbor perspectives in medicine.
[69] Maria Anisimova, et al. Graph-based modeling of tandem repeats improves global multiple sequence alignment. 2013, Nucleic acids research.
[70] Sabyasachi Das, et al. Organization of lamprey variable lymphocyte receptor C locus and repertoire development. 2013, Proceedings of the National Academy of Sciences.
[71] Philip Hugenholtz, et al. Shining a Light on Dark Sequencing: Characterising Errors in Ion Torrent PGM Data. 2013, PLoS Comput. Biol.
[72] C. Liang, et al. Genome-Wide Analysis of Tandem Repeats in Plants and Green Algae. 2013, G3: Genes, Genomes, Genetics.
[73] A. Grove, et al. C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding. 2012, Bioscience reports.
[74] F. Hoffmann, et al. Whole-Genome Duplication and the Functional Diversification of Teleost Fish Hemoglobins. 2012, Molecular biology and evolution.
[75] Carolyn J. Lawrence-Dill, et al. MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of Plant Genome Annotations. 2013, Plant Physiology.
[76] Mengmeng Huang, et al. PCR amplification of repetitive DNA: a limitation to genome editing technologies and many other applications. 2014, Scientific Reports.
[77] O. Gascuel, et al. Deep Conservation of Human Protein Tandem Repeats within the Eukaryotes. 2014, Molecular biology and evolution.
[78] Qi Li, et al. Genome-Wide Analysis of Simple Sequence Repeats in Marine Animals—a Comparative Approach. 2014, Marine Biotechnology.
[79] A. Aertsen, et al. The role of variable DNA tandem repeats in bacterial adaptation. 2014, FEMS microbiology reviews.
[80] Michail Yu. Lobanov, et al. HRaP: database of occurrence of HomoRepeats and patterns in proteomes. 2013, Nucleic Acids Res.
[81] Matthew Fraser, et al. InterProScan 5: genome-scale protein function classification. 2014, Bioinform.
[82] Kin-Fan Au, et al. PacBio Sequencing and Its Applications. 2015, Genom. Proteom. Bioinform.
[83] Floriane Plard, et al. Comparative Analysis of Transposable Elements Highlights Mobilome Diversity and Evolution in Vertebrates. 2015, Genome biology and evolution.
[84] Katharina J. Hoff, et al. Current methods for automated annotation of protein-coding genes. 2015, Current opinion in insect science.
[85] Marco Pellegrini, et al. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. 2015, Front. Bioeng. Biotechnol.
[86] Gabor T. Marth, et al. A global reference for human genetic variation. 2015, Nature.
[87] Tyler A. Elliott, et al. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. 2015, Philosophical Transactions of the Royal Society B: Biological Sciences.
[88] María Martín, et al. UniProt: A hub for protein information. 2015.
[89] The UniProt Consortium, et al. UniProt: a hub for protein information. 2014, Nucleic Acids Res.
[90] Maria Anisimova, et al. Statistical Approaches to Detecting and Analyzing Tandem Repeats in Genomic Sequences. 2015, Front. Bioeng. Biotechnol.
[91] Christos A. Ouzounis, et al. Annotation inconsistencies beyond sequence similarity-based function prediction – phylogeny and genome structure. 2015, Standards in Genomic Sciences.
[92] M. Anisimova, et al. The evolution and function of protein tandem repeats in plants. 2015, The New phytologist.
[93] Ioannis Xenarios, et al. TRAL: tandem repeat annotation library. 2015, Bioinform.
[94] James R. Knight, et al. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. 2016, bioRxiv.
[95] P. Trosvik, et al. Microsatellite Length Scoring by Single Molecule Real Time Sequencing – Effects of Sequence Structure and PCR Regime. 2016, PloS one.
[96] I. Inoue, et al. Structure and evolution of the filaggrin gene repeated region in primates. 2017, BMC Evolutionary Biology.
[97] Daniel J. Gaffney, et al. A survey of best practices for RNA-seq data analysis. 2016, Genome Biology.
[98] I. Bradbury, et al. Preferential amplification of repetitive DNA during whole genome sequencing library creation from historic samples. 2016.
[99] M. Gonzalez-Garay. Introduction to Isoform Sequencing Using Pacific Biosciences Technology (Iso-Seq). 2016.
[100] Drew R. Schield, et al. Microsatellite landscape evolutionary dynamics across 450 million years of vertebrate genome evolution. 2016, Genome.
[101] Philipp H. Schiffer, et al. Structure and evolutionary history of a large family of NLR proteins in the zebrafish. 2015, bioRxiv.
[102] Jeffrey T. Leek, et al. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. 2016, Nature Protocols.
[103] H. Pamjav, et al. A study of the Bodrogköz population in north-eastern Hungary by Y chromosomal haplotypes and haplogroups. 2017, Molecular Genetics and Genomics.
[104] Srikrishna Subramanian, et al. Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii. 2017, Scientific Reports.
[105] D. Ray, et al. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. 2017, Genome biology and evolution.
[106] Nicholas W. VanKuren, et al. Hidden genetic variation shapes the structure of functional elements in Drosophila. 2017, Nature Genetics.
[107] Melissa Gymrek, et al. A genomic view of short tandem repeats. 2017, Current opinion in genetics & development.
[108] Mark Yandell, et al. The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution. 2018, Nature Genetics.
[109] W. Yang, et al. Sequence-based diversity of 23 autosomal STR loci in Koreans investigated using an in-house massively parallel sequencing panel. 2017, Forensic science international. Genetics.
[110] A. Ossowski, et al. Genetic variation of 15 autosomal STRs in a population sample of Bedouins residing in the area of the Fourth Nile Cataract, Sudan. 2017, Anthropologischer Anzeiger; Bericht uber die biologisch-anthropologische Literatur.
[111] K. Jakobsen, et al. Evolution of Hemoglobin Genes in Codfishes Influenced by Ocean Depth. 2017, Scientific Reports.
[112] K. Jakobsen, et al. De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data. 2017, Molecular biology and evolution.
[113] I. Voets, et al. Structure of a 1.5-MDa adhesin that binds its Antarctic bacterium to diatoms and ice. 2017, Science Advances.
[114] A. Pang, et al. Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications. 2017, Genome research.
[115] Paolo Piazza, et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. 2017, F1000Research.
[116] Matheus Eloy Franco, et al. In silico characterization of tandem repeats in Trichophyton rubrum and related dermatophytes provides new insights into their role in pathogenesis. 2017, Database J. Biol. Databases Curation.
[117] Silvio C. E. Tosatto, et al. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. 2017, Nucleic Acids Res.
[118] Miguel A. Andrade-Navarro, et al. dAPE: a web server to detect homorepeats and follow their evolution. 2016, Bioinform.
[119] J. Akey, et al. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. 2017, bioRxiv.
[120] Ayelet T. Lamm, et al. QsRNA-seq: a method for high-throughput profiling and quantifying small RNAs. 2018, Genome Biology.
[121] B. Larue, et al. Nuclear, chloroplast, and mitochondrial data of a US cannabis DNA database. 2018, International Journal of Legal Medicine.
[122] D. Linke, et al. The repeat structure of two paralogous genes, Yersinia ruckeri invasin (yrInv) and a "Y. ruckeri invasin-like molecule", (yrIlm) sheds light on the evolution of adhesive capacities of a fish pathogen. 2017, Journal of structural biology.
[123] F. Denoeud, et al. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. 2018, Nature Plants.
[124] A. Nederbragt, et al. Genomic architecture of haddock (Melanogrammus aeglefinus) shows expansions of innate immune genes and short tandem repeats. 2018, BMC Genomics.
[125] Atif Adnan, et al. Population data and phylogenetic structure of Han population from Jiangsu province of China on GlobalFiler STR loci. 2018, International Journal of Legal Medicine.
[126] Ralph Schlapbach, et al. Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats. 2018, bioRxiv.
[127] N. Morling, et al. The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit. 2018, International Journal of Legal Medicine.
[128] Pablo Mier, et al. Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length. 2018, Genome biology and evolution.
[129] Juan Carlos Castilla-Rubio, et al. Earth BioGenome Project: Sequencing life for the future of life. 2018, Proceedings of the National Academy of Sciences.
[130] Alexandre Souvorov, et al. SKESA: strategic k-mer extension for scrupulous assemblies. 2018, Genome Biology.
[131] J. Bennetzen, et al. Comparative genome-wide characterization leading to simple sequence repeat marker development for Nicotiana. 2018, BMC Genomics.
[132] Dennis A. Benson, et al. GenBank. 2017, Nucleic Acids Res.
[133] M. Thomas P. Gilbert, et al. Bat Biology, Genomes, and the Bat1K Project: To Generate Chromosome-Level Genomes for All Living Bat Species. 2018, Annual review of animal biosciences.
[134] R. Kretsinger, et al. Leucine Rich Repeat Proteins: Sequences, Mutations, Structures and Diseases. 2019, Protein and peptide letters.
[135] Sergey Koren, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. 2019, Nature Biotechnology.
[136] Mick Watson, et al. Errors in long-read assemblies can critically affect protein prediction. 2019, Nature Biotechnology.
[137] Sergey Koren, et al. Reply to ‘Errors in long-read assemblies can critically affect protein prediction’. 2019, Nature Biotechnology.
[138] Silvio C. E. Tosatto, et al. Disentangling the complexity of low complexity proteins. 2019, Briefings Bioinform.