Statistical Approaches to Detecting and Analyzing Tandem Repeats in Genomic Sequences

Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious or neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil the core mechanisms governing TR evolution and the roles of TRs in shaping genomes. However, TR-related studies are often non-trivial because TR regions are heterogeneous and sometimes fast-evolving. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. Although the work discussed focuses on protein TRs, it is generally applicable to nucleic acid TRs, which share similar features.
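
To make the notion of "adjacent repeat units" concrete, the following is a minimal illustrative sketch and not one of the methods reviewed here. It assumes the simplest possible case, in which all units are exact copies, whereas real TR units diverge after duplication and call for the statistical and profile-based approaches discussed in the review. The function name `find_exact_tandem_repeats` and its parameters `max_unit_len` and `min_copies` are hypothetical and chosen only for this example.

```python
def find_exact_tandem_repeats(seq, max_unit_len=6, min_copies=2):
    """Return (start, unit, copies) for runs of identical adjacent units.

    Simplifying assumption: units must match exactly. Diverged units,
    indels, and significance testing are deliberately out of scope.
    """
    hits = []
    i = 0
    n = len(seq)
    while i < n:
        best = None
        for unit_len in range(1, max_unit_len + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for a unit of this length
            # Count how many times the unit is repeated back to back.
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * unit_len
                # Keep the longest span; on ties, the shorter unit wins
                # because it was found first.
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits


# Toy protein-like string with four adjacent "PA" units.
example_hits = find_exact_tandem_repeats("MKPAPAPAPAGT")
```

In this toy input the scan reports a single repeat that starts at index 2 and consists of four copies of the unit `PA`. The reported unit phase depends on where the scan first encounters the repeat, which is one reason why exact-match scanning is only a starting point for TR annotation.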
