Application of Graph Entropy in CRISPR and Repeats Detection in DNA Sequences

We analyzed DNA sequences using a new, Shannon-like measure of entropy, with the general aim of finding interesting sections of a genome. We developed this measure for any non-trivial graph or, more broadly, for any square matrix whose non-zero elements represent probabilistic weights assigned to connections or transitions between pairs of vertices. The measure is called the graph entropy, and it quantifies the aggregate indeterminacy effected by the variety of unique walks that exist between each pair of vertices. The new tool is shown to be uniquely capable of revealing CRISPR regions in bacterial genomes and of identifying tandem repeats and direct repeats in a genome. We ran experiments on 26 species and found many tandem repeats and direct repeats (CRISPR in bacteria or archaea). Several existing tools detect either CRISPR regions or tandem repeats separately, but our entropy measure can find both features when they are present in a genome.
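To make the idea of scanning a genome with a matrix-based entropy concrete, the following is a minimal Python sketch. It is not the graph entropy defined in this work, which aggregates over walks between vertex pairs and whose formula is not given here. As a stand-in, it assumes a simpler quantity: the mean Shannon entropy of the rows of a nucleotide transition matrix, computed over sliding windows. The function names, the window width, and the step size are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def transition_matrix(window):
    """Row-stochastic 4x4 matrix of nucleotide-to-nucleotide transition frequencies."""
    # Count adjacent pairs (a, b) in the window; symbols outside ACGT are ignored below.
    counts = Counter(zip(window, window[1:]))
    matrix = []
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET)
        matrix.append(
            [counts[(a, b)] / row_total if row_total else 0.0 for b in ALPHABET]
        )
    return matrix


def transition_entropy(window):
    """Mean Shannon entropy in bits over the four rows of the transition matrix.

    Assumption: this is a simplified stand-in, not the walk-based graph entropy.
    """
    total = 0.0
    for row in transition_matrix(window):
        total -= sum(p * math.log2(p) for p in row if p > 0)
    return total / len(ALPHABET)


def entropy_profile(sequence, width=40, step=10):
    """List of (start, entropy) pairs for sliding windows over the sequence."""
    return [
        (start, transition_entropy(sequence[start:start + width]))
        for start in range(0, len(sequence) - width + 1, step)
    ]
```

Under this stand-in, a perfectly periodic window such as a short tandem repeat has deterministic transitions and therefore scores zero, so unusually low values in the profile are candidate repeat regions. Whether the walk-based graph entropy behaves the same way on such windows is not something this sketch establishes.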
