CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems

Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein–binding templates. Classification is based on the architectural composition of associated Cas proteins, but considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities that offers both quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.
