CRISPRO Identifies Functional Protein Coding Sequences Based on Genome Editing Dense Mutagenesis

CRISPR/Cas9 pooled screening permits parallel evaluation of comprehensive guide RNA libraries that systematically perturb protein coding sequences in situ, and correlation of these perturbations with functional readouts. For the analysis and visualization of the resulting datasets, we have developed CRISPRO, a computational pipeline that maps functional scores associated with guide RNAs to genome, transcript, and protein coordinates and to protein structures. No other currently available tool offers similar functionality. The resulting linear and 3D genotype-phenotype maps generate hypotheses about structure-function relationships at discrete protein regions. Machine learning models trained on CRISPRO features improve the prediction of guide RNA efficacy. The CRISPRO tool is freely available at gitlab.com/bauerlab/crispro.
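The core idea of mapping guide-level scores onto protein coordinates can be illustrated with a minimal sketch. It assumes SpCas9 with a 20-nt protospacer and an NGG PAM (blunt cut 3 bp upstream of the PAM), a single transcript described only by its coding exons, and attribution of each cut to the codon containing the coding nucleotide immediately 5' of the cut. The names `Transcript`, `spcas9_cut_site`, `genomic_to_cds`, `cut_to_codon`, and `mean_score_per_residue` are hypothetical and are not CRISPRO's API; averaging scores per residue is likewise an illustrative choice, not necessarily how the tool aggregates them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript model.

    cds_exons: coding portions of exons as 0-based, half-open genomic
    intervals (start, end), listed in ascending genomic order.
    strand: "+" or "-", the transcript's orientation on the genome.
    """
    cds_exons: List[Tuple[int, int]]
    strand: str


def spcas9_cut_site(protospacer_start: int, guide_strand: str) -> int:
    """Plus-strand coordinate of the base just right of a SpCas9 blunt cut.

    Assumes a 20-nt protospacer whose leftmost plus-strand base is at
    protospacer_start, with an NGG PAM adjacent to its 3' end.
    """
    if guide_strand == "+":
        # 5'-[20 nt][NGG]-3': the cut lies 3 nt upstream of the PAM,
        # between protospacer positions 17 and 18.
        return protospacer_start + 17
    # Minus-strand guide: the PAM (CCN on the plus strand) lies to the left,
    # so the cut lies 3 nt to the right of the protospacer's left edge.
    return protospacer_start + 3


def genomic_to_cds(pos: int, tx: Transcript) -> Optional[int]:
    """Map a genomic base to a 0-based CDS offset in transcript orientation.

    Returns None when the base does not fall within a coding exon.
    """
    exons = tx.cds_exons if tx.strand == "+" else list(reversed(tx.cds_exons))
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            if tx.strand == "+":
                return offset + (pos - start)
            return offset + (end - 1 - pos)
        offset += end - start
    return None


def cut_to_codon(cut: int, tx: Transcript) -> Optional[int]:
    """1-based codon (residue) index for a cut, or None if not coding.

    `cut` is the plus-strand coordinate of the base just right of the cut;
    the cut is attributed to the coding base immediately 5' of it in
    transcript orientation (cuts exactly at an exon's 5' edge are dropped).
    """
    base = cut - 1 if tx.strand == "+" else cut
    cds_offset = genomic_to_cds(base, tx)
    return None if cds_offset is None else cds_offset // 3 + 1


def mean_score_per_residue(
    guides: Iterable[Tuple[int, str, float]], tx: Transcript
) -> Dict[int, float]:
    """Average guide scores per residue; guides are (start, strand, score)."""
    per_residue: Dict[int, List[float]] = defaultdict(list)
    for start, strand, score in guides:
        residue = cut_to_codon(spcas9_cut_site(start, strand), tx)
        if residue is not None:  # skip guides cutting outside the CDS
            per_residue[residue].append(score)
    return {aa: sum(s) / len(s) for aa, s in sorted(per_residue.items())}


if __name__ == "__main__":
    # Toy two-exon, plus-strand transcript with 24 coding nucleotides.
    toy = Transcript(cds_exons=[(100, 112), (200, 212)], strand="+")
    guides = [(90, "+", -2.0), (91, "+", -1.0), (200, "-", 0.5), (140, "+", 3.0)]
    print(mean_score_per_residue(guides, toy))  # {3: -1.5, 5: 0.5}
```

In the toy run, the two plus-strand guides both cut within codon 3 and are averaged, the minus-strand guide lands in codon 5 of the second exon, and the guide cutting in the intron is discarded. A real pipeline would also need to handle multiple isoforms, guides whose cut sites fall at splice junctions, and nucleases with other PAMs or cut offsets.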
