Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions

The increasing diagnostic use of gene sequencing has produced an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is to determine whether these variants disrupt splicing or are benign. A common evaluation strategy is in silico analysis, for which a number of programmes are available online; however, there are currently no consensus guidelines on which programmes to select or on protocols for interpreting the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing: Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. MES and ASSP gave the highest performance in receiver operating characteristic (ROC) curve analysis, with an optimal cut-off of a 10% reduction in score. The study also showed that the sensitivity of prediction depends on the level of conservation of individual positions, with in silico predictions for variants at positions −4 and +7 within consensus splice sites being largely uninformative.
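
As an illustration of how a score-reduction cut-off translates into sensitivity and specificity, the following minimal Python sketch may help. It is not the authors' pipeline: the function names, the example scores, the treatment of a reduction exactly at the cut-off as a positive call, and the rejection of non-positive wild-type scores are all assumptions made for this example.

```python
def percent_reduction(wild_type_score, variant_score):
    """Percentage drop in splice-site score from wild type to variant."""
    if wild_type_score <= 0:
        # Assumption: a relative reduction is only meaningful for a positive
        # wild-type score, so other inputs are rejected in this sketch.
        raise ValueError("wild-type score must be positive")
    return 100.0 * (wild_type_score - variant_score) / wild_type_score


def sensitivity_specificity(pathogenic, benign, cutoff=10.0):
    """Return (sensitivity, specificity) at a given percent-reduction cut-off.

    pathogenic, benign: lists of (wild_type_score, variant_score) pairs.
    A variant is called splice-affecting when its reduction is >= cutoff
    (assumption: the boundary value counts as a positive call).
    """
    true_pos = sum(percent_reduction(w, v) >= cutoff for w, v in pathogenic)
    true_neg = sum(percent_reduction(w, v) < cutoff for w, v in benign)
    return true_pos / len(pathogenic), true_neg / len(benign)


# Hypothetical scores, invented purely for illustration.
example_pathogenic = [(10.0, 2.0), (8.0, 7.6), (9.0, 4.5)]
example_benign = [(10.0, 9.8), (8.0, 6.0)]

sens, spec = sensitivity_specificity(example_pathogenic, example_benign)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Repeating the calculation over a range of cut-off values yields the points of a ROC curve, which is one way an optimal cut-off such as the 10% reported here could be chosen.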
