论文信息 - Biological function polarity prediction of missense variants using machine learning

Biological function polarity prediction of missense variants using machine learning

Functional interpretation is crucial when facing on average 20,000 missense variants per human exome, as the great majority are not associated with any underlying disease. In silico bioinformatics tools can predict the deleteriousness of variants or assess their functional impact by assigning scores, but they cannot predict whether the variant in question results in gain or loss of function at the protein level. Here, we show that machine learning can effectively predict this biological function polarity of missense variants. The new method adapts weighted gradient boosting machine approach on a set of damaging variants (1,288 loss of function and 218 gain of function variants) as annotated by the tools SIFT, PolyPhen2 and CADD. Area under the ROC curve of 0.85 illustrates high discriminative power of the classifier. Predictive performance of the classifier remains consistent against an independent set of damaging variants as highlighted by the area under the ROC curve of 0.83. This new approach may help to guide biological experiments on the clinical relevance of damaging genetic variants. Author summary Missense variant occurs when a single genetic alteration in DNA takes place and as a result a new amino acid is translated into the protein. This amino acid change can inactivate the existing protein function causing loss-of-function or produce a new function causing gain-of-function. Therefore, it is very important to interpret these functional consequences of missense variants as they often turn out to be disease causing. Each individual’s genome sequence has thousands of missense variants, out of which very few are actually associated with any underlying disease. Various computational tools have been developed to predict whether missense variants are damaging or not, but none of them can actually predict whether the damaging missense variants cause gain-of-function or loss-of-function. We have developed a new ensemble classifier to predict this biological function polarity at the protein level. The classifier combines the prediction scores of three existing bioinformatics tools and applies machine learning to make effective predictions. We have validated our classifier against an independent data set to show its high predictive power and robustness. The predictions made by our machine learning tool can be used as indicators of biological function polarity, but with further evidence on pathogenicity.

Adhideb Ghosh | Alexander A. Navarini | A. Navarini | Adhideb Ghosh