The prediction of virus mutation using neural networks and rough set techniques

Viral evolution remains a major obstacle to the effectiveness of antiviral treatments. The ability to predict this evolution would help in the early detection of drug-resistant strains and could facilitate the design of more efficient antiviral treatments. Various tools have been used in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, the prediction of secondary and tertiary structure evolution, and sequence error correction. This work proposes a novel machine learning technique for predicting the point mutations that may appear in alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence and shows that a nucleotide in an RNA sequence changes depending on the other nucleotides in the sequence. A neural network is used to predict new strains, and a rough set theory based algorithm is then introduced to extract the point mutation patterns. The algorithm is applied to several time series of aligned RNA isolates of the Newcastle disease virus. Two data sets from two different sources are used to validate these techniques. The results show that the technique predicts the nucleotides of the next generation with an accuracy as high as 75%. The mutation rules are visualized to analyze the correlation between different nucleotides in the same RNA sequence.
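
The prediction task described above can be framed as supervised learning over consecutive generations of an alignment. The following is a minimal sketch of that framing only, not the authors' implementation: the one-hot encoding, the function names (`one_hot`, `make_pairs`, `nucleotide_accuracy`), and the toy sequences are all assumptions introduced for illustration, and the neural network and rough set steps are omitted. It shows how each generation might be paired with its successor and how a per-nucleotide accuracy such as the reported 75% could be computed.

```python
from typing import List, Tuple

# RNA alphabet; gaps and ambiguity codes are ignored in this sketch (assumption).
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[int]:
    """Encode an aligned RNA sequence as a flat vector, 4 entries per nucleotide."""
    vec: List[int] = []
    for nt in seq:
        vec.extend(1 if nt == symbol else 0 for symbol in ALPHABET)
    return vec


def make_pairs(generations: List[str]) -> List[Tuple[List[int], str]]:
    """Pair each generation (model input) with the next generation (target)."""
    return [(one_hot(prev), nxt) for prev, nxt in zip(generations, generations[1:])]


def nucleotide_accuracy(predicted: str, actual: str) -> float:
    """Fraction of aligned positions where the predicted nucleotide is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical toy alignment, ordered by time; not data from the paper.
generations = ["ACGUAC", "ACGUAU", "ACAUAU"]
pairs = make_pairs(generations)

# Naive baseline: predict that the next generation equals the current one.
baseline = nucleotide_accuracy(generations[1], generations[2])
```

A trained model would replace the naive baseline by mapping each encoded input to a predicted successor sequence, which is then scored with the same accuracy function.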
