SARS-CoV-2 virus RNA sequence classification and geographical analysis with convolutional neural networks approach

COVID-19, which emerged in December 2019, spread worldwide, and is still active, has caused more than 250 thousand deaths to date. Research on this subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under the Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where SARS-CoV-2 virus records from all over the world are kept. Our experimental results indicate that detecting the geographic distribution of the virus with CNN models might serve as an efficient method.
