Alignment-Free Z-Curve Genomic Cepstral Coefficients and Machine Learning for Classification of Viruses

Accurate detection of pathogenic viruses has become imperative because viral diseases pose a major threat to human health and wellbeing on a global scale. However, both traditional and recent techniques for viral detection suffer from various setbacks. In addition, some existing alignment-free methods are limited in viral detection accuracy. In this paper, we present an alignment-free, digital signal processing based method for pathogenic viral detection named Z-Curve Genomic Cepstral Coefficients (ZCGCC). To evaluate the method, ZCGCC were computed from twenty-six pathogenic viral strains extracted from the ViPR corpus. A naïve Bayesian classifier, a popular machine learning method, was trained and validated using the extracted ZCGCC and, for comparison, features from other alignment-free methods in the literature. Comparative results show that the proposed ZCGCC achieves good accuracy (93.0385%) and improved performance over existing alignment-free methods.
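
As a rough illustration of the idea behind ZCGCC, the sketch below maps a nucleotide sequence to its three Z-curve components and takes the leading real-cepstrum coefficients of each component as a feature vector. It assumes the standard Z-curve definition and a plain real cepstrum (inverse FFT of the log magnitude spectrum); the abstract does not specify the exact windowing, filtering, or number of coefficients, so the function names, the `n_coeffs=12` default, and the `1e-12` stabilizer are illustrative assumptions rather than the authors' pipeline.

```python
import numpy as np


def z_curve(seq):
    """Return the three Z-curve components (x, y, z) of a DNA sequence.

    Assumes the standard cumulative definition:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
      z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    Characters other than A, C, G, T (e.g. N) contribute nothing.
    """
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    a = np.cumsum(codes == ord("A"))
    c = np.cumsum(codes == ord("C"))
    g = np.cumsum(codes == ord("G"))
    t = np.cumsum(codes == ord("T"))
    x = (a + g) - (c + t)
    y = (a + c) - (g + t)
    z = (a + t) - (c + g)
    return np.stack([x, y, z]).astype(float)


def real_cepstrum(signal, n_coeffs):
    """First n_coeffs real-cepstrum coefficients of a 1-D signal."""
    spectrum = np.fft.fft(signal)
    # Small constant (an assumption of this sketch) avoids log(0).
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.ifft(log_magnitude).real
    return cepstrum[:n_coeffs]


def zc_cepstral_features(seq, n_coeffs=12):
    """Concatenate cepstral coefficients of the x, y and z components."""
    return np.concatenate(
        [real_cepstrum(component, n_coeffs) for component in z_curve(seq)]
    )
```

Feature vectors of this kind, one per genome, could then be passed to any naïve Bayes implementation for training and validation, which is the classification step the abstract describes.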