Human polyomaviruses identification by logic mining techniques

Background: Differences in genomic sequences are crucial for classifying viruses into species. In this work, viral DNA sequences belonging to the human polyomaviruses BKPyV, JCPyV, KIPyV, WUPyV, and MCPyV are analyzed with a logic data mining method in order to identify the nucleotides that distinguish the five viruses from one another.

Results: The approach discovers several logic rules that effectively characterize the five polyomaviruses studied. The identified rules separate each viral type from the others precisely and can assign an unknown DNA sequence to one of the five analyzed polyomaviruses.

Conclusions: The data mining analysis is performed both on the complete viral sequences and on the sequences of the individual gene regions considered separately. In both cases it obtains extremely high correct recognition rates.
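To make the idea of nucleotide-based logic rules concrete, the following is a minimal sketch, not the authors' logic mining method. It assumes the sequences are already aligned to equal length, and it uses a much simpler stand-in technique: for each virus it collects the positions where all of its training sequences share one nucleotide that no other virus shows at that position, then reads each such set as a rule of the form "if position p holds nucleotide X (and ...) then the sequence is virus V". The toy sequences and the function names `find_rules` and `classify` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy aligned sequences. These are made up for illustration and are
# NOT real polyomavirus data.
training = {
    "BKPyV": ["ACTTAC", "ACTTAT"],
    "JCPyV": ["ATGTCC", "ATGTCA"],
    "MCPyV": ["GCGAAC", "GCGAAG"],
}


def find_rules(training):
    """Return {label: [(position, nucleotide), ...]}.

    A (position, nucleotide) pair is kept for a label when every training
    sequence of that label has the nucleotide at that position and no
    sequence of any other label does.
    """
    rules = defaultdict(list)
    length = len(next(iter(training.values()))[0])
    for pos in range(length):
        # Nucleotides observed at this position, per label.
        seen = {label: {s[pos] for s in seqs} for label, seqs in training.items()}
        for label, nucs in seen.items():
            if len(nucs) != 1:
                continue  # the label is not conserved at this position
            others = set().union(*(v for k, v in seen.items() if k != label))
            if not nucs & others:
                rules[label].append((pos, next(iter(nucs))))
    return dict(rules)


def classify(seq, rules):
    """Assign seq to the first label whose conditions all hold, else None."""
    for label, conditions in rules.items():
        if conditions and all(seq[p] == n for p, n in conditions):
            return label
    return None


rules = find_rules(training)
print(rules)
print(classify("ATGTCG", rules))  # a sequence not in the training set
```

On the toy data, a rule such as `[(1, "T"), (4, "C")]` reads as "position 1 is T and position 4 is C". A sequence that satisfies no rule is returned as `None` rather than being forced into a class.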
