Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines

Infection by the human papillomavirus (HPV) is associated with the development of cervical cancer. HPV types can be classified as high-risk or low-risk according to their malignant potential, and detecting the risk type is important both for understanding the underlying mechanisms and for diagnosing potential patients. In this paper, we classify HPV protein sequences using support vector machines. We introduce a string kernel to discriminate HPV protein sequences; the kernel emphasizes pairs of amino acids separated by a given distance. In the experiments, our approach is compared with previous methods in terms of accuracy and F1-score, and it shows better performance. Prediction results for unknown HPV types are also presented.
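To make the idea of a kernel over distance-separated amino acid pairs concrete, the following is a minimal sketch, not the paper's actual kernel, whose exact definition is not given in this abstract. It assumes the feature map simply counts ordered residue pairs (seq[i], seq[i + d]) at a fixed distance d and that the kernel is the inner product of those counts. The function names, the parameter d, and the cosine-style normalization are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def gap_pair_features(seq, d):
    """Count ordered residue pairs (seq[i], seq[i + d]) at distance d.

    Assumption: this counting scheme is a stand-in for the paper's kernel,
    which the abstract describes only informally.
    """
    return Counter((seq[i], seq[i + d]) for i in range(len(seq) - d))


def gap_pair_kernel(s, t, d=1):
    """Inner product of the distance-d pair-count vectors of s and t."""
    fs = gap_pair_features(s, d)
    ft = gap_pair_features(t, d)
    # Counter returns 0 for pairs absent from t, so only shared pairs contribute.
    return sum(count * ft[pair] for pair, count in fs.items())


def normalized_kernel(s, t, d=1):
    """Cosine-normalized kernel value in [0, 1] (hypothetical normalization)."""
    denom = sqrt(gap_pair_kernel(s, s, d) * gap_pair_kernel(t, t, d))
    return gap_pair_kernel(s, t, d) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy sequences, not real HPV proteins.
    a = "MHQKRTAMFQ"
    b = "MHGPKATLQD"
    print(gap_pair_kernel(a, b, d=1), normalized_kernel(a, b, d=1))
```

A Gram matrix built from such a function over all training sequences could then be handed to any SVM implementation that accepts a precomputed kernel matrix.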
