Building consensus of Human Papillomavirus using Genetic Algorithm

The paper introduces a novel three-tier architecture for finding a consensus sequence of Human Papillomavirus (HPV). The proposed procedure is simulation-based and uses all complete genomic DNA sequences of registered HPV strains available in NCBI GenBank. First, the multiple sequence alignment tool Clustal X aligns these sequences. Second, a Genetic Algorithm (GA) evolves an optimized population of complete genomic DNA sequences. The GA uses domain-specific genetic operators (migration, rank selection, mutation and crossover) and adopts a novel approach to defining the fitness function. Third, a modified version of the Weight Matrix Algorithm is applied to the evolved, optimized population to find a consensus of HPV. The effectiveness of the procedure is validated with experimental results.
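
To make the final tier concrete, the following is a minimal sketch of a plain weight-matrix consensus over already-aligned sequences. It is not the paper's modified Weight Matrix Algorithm, whose details the abstract does not give; the function names `weight_matrix` and `consensus`, the gap symbol `-`, and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def weight_matrix(aligned):
    """Return per-column symbol frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    matrix = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts.values())
        matrix.append({symbol: n / total for symbol, n in counts.items()})
    return matrix


def consensus(aligned, gap="-"):
    """Pick the most frequent symbol in each column; drop columns won by the gap.

    Ties are broken alphabetically (an assumption, chosen only for determinism).
    """
    out = []
    for column in weight_matrix(aligned):
        best = min(column, key=lambda s: (-column[s], s))
        if best != gap:
            out.append(best)
    return "".join(out)


if __name__ == "__main__":
    # Toy alignment standing in for an evolved population of aligned genomes.
    population = ["ACGT-A", "ACGTTA", "AGGT-A"]
    print(consensus(population))
```

In the pipeline described above, the input would be the GA-evolved population rather than a toy list, and the per-column weights could be adjusted (for example by sequence fitness) where this sketch uses raw frequencies.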
