Quantum Computing Approach for Alignment-Free Sequence Search and Classification

Many classes of algorithms that have high complexity on conventional computers can be reformulated to run with greatly reduced complexity on quantum computers. These reductions, together with the computational difficulty of many bioinformatics problems, motivate researchers to devise efficient quantum algorithms for sequence (DNA, RNA, protein) analysis. This chapter shows that sequence classification, an important problem in bioinformatics, can be formulated as a quantum algorithm. Building on earlier research on sequence classification with the Extensible Markov Model (EMM), it proposes a quantum computing alternative. The authors use sequence family profiles built with the EMM methodology, which relies on pre-counted word data for each sequence. A new method, termed quantum seeding, then generates a key from high-frequency words. The key is applied in a quantum search based on Grover's algorithm to determine a candidate set of models, significantly reducing the search space. With Z defined as a function of M models of size N, the quantum version of the seeding algorithm has a time complexity of O(√Z), compared with O(Z) for the standard classical version, for large values of Z.
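The chapter does not include code, so the following Python sketch only illustrates the shape of the pipeline under simplifying assumptions. Words are taken to be overlapping k-mers (k = 3), the "key" is simply the query's `top` most frequent words, and a model counts as a match ("marked") when its word set contains every key word. These definitions, and the names `word_counts`, `seed_key`, `grover_search` and `classify`, are hypothetical stand-ins, not the authors' EMM-based method. Grover's algorithm is simulated classically on a state vector: the oracle is evaluated classically, so each simulated iteration costs O(Z) work. Only the iteration count, about (π/4)·√(Z/m) for m marked models out of Z, reflects the quantum query complexity behind the O(√Z) claim.

```python
import math
from collections import Counter


def word_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping words (k-mers) of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def seed_key(seq: str, k: int = 3, top: int = 2) -> frozenset:
    """Hypothetical seeding key: the `top` most frequent words of the query."""
    return frozenset(w for w, _ in word_counts(seq, k).most_common(top))


def grover_search(z: int, is_marked) -> tuple[int, int]:
    """Classically simulate Grover search over z basis states.

    Returns (most probable index, number of Grover iterations applied).
    """
    marked = [i for i in range(z) if is_marked(i)]
    m = len(marked)
    if m == 0:
        raise ValueError("no marked items")
    amp = [1 / math.sqrt(z)] * z  # uniform superposition over all z states
    iterations = math.floor(math.pi / 4 * math.sqrt(z / m))
    for _ in range(iterations):
        for i in marked:  # oracle: flip the phase of each marked state
            amp[i] = -amp[i]
        mean = sum(amp) / z  # diffusion: inversion about the mean amplitude
        amp = [2 * mean - a for a in amp]
    # "Measurement": pick the state with the largest probability |amp|^2.
    best = max(range(z), key=lambda i: amp[i] ** 2)
    return best, iterations


def classify(query: str, models: list[str], k: int = 3, top: int = 2) -> tuple[int, int]:
    """Hypothetical end-to-end sketch: key from query, Grover over model profiles."""
    key = seed_key(query, k, top)
    profiles = [set(word_counts(s, k)) for s in models]
    # A model is marked when its word set contains every key word.
    return grover_search(len(models), lambda i: key <= profiles[i])


if __name__ == "__main__":
    models = ["TTTTGGGGCC", "ACGTACGTAC", "GGACGACGAA", "CCCCAAAATT"]
    print(classify("ACGACGACGT", models))
```

Here the query's key is {ACG, CGA}, only the third model contains both words, and with Z = 4 a single Grover iteration suffices. When several models are marked, repeated runs of a real quantum search would return different members of the candidate set; this sketch always reports the first most probable one.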
