Optimization techniques for string selection and comparison problems in genomics

This article discusses optimization problems that arise in genomics, in particular string comparison and selection problems, and surveys an important part of the existing results in this area. The problems of interest include the closest string problem (CSP), the closest substring problem (CSSP), the farthest string problem (FSP), the farthest substring problem (FSSP), and the far from most string problem (FFMSP). The paper gives a detailed view of the most important problems in string comparison and selection under the Hamming distance measure.
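To make the objectives concrete, here is a minimal sketch of the Hamming distance and of the CSP and FSP objectives in their usual formulation: CSP seeks a string minimizing the maximum Hamming distance to the input strings, and FSP seeks one maximizing the minimum distance. The function names and the exhaustive search are illustrative assumptions, not the methods surveyed in the paper; the search is exponential in the string length and only suitable for tiny instances.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings, alphabet="ACGT"):
    """CSP by exhaustive search: minimize the maximum distance to any input."""
    m = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=m))
    best = min(candidates, key=lambda c: max(hamming(c, s) for s in strings))
    return best, max(hamming(best, s) for s in strings)


def farthest_string_bruteforce(strings, alphabet="ACGT"):
    """FSP by exhaustive search: maximize the minimum distance to any input."""
    m = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=m))
    best = max(candidates, key=lambda c: min(hamming(c, s) for s in strings))
    return best, min(hamming(best, s) for s in strings)


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "ACCT"]  # hypothetical toy input
    print(closest_string_bruteforce(seqs))
    print(farthest_string_bruteforce(seqs))
```

The substring variants (CSSP, FSSP) apply the same objectives to length-m windows of longer input strings, which is why they are harder to solve by enumeration.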
