Clique-detection models in computational biochemistry and genomics

Abstract

Many important problems arising in computational biochemistry and genomics have been formulated as combinatorial optimization models, and a number of them as clique-detection models. This article introduces both the underlying biochemical and genomic aspects of these problems and the graph-theoretic aspects of the solution approaches. Each subsequent section describes a particular type of problem, gives an example showing how the graph model is derived, summarizes recent progress, and discusses the challenges of solving the resulting graph-theoretic models. Clique-detection models ask for (a) a maximal clique, (b) a maximum clique, (c) a maximum-weight clique, or (d) all maximal cliques in a graph. A maximal clique cannot be enlarged by adding another vertex, whereas a maximum clique has the largest possible number of vertices. The biochemistry and genomics problems that can be represented by a clique-detection model include integration of genome mapping data, nonoverlapping local alignments, matching and comparing molecular structures, and protein docking.
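To make the distinction among the four clique-detection models (a) to (d) concrete, the following is a minimal Python sketch on a small, hypothetical toy graph. The graph, the weights, and all function names are illustrative assumptions, not taken from the article. Model (d) is solved with the classical Bron–Kerbosch enumeration with pivoting. Models (b) and (c) are answered by picking the largest or heaviest of the enumerated cliques. Model (a) uses a simple greedy heuristic. Because maximum clique is NP-hard and a graph can have exponentially many maximal cliques, this exhaustive approach is only suitable for small graphs and is not a practical solver.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Adjacency-set representation of an undirected graph without self-loops.
Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an adjacency-set graph from an edge list (hypothetical helper)."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def greedy_maximal_clique(g: Graph) -> Set[int]:
    """(a) Greedy heuristic: repeatedly add the candidate with the most
    neighbours among the remaining candidates. It stops only when no vertex
    can be added, so the result is maximal but not necessarily maximum."""
    clique: Set[int] = set()
    candidates = set(g)
    while candidates:
        # Ties are broken toward the smallest vertex label for determinism.
        v = max(candidates, key=lambda u: (len(g[u] & candidates), -u))
        clique.add(v)
        # Keep only vertices adjacent to every clique member (v itself drops out).
        candidates &= g[v]
    return clique


def all_maximal_cliques(g: Graph) -> List[FrozenSet[int]]:
    """(d) Bron–Kerbosch with pivoting: enumerate every maximal clique."""
    cliques: List[FrozenSet[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: vertices that may extend r,
        # x: vertices already explored that would also extend r.
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Choose the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(p & g[u]))
        for v in list(p - g[pivot]):
            expand(r | {v}, p & g[v], x & g[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(g), set())
    return cliques


def maximum_clique(g: Graph) -> FrozenSet[int]:
    """(b) A maximum clique is a maximal clique of largest cardinality."""
    return max(all_maximal_cliques(g), key=len)


def maximum_weight_clique(g: Graph, w: Dict[int, float]) -> FrozenSet[int]:
    """(c) Maximum-weight clique, assuming nonnegative vertex weights, so that
    some maximum-weight clique is also maximal."""
    return max(all_maximal_cliques(g), key=lambda c: sum(w[v] for v in c))


if __name__ == "__main__":
    # Toy graph: a 4-clique {1, 2, 3, 4} sharing vertex 4 with a triangle {4, 5, 6}.
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
    g = build_graph(edges)
    weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 10, 6: 10}
    print(sorted(greedy_maximal_clique(g)))                   # [1, 2, 3, 4]
    print(sorted(sorted(c) for c in all_maximal_cliques(g)))  # [[1, 2, 3, 4], [4, 5, 6]]
    print(sorted(maximum_clique(g)))                          # [1, 2, 3, 4]
    print(sorted(maximum_weight_clique(g, weights)))          # [4, 5, 6]
```

In this toy graph, {4, 5, 6} is maximal but not maximum, and once vertices 5 and 6 are given heavy weights it becomes the maximum-weight clique. Selecting from the maximal cliques is only valid for (c) because the weights are assumed nonnegative.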
