Bioinformatics and Management Science: Some Common Tools and Techniques

In April 2003, Science (2003) and Nature (2003) published special issues marking two significant achievements in the history of science: the 50th anniversary of the discovery of the double-helical structure of DNA, and the completion of the Human Genome Project. The first discovery opened a new age in genetics, and the second marked the beginning of an era in which the genome is used in medicine. The international effort to determine the human DNA sequence and to assess its ethical, legal, and social implications started in 1990. Since then, data from the project have been available in public databases to researchers and scientists around the world. The vast increase in biological data has led to growing interest in computational biology and to an emerging multidisciplinary research area known as bioinformatics. Most people working in this area have backgrounds in mathematics, biology, biochemistry, or computer science, and have learned the field by applying tools from another discipline to questions in biology. The current challenge is to use the genome data to its full extent and to develop tools that improve our understanding of biological pathways and accelerate drug discovery. Many of the algorithms needed to solve these problems have management science and operations research aspects. This paper introduces some of the fundamental problems in bioinformatics to an operations research audience and demonstrates the application of management science tools in their formulation and solution.
