A brief history of bioinformatics

It is easy for today's students and researchers to believe that modern bioinformatics emerged recently to assist next-generation sequencing data analysis. However, the very beginnings of bioinformatics occurred more than 50 years ago, when desktop computers were still hypothetical and DNA could not yet be sequenced. The foundations of bioinformatics were laid in the early 1960s with the application of computational methods to protein sequence analysis (notably, de novo sequence assembly, biological sequence databases and substitution models). Later on, DNA analysis also emerged owing to parallel advances in (i) molecular biology methods, which allowed easier manipulation of DNA, as well as its sequencing, and (ii) computer science, which saw the rise of increasingly miniaturized and more powerful computers, as well as novel software better suited to handling bioinformatics tasks. From the 1990s through the 2000s, major improvements in sequencing technology, along with reduced costs, gave rise to an exponential increase in data. The arrival of 'Big Data' has posed new challenges in data mining and management, calling for more computer science expertise in the field. Coupled with an ever-increasing number of bioinformatics tools, biological Big Data had (and continues to have) profound implications for the predictive power and reproducibility of bioinformatics results. To address this issue, universities are now fully integrating this discipline into the curriculum of biology students. Recent subdisciplines such as synthetic biology, systems biology and whole-cell modeling have emerged from the ever-increasing complementarity between computer science and biology.
