Open source tools and toolkits for bioinformatics: significance, and where are we?

This review summarizes important work in open-source bioinformatics software from the past two to three years. It illustrates how programs and toolkits whose source code was developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than compiling a comprehensive list of every tool developed in that period, we use a few selected projects, spanning toolkit libraries, analysis tools, data analysis environments and interoperability standards, to show how freely available, modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.
