The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button

Background: There is a huge demand on bioinformaticians to provide their biologists with user-friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new *omics data. Here we present MOLGENIS, a generic, open source software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed.

Methods: The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, MOLGENIS’ generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, Java, R, or HTML code that would require much effort to write by hand. This ‘model-driven’ method ensures reuse of best practices and improves quality: because the modeling language and generators are shared between all MOLGENIS applications, errors are found quickly and improvements are propagated easily by re-generation. A plug-in mechanism ensures that both the generator suite and the generated product can be customized just as much as hand-written software.

Results: In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist’s satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. Existing databases can be quickly enhanced with MOLGENIS-generated interfaces using the ‘ExtractModel’ procedure.

Conclusions: The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.
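
The model-driven idea described in the abstract, where one compact XML model feeds several generators, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the element and attribute names (`model`, `entity`, `field`, `key`), the type mapping, and the function names `generate_sql` and `generate_tsv_header` are hypothetical and are not taken from the MOLGENIS modeling language or code base. The sketch only shows the general technique of deriving a database schema and a tab-delimited exchange header from the same model.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: element and attribute names are illustrative only
# and are not claimed to match the real MOLGENIS modeling language.
MODEL_XML = """
<model name="demo">
  <entity name="Sample">
    <field name="id" type="int" key="true"/>
    <field name="name" type="string"/>
    <field name="concentration" type="decimal"/>
  </entity>
</model>
"""

# Assumed mapping from model field types to SQL column types.
SQL_TYPES = {
    "int": "INTEGER",
    "string": "VARCHAR(255)",
    "decimal": "DECIMAL(18,6)",
}


def generate_sql(model_xml: str) -> str:
    """Generator 1: emit one CREATE TABLE statement per <entity>."""
    root = ET.fromstring(model_xml)
    statements = []
    for entity in root.findall("entity"):
        columns = []
        for field in entity.findall("field"):
            column = f"  {field.get('name')} {SQL_TYPES[field.get('type')]}"
            if field.get("key") == "true":
                column += " PRIMARY KEY"
            columns.append(column)
        body = ",\n".join(columns)
        statements.append(f"CREATE TABLE {entity.get('name')} (\n{body}\n);")
    return "\n\n".join(statements)


def generate_tsv_header(model_xml: str) -> dict:
    """Generator 2: emit the tab-delimited header line for each entity."""
    root = ET.fromstring(model_xml)
    return {
        entity.get("name"): "\t".join(
            field.get("name") for field in entity.findall("field")
        )
        for entity in root.findall("entity")
    }


if __name__ == "__main__":
    # Both outputs are derived from the same model, so editing the model
    # and re-running the generators keeps them consistent.
    print(generate_sql(MODEL_XML))
    print(generate_tsv_header(MODEL_XML))
```

Adding a field to the model and re-running both functions updates the schema and the exchange format together, which is the property the abstract attributes to re-generation. A real generator suite would use richer templates and cover user interfaces and scripting APIs as well.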
