V-Lab-Protein: Virtual Collaborative Lab for protein sequence analysis

Recent advances in genome and gene analysis technology have enabled the rapid accumulation of biological data. To make use of this volume of data, a biologist needs a resource-rich computing environment and a user-friendly way to invoke analysis tools. To meet these requirements, we designed and implemented a virtual lab, named Virtual Collaborative Lab (V-Lab-Protein), that combines efficient and flexible computing resource management and a workflow engine with a user-friendly graphical workflow composer. We demonstrate the utility of the system by analyzing sample protein sequence sets. This is the first system of its kind that combines flexible workflow systems with on-demand compute and data resources (Amazon EC2/S3 in this case). We believe that this design principle offers a new and effective paradigm for small biology research labs to handle ever-increasing volumes of biological data.
