Grid Based Virtual Bioinformatics Laboratory

Abstract

Biotechnologies such as genomics, gene sequencing and high-throughput screening are creating massive volumes and multiple sources of biological and chemical data. However, the volumes of data, and the processing power required to analyse them, are threatening to create a bottleneck that might hamper the growth of biotechnology itself. To date, the high-performance computing (HPC) resources required to store, manage and analyse such volumes of data have been at the disposal only of large companies and research institutes. With the emergence of Grid Technology, however, bioinformatics is an ideal candidate to leverage the benefits of secure, reliable and scalable high-bandwidth access to distributed data sources across various administrative domains. In effect, this will give geographically remote researchers with limited internal resources access to a wealth of biological datasets and HPC resources. This paper presents, from an industrial perspective, the business drivers that acted as the catalyst for creating the industrial e-Science project GeneGrid. The architecture and roadmap for a Grid based Virtual Bioinformatics Laboratory are also presented.

1 Introduction

Whole genome expression monitoring will have an extraordinary impact on clinical diagnosis and therapy, and will bring new power to clinical medicine. As the field progresses, we will identify new probes for cancer, infectious disease and inherited disease, and understand how genetic damage occurs and how genes alter the response to drug therapies. Equally important will be new therapeutic tools in the form of recombinant gene products, novel drug targets, rational drug design and gene therapy. Next-generation efforts will allow us to link gene expression patterns with formal characteristics of disease models, including pathological and clinical descriptions. It has been more than a year since the mapping of the human genome, considered one of the largest scientific endeavours ever undertaken.
The DNA-sequencing data from the human genome project (HGP) contains much untapped