Expanding the Knowledge Economy: Issues, Applications, Case Studies

The complete cascade from genome, proteome, metabolome and physiome to health forms multiscale, multi-science systems and crosses many orders of magnitude in temporal and spatial scales. The interactions between these systems create exquisite multitiered networks, with each component in nonlinear contact with many interaction partners. Understanding, quantifying and handling this complexity is one of the biggest scientific challenges of our time. In this paper we argue that computer science in general, and Grid computing in particular, provide the language needed to study and understand these systems, and we discuss a case study in decision support for HIV drug resistance treatment within the European ViroLab project.

1. From Molecule To Man

‘During the next decade, the practice of medicine will change dramatically, through personalized, targeted treatments that will enable a move beyond prevention to pre-emptive strategies.’ [1]

Humans are complex systems: from a biological cell made of thousands of different molecules that work together, to billions of cells that build our tissues, organs and systems, to our society of six billion unique interacting individuals. Such complex systems are not made of identical and indistinguishable components; rather, each gene in a cell, each cell in the immune system, and each individual has its own characteristic behavior and provides unique value and contributions to the systems of which it is a constituent. The complete cascade from the genome, proteome, metabolome and physiome to health constitutes multiscale, multi-science systems and crosses many orders of magnitude in temporal and spatial scales [2], as seen in Figure 1. The interactions between these systems form exquisite multitiered networks, each component being in nonlinear contact with many selected interaction partners. These networks are not just complicated; they are complex.
Understanding, quantifying and handling this complexity is one of the biggest scientific challenges of our time [3]. The central assertion of this paper is that computer science provides the language with which to study and understand these systems, and that the same laws and organizing principles that govern biomedical systems are reflected in the architecture of the computer systems that simulate them. We discuss some of these laws and organizing principles required to build systems for individualized biomedicine that can account for variations in physiology, treatment and drug response.

Copyright © 2007 The Authors

Figure 1. Multi-scale, multi-science models and techniques are needed to cover the huge spatial and temporal scales in studying complex problems such as drug response in infectious diseases.

1.1 Pushing and Pulling

We observe an application pull from biomedicine that is shifting the scientific paradigm toward in silico studies, in which more and more details of biomedical processes are simulated in addition to being studied in vivo and in vitro. These simulations are used to support medical doctors in making decisions by exploring different scenarios. Typical examples are pre-operative simulation and visualization for vascular surgery [4] and expert systems for drug ranking [5].

At the same time we observe a technology push from computing and from the availability of large amounts of data [6]. High-performance computing has moved from sequential to parallel to distributed computing, while its ‘killer applications’ moved from mathematics to physics to chemistry to biology to medicine, so that the complexity of the systems under study has grown together with the complexity of the computational systems they require. In addition, with advances in Internet technology and Grid computing [7], huge amounts of data from sensors, experiments and simulations have become available.
There are, however, significant computational, integration, collaboration and interaction gaps between the observed application pull and technology push that need to be addressed.

1.2 Bridging the Gaps

To close the computational gap in systems biology, we need to construct, integrate and manage a plethora of models; a bottom-up, data-driven approach will not work for this. To bridge the integration gap, Web and Grid services are needed to integrate often incompatible applications and tools for data acquisition, registration, storage, provenance, organization, analysis and presentation. Even if we manage to solve the computational and integration challenges, we still need a system-level approach to share processes, data, information and knowledge across geographic and organizational boundaries within distributed, multidisciplinary and multi-organizational collaborative teams, or ‘virtual organizations’ as they are often called, thus closing the collaboration and interaction gap. Finally, we need intuitive methods to streamline all these processes dynamically, depending on their availability, reliability and the specific interests of the end users (medical doctors, surgeons, clinical experts and researchers). Such methods can be captured in ‘scientific workflows’, in which the flow of data and control from one step to another is expressed in a workflow language [8,9]. A general scheme for conducting this type of e-Science research is depicted in Figure 2.

Figure 2. General architecture for conducting e-Science research: information systems integrate available data with data from specialized instruments and sensors into distributed repositories. Computational models are then executed using the integrated data, producing large quantities of model output data, which are mined and processed to extract useful knowledge.
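The idea of a scientific workflow, and the data flow of Figure 2, can be illustrated with a minimal sketch in Python (3.9 or later, for `graphlib`). It assumes that each workflow step is an ordinary function and that the workflow description reduces to a table of dependencies between named steps. The step names (`repository`, `sensor`, `integrate`, `model`, `mine`) and their toy bodies are hypothetical stand-ins for the repositories, instruments, computational models and mining stages of Figure 2, not components of ViroLab. Steps run in topological order, and each step receives the outputs of its upstream steps.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# A step is any function that takes the outputs of its upstream steps.
Step = Callable[..., Any]


def run_workflow(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every step once, in dependency order, feeding upstream outputs downstream."""
    graph = {name: deps.get(name, []) for name in steps}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        # Upstream outputs are passed positionally, in the order listed in deps.
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = steps[name](*inputs)
    return results


# HYPOTHETICAL steps: toy stand-ins for the stages sketched in Figure 2.
steps: Dict[str, Step] = {
    "repository": lambda: [3.0, 4.5, 2.5],               # previously stored data
    "sensor": lambda: [5.0],                             # data from an instrument
    "integrate": lambda stored, live: stored + live,     # merge both sources
    "model": lambda data: [x * 2 for x in data],         # placeholder simulation
    "mine": lambda output: max(output),                  # placeholder knowledge extraction
}
deps = {
    "integrate": ["repository", "sensor"],
    "model": ["integrate"],
    "mine": ["model"],
}

results = run_workflow(steps, deps)
print(results["mine"])  # 10.0
```

Ordering steps by their dependencies is only the core of workflow enactment. A Grid workflow system would additionally select concrete services according to their availability and reliability, record provenance and recover from failures; this sketch attempts none of that.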
We discuss the development of a Grid-based decision support system, called ViroLab [10], for individualized drug ranking in Human Immunodeficiency Virus (HIV-1) disease, built from modules such as the one depicted in Figure 2. We use the complex problem of HIV drug resistance as a prototype for our system-level approach for two reasons. First, HIV drug resistance is a growing problem worldwide: a considerable number of HIV-infected patients fail to achieve complete suppression of the virus despite combination therapy with antiretroviral drugs. Second, HIV drug resistance is one of the few areas in medicine where genetic information is widely available and has been used for a considerable number of years. As a consequence, large numbers of complex genetic sequences are available in addition to clinical data.

2. ViroLab: Collaborative Decision Support System in Viral Disease Treatment

‘A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it’ (First Law of Mentat), Frank Herbert, Dune.
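The individualized drug ranking that ViroLab targets is only introduced at this point, so what follows is a minimal sketch of the general idea, not the system's actual method. It assumes a purely hypothetical rule base that maps viral mutations to per-drug resistance penalties. It then sums the penalties for the mutations found in one patient's virus and ranks the candidate drugs from least to most predicted resistance. All mutation and drug names (`mutA`, `drug1` and so on) and all penalty values are placeholders, not clinical data.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL rule base: mutation -> {drug: resistance penalty}.
# Names and weights are illustrative placeholders, not clinical knowledge.
RULES: Dict[str, Dict[str, int]] = {
    "mutA": {"drug1": 3, "drug2": 1},
    "mutB": {"drug2": 2},
    "mutC": {"drug3": 3},
}

DRUGS = ["drug1", "drug2", "drug3", "drug4"]


def rank_drugs(mutations: Set[str], drugs: List[str]) -> List[Tuple[str, int]]:
    """Rank candidate drugs for one patient, least predicted resistance first."""
    scores = {drug: 0 for drug in drugs}
    for mutation in mutations:
        # Mutations without a rule contribute nothing.
        for drug, penalty in RULES.get(mutation, {}).items():
            if drug in scores:
                scores[drug] += penalty
    # Lower total penalty ranks first; ties are broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


ranking = rank_drugs({"mutA", "mutB"}, DRUGS)
print(ranking)  # [('drug3', 0), ('drug4', 0), ('drug1', 3), ('drug2', 3)]
```

A rule-based score like this is one simple way to turn genetic information into an individualized ranking. The sketch only shows the shape of the computation: a patient's viral genotype goes in, and an ordered list of treatment options comes out.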

[1] W. Daniel Hillis, et al. New computer architectures and their relationship to physics or why computer science is no good, 1982.

[2] B. L. Malone. Health care in the 21st century, 1996.

[3] Ian T. Foster, et al. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, 2001, Int. J. High Perform. Comput. Appl.

[4] Peter M. A. Sloot, et al. Cellular Automata Model of Drug Therapy for HIV Infection, 2002, ACRI.

[5] S. Deeks, et al. Treatment of antiretroviral-drug-resistant HIV-1 infection, 2003, The Lancet.

[6] Marian Bubak, et al. Architecture of the Grid for Interactive Applications, 2003, International Conference on Computational Science.

[7] Anne E. Trefethen, et al. The Data Deluge: An e-Science Perspective, 2003.

[8] Bertram Ludäscher, et al. A Framework for the Design and Reuse of Grid Workflows, 2004, SAG.

[9] James Hetherington, et al. Computational challenges of systems biology, 2004, Computer.

[10] Frank Leymann, et al. Modeling Stateful Resources with Web Services, 2004.

[11] Marian Bubak, et al. An integrative approach to high-performance biomedical problem solving environments on the Grid, 2004, Parallel Comput.

[12] Marian Bubak, et al. ViroLab: A Virtual Laboratory for Decision Support in Viral Diseases Treatment, 2005.

[13] Tulio de Oliveira, et al. An automated genotyping system for analysis of HIV-1 and other microbial sequences, 2005, Bioinformatics.

[14] Thomas L. Casavant, et al. Gene transcript clustering: a comparison of parallel approaches, 2005, Future Gener. Comput. Syst.

[15] Katarzyna Rycerz, et al. Workflow composer and service registry for grid applications, 2005, Future Gener. Comput. Syst.

[16] A. Barabasi, et al. Taming complexity, 2005.

[17] Peter M. A. Sloot, et al. A Grid-Based HIV Expert System, 2005, CCGrid 2005, IEEE International Symposium on Cluster Computing and the Grid.

[18] Marian Bubak, et al. From molecule to man: Decision support in individualized E-health, 2006, Computer.

[19] Peter V. Coveney, et al. Grid Assisted Ensemble Molecular Dynamics Simulations of HIV-1 Proteases Reveal Novel Conformations of the Inhibitor Saquinavir, 2006, CompLife.

[20] Edward A. Lee, et al. Scientific workflow management and the Kepler system, 2006, Concurr. Comput. Pract. Exp.