A composable data management architecture for scientific applications

With ever-increasing computational power, data management has become crucial for scientific applications. Most ongoing research efforts aim to generalize universal requirements and schemas for building generic data management systems. Based on a close study of a broad range of scientific applications, including X-ray crystallography, radiation therapy, automated photometry, and comparative genomics, this paper instead presents a composable data management architecture that provides a set of orthogonal components from which each scientific application can quickly and easily construct its own customized data management system. The prototype architecture is described in detail, and the component interfaces are defined in SIDL (Scientific Interface Definition Language). Results from building customized data management systems for a variety of scientific applications are also discussed.
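The idea of assembling a customized system from orthogonal components can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the actual components or their SIDL interfaces, so the names `StorageComponent`, `MetadataComponent`, `InMemoryStorage`, `InMemoryMetadata`, and `DataManager` are hypothetical, and Python protocols stand in for SIDL interface definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class StorageComponent(Protocol):
    """Hypothetical interface: stores and retrieves raw data by key."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class MetadataComponent(Protocol):
    """Hypothetical interface: attaches attributes to keys and searches them."""

    def annotate(self, key: str, attrs: Dict[str, str]) -> None: ...
    def find(self, name: str, value: str) -> List[str]: ...


class InMemoryStorage:
    """Toy storage component backed by a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class InMemoryMetadata:
    """Toy metadata component backed by a dictionary of attribute maps."""

    def __init__(self) -> None:
        self._attrs: Dict[str, Dict[str, str]] = {}

    def annotate(self, key: str, attrs: Dict[str, str]) -> None:
        self._attrs.setdefault(key, {}).update(attrs)

    def find(self, name: str, value: str) -> List[str]:
        # Return matching keys in sorted order so results are deterministic.
        return sorted(k for k, a in self._attrs.items() if a.get(name) == value)


@dataclass
class DataManager:
    """A customized system composed from independently chosen components."""

    storage: StorageComponent
    metadata: MetadataComponent

    def ingest(self, key: str, data: bytes, attrs: Dict[str, str]) -> None:
        self.storage.put(key, data)
        self.metadata.annotate(key, attrs)

    def query(self, name: str, value: str) -> List[bytes]:
        return [self.storage.get(k) for k in self.metadata.find(name, value)]


# Usage: an application picks one implementation per interface.
manager = DataManager(storage=InMemoryStorage(), metadata=InMemoryMetadata())
manager.ingest("frame-001", b"abc", {"instrument": "ccd"})
manager.ingest("frame-002", b"def", {"instrument": "xray"})
```

Because `DataManager` depends only on the two interfaces, either component could be swapped (for example, for a file-backed store) without touching the other, which is the sense of "orthogonal" assumed in this sketch.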
