Scientific Data Management at the Johns Hopkins Institute for Data Intensive Engineering and Science

Scientific computing has long been one of the deep and challenging applications of computer science and data management, from early endeavors in numerical simulation to recent undertakings in the life sciences, such as genome assembly. Complex computational problems abound, and their solutions transform our understanding of the physical world. The data management community’s interest in scientific applications has grown over the last decade due to the commoditization of parallelism, diminishing system administration costs, and a search for relevance beyond enterprise applications. Research in scientific computing also raises non-technical challenges, such as overcoming the paucity of resources needed for experimentation and establishing a collaborative research agenda that fosters a mutual appreciation of problems, results in a concerted effort to develop software tools, and makes all researchers successful in their respective fields. In light of this, we report on a recently formed institute at the Johns Hopkins University that aims to further the interaction between computer science and the science and engineering disciplines. We describe ongoing projects at the institute and our collaboration experiences.
