Compiling Data Intensive Applications with Spatial Coordinates

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data intensive applications written in a dialect of Java and compiles them for efficient execution on clusters of workstations or distributed memory machines. In this paper, we focus on data intensive applications with two important properties: 1) data elements have spatial coordinates associated with them, and the distribution of the data is not regular with respect to these coordinates, and 2) the application processes only a subset of the available data, selected on the basis of spatial coordinates. Such applications arise in many domains, including satellite data processing and medical imaging. We present a general compilation and execution strategy for this class of applications that achieves high locality in disk accesses. We then present a technique for hoisting conditionals that further improves the efficiency of the compiled code. Our preliminary experimental results show that our proposed execution strategy performs nearly two orders of magnitude better than a naive strategy. Further, applying the technique for hoisting conditionals improves performance by up to 30%.
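As an illustration only, the Java sketch below shows one way the two ideas in the abstract can fit together. It assumes that data elements are grouped into disk chunks, each tagged with the bounding box of its elements' coordinates. The query then visits data chunk by chunk, skips chunks whose bounding box misses the query region, and hoists the per-element range conditional out of the inner loop for chunks that lie entirely inside the region. The classes `Point`, `Rect`, `Chunk`, and `ChunkedRangeQuery` and the method `sumInRange` are hypothetical; they are not the interface of the compiler or runtime described in this paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: chunk layout, class names, and method names are hypothetical
// illustrations, not the compiler's actual generated code or runtime API.
class ChunkedRangeQuery {
    // Number of per-element coordinate tests executed by the last query.
    static long elementTests = 0;

    // Sums the values of all points inside `query`, visiting data chunk by chunk.
    static double sumInRange(List<Chunk> chunks, Rect query) {
        elementTests = 0;
        double sum = 0.0;
        for (Chunk c : chunks) {
            if (!query.intersects(c.bbox)) {
                // No element can satisfy the range test; with a disk-resident
                // layout this chunk would not need to be read at all.
                continue;
            }
            if (query.containsRect(c.bbox)) {
                // Hoisted conditional: every point in this chunk satisfies the
                // range test, so the per-element check is dropped.
                for (Point p : c.points) sum += p.value;
            } else {
                // Boundary chunk: the range test must still be applied per element.
                for (Point p : c.points) {
                    elementTests++;
                    if (query.contains(p)) sum += p.value;
                }
            }
        }
        return sum;
    }

    // Small hypothetical dataset with irregularly placed points.
    static List<Chunk> sampleChunks() {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(Arrays.asList(new Point(1, 1, 1), new Point(2, 2, 2))));
        chunks.add(new Chunk(Arrays.asList(new Point(5, 5, 10), new Point(9, 9, 20))));
        chunks.add(new Chunk(Arrays.asList(new Point(20, 20, 100))));
        return chunks;
    }

    public static void main(String[] args) {
        double sum = sumInRange(sampleChunks(), new Rect(0, 0, 6, 6));
        System.out.println("sum=" + sum + " elementTests=" + elementTests);
    }
}

// A data element with spatial coordinates and an attached value.
class Point {
    final double x, y, value;
    Point(double x, double y, double value) { this.x = x; this.y = y; this.value = value; }
}

// Axis-aligned rectangle, used for both query regions and chunk bounding boxes.
class Rect {
    final double xlo, ylo, xhi, yhi;
    Rect(double xlo, double ylo, double xhi, double yhi) {
        this.xlo = xlo; this.ylo = ylo; this.xhi = xhi; this.yhi = yhi;
    }
    boolean contains(Point p) {
        return p.x >= xlo && p.x <= xhi && p.y >= ylo && p.y <= yhi;
    }
    boolean containsRect(Rect r) {
        return r.xlo >= xlo && r.xhi <= xhi && r.ylo >= ylo && r.yhi <= yhi;
    }
    boolean intersects(Rect r) {
        return r.xlo <= xhi && r.xhi >= xlo && r.ylo <= yhi && r.yhi >= ylo;
    }
}

// A chunk models the unit of disk I/O: a group of points plus their bounding box.
class Chunk {
    final List<Point> points;
    final Rect bbox;
    Chunk(List<Point> points) {
        this.points = points;
        double xlo = Double.POSITIVE_INFINITY, ylo = Double.POSITIVE_INFINITY;
        double xhi = Double.NEGATIVE_INFINITY, yhi = Double.NEGATIVE_INFINITY;
        for (Point p : points) {
            xlo = Math.min(xlo, p.x); ylo = Math.min(ylo, p.y);
            xhi = Math.max(xhi, p.x); yhi = Math.max(yhi, p.y);
        }
        this.bbox = new Rect(xlo, ylo, xhi, yhi);
    }
}
```

Running `main` prints `sum=13.0 elementTests=2`. The first chunk lies wholly inside the query and is summed without any per-element tests. The second straddles the query boundary and is tested element by element. The third is skipped entirely. A naive loop would have tested all five points.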
