Distributed Computing with Data: A CORBA-Based Approach

Statistical computing is part of a more general process, which can be called computing with data. Besides traditional statistical analysis, this process involves acquiring, organizing, and visualizing data, often in large, structured datasets that are held in database management systems and used for purposes beyond analysis. An important challenge for statistical computing (and statistics in general) is to increase the scope of our involvement in this diverse environment. At the same time, the computing environment itself is becoming more diverse in all respects: data and users are widely dispersed and rely on many different systems. We describe research looking towards the next generation of software for such applications, centered on the idea of distributed computing with data. We mean distributed in two fundamentally different, but related, senses. First, the data and the tasks users apply to the data are distributed geographically, over a heterogeneous network of computers and operating systems. Second, the programming environment we envision is distributed over a variety of languages and other software. As a key to this environment, we propose to take advantage of the CORBA standard for distributed, object-oriented computation. This paper describes the background for our approach, the reasoning behind the CORBA proposal, and some initial experiments with the new approach.