Open DMIX: High Performance Web Services for Distributed Data Mining

In this note, we introduce Open DMIX, an open-source collection of web services for the mining, integration, and exploration of remote and distributed data, and we describe some preliminary experimental results obtained with it. Open DMIX is layered. The top layer provides templated data mining and statistical algorithms, such as those defined by the Predictive Model Markup Language (PMML) [8]. The middle layer provides access to, and integration of, remote and distributed data using the DataSpace Transfer Protocol (DSTP) [6]. The bottom layer provides specialized network protocols designed to work with large distributed data sets over wide area networks, which may have high bandwidth-delay products (BDPs). Open DMIX clients interact with Open DMIX servers using a version of web services designed for high-performance applications, which we call SOAP+.

Robert Grossman is also with Open Data Partners.
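
To make the bandwidth-delay product mentioned above concrete, the following minimal sketch computes the BDP of a link and the throughput ceiling that a fixed-size transfer window imposes on it. The link figures (1 Gb/s, 100 ms round-trip time) and the function names are illustrative assumptions and do not come from the Open DMIX experiments. The 65,535-byte window is the largest a standard TCP connection can advertise without window scaling.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: the amount of data that must be
    in flight to keep a link of the given bandwidth and round-trip time full."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes may be
    unacknowledged at any time: one window per round trip."""
    return window_bytes * 8 / rtt_s


# Hypothetical wide area link (assumed values, not from the paper).
BANDWIDTH = 1e9   # 1 Gb/s
RTT = 0.100       # 100 ms round-trip time

bdp = bdp_bytes(BANDWIDTH, RTT)
limit = window_limited_throughput_bits_per_s(65_535, RTT)

print(f"BDP: {bdp / 1e6:.1f} MB must be in flight to fill the link")
print(f"65,535-byte window ceiling: {limit / 1e6:.2f} Mb/s")
```

Under these assumed figures the link needs about 12.5 MB in flight, while a 65,535-byte window caps throughput at roughly 5 Mb/s, well under one percent of the link capacity. This gap is the kind of problem that protocols designed for high-BDP networks are meant to address.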