Data mining middleware for wide-area high-performance networks

In this paper, we describe two distributed, data-intensive applications that were demonstrated at iGrid 2005 (iGrid Demonstration US109 and iGrid Demonstration US121). The first transports astronomical data from the Sloan Digital Sky Survey (SDSS); the second computes histograms from multiple high-volume data streams. Both rely on newly developed data transport and data mining middleware. Specifically, we describe a new version of the UDT network protocol called Composible-UDT, a file transfer utility built on UDT called UDT-Gateway, and an application for building histograms on high-volume data flows called BESH (Best Effort Streaming Histogram). For both demonstrations, we summarize the experimental studies performed at iGrid 2005.
