Shared State for Distributed Interactive Data Mining Applications

Advances in processor speed and network bandwidth have made distributed data mining applications involving user interaction feasible. Such applications are traditionally implemented using ad hoc communication protocols, which are often either cumbersome or inefficient. This paper presents and evaluates InterAct, a system for sharing state among interactive distributed data mining applications, developed with the goal of providing both ease of programming and efficiency. InterAct supports efficient data sharing by allowing caching, by communicating only the modified data, and by letting applications specify relaxed coherence requirements to reduce communication overhead, as well as the placement of data for improved locality, on a per-client and per-data-structure basis. In addition, the system can supply clients with consistent copies of shared data even while the data is being modified. We evaluate the performance of the system on a set of data mining applications that perform queries on data structures summarizing information from the databases of interest. We demonstrate that a runtime system such as InterAct yields a 10- to 30-fold improvement in execution time, owing to shared data caching, the applications' ability to tolerate stale data (client-controlled coherence), and the ability to off-load some of the computation from the server to the client. Performance improves without complex communication protocols having to be built into the application, because the runtime system uses knowledge about application behavior (encoded as coherence requirements) to automatically optimize the resources used for communication. We also demonstrate that, for our benchmark tests, the quality of the results does not deteriorate significantly when more relaxed coherence protocols are used.
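The combination of caching, modified-data-only updates, and client-controlled coherence can be illustrated with a minimal sketch. It is not InterAct's interface: the names `Server`, `CachingClient`, `diff_since`, and `max_stale` are hypothetical, and staleness is assumed here to be measured as the number of server updates a client has missed.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Holds the authoritative copy of a shared summary structure (hypothetical)."""
    data: dict = field(default_factory=dict)
    version: int = 0
    log: list = field(default_factory=list)  # one (version, key, value) per write

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        self.log.append((self.version, key, value))

    def diff_since(self, version):
        """Return only the entries modified after `version`."""
        changed = {}
        for v, key, value in self.log:
            if v > version:
                changed[key] = value  # later writes to a key overwrite earlier ones
        return changed


@dataclass
class CachingClient:
    """Caches shared data and refreshes only when its copy is too stale."""
    server: Server
    max_stale: int = 0  # assumed coherence knob: tolerated missed updates
    cache: dict = field(default_factory=dict)
    version: int = 0
    refreshes: int = 0  # number of times a diff was pulled from the server

    def read(self, key):
        # Simplification: a real system would learn the server version via a
        # message or notification; here it is read directly.
        if self.server.version - self.version > self.max_stale:
            self.cache.update(self.server.diff_since(self.version))
            self.version = self.server.version
            self.refreshes += 1
        return self.cache.get(key)


server = Server()
strict = CachingClient(server, max_stale=0)   # always sees the latest data
relaxed = CachingClient(server, max_stale=3)  # tolerates up to 3 missed updates
for i in range(10):
    server.write("count", i)
    strict.read("count")
    relaxed.read("count")
print(strict.refreshes, relaxed.refreshes)
```

In this sketch the relaxed client contacts the server far less often than the strict one, at the cost of occasionally reading an older value; this is the trade-off the abstract describes between communication overhead and result quality.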
