Achieving Multi-Level Parallelism in the Filter-Labeled Stream Programming Model

New architectural trends in chip design have resulted in machines with multiple processing units and efficient communication networks, leading to the wide availability of systems that provide multiple levels of parallelism, both inter- and intra-machine. Developing applications that make efficient use of such systems is a challenge, especially for application-domain programmers. In this paper we present a new version of the Anthill programming environment that efficiently exploits multi-level parallelism, together with experimental results that demonstrate this efficiency. Anthill is based on the filter-stream model, in which applications are decomposed into a set of filters communicating through streams; this model has already been shown to be effective for expressing inter-machine parallelism. We replaced the filter run-time environment, originally process-oriented, with an event-oriented version. This new version allows programmers to efficiently express opportunities for parallelism within each compute node through a higher-level programming abstraction. We evaluated our solution on dual- and quad-core machines with two data mining applications: Eclat and KNN. On a single machine, both showed reductions in execution time nearly proportional to the number of cores. On a cluster of dual-core machines, speed-ups were close to linear in the number of available cores for both applications, confirming that the event-oriented Anthill performs well at both the inter- and intra-machine parallelism levels.
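To make the event-oriented filter run-time described above more concrete, the C++ sketch below shows how a single filter might dispatch incoming stream messages as independent events to a pool of worker threads on one node. This is only an illustration of the pattern, not the Anthill API: the class name EventFilter, the methods deliver and close, the thread-pool size, and the integer message type are all hypothetical names chosen to make the intra-node parallelism level explicit.

// Minimal sketch (hypothetical names, not the real Anthill API): one filter
// whose incoming stream messages are handled as independent events by a
// pool of worker threads, illustrating intra-node parallelism inside an
// event-oriented filter run-time.
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class EventFilter {
public:
    using Handler = std::function<void(int /*message*/)>;

    EventFilter(Handler handler, unsigned workers)
        : handler_(std::move(handler)) {
        for (unsigned i = 0; i < workers; ++i)
            pool_.emplace_back([this] { run(); });
    }

    // Called by the stream layer when a message arrives for this filter.
    void deliver(int msg) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            events_.push(msg);
        }
        cv_.notify_one();
    }

    // Signal end-of-stream and wait until all pending events are processed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : pool_) t.join();
    }

private:
    void run() {
        for (;;) {
            int msg;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return done_ || !events_.empty(); });
                if (events_.empty()) return;  // stream closed and queue drained
                msg = events_.front();
                events_.pop();
            }
            handler_(msg);  // user-supplied per-event processing, run outside the lock
        }
    }

    Handler handler_;
    std::queue<int> events_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool done_ = false;
    std::vector<std::thread> pool_;  // declared last so it starts after the other members
};

int main() {
    // One filter of a pipeline; its handler runs concurrently on 4 cores.
    EventFilter filter([](int msg) {
        std::cout << "processed message " << msg * msg << "\n";
    }, 4);
    for (int i = 0; i < 16; ++i) filter.deliver(i);  // simulated stream input
    filter.close();
    return 0;
}

In this reading of the model, the inter-machine level of parallelism would come from running copies of each filter on different nodes and routing labeled stream messages between them, while the worker pool above captures the intra-machine level that the event-oriented run-time exposes.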
