Multi-dimensional characterization of temporal data mining on graphics processors

Through the algorithmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovation across a multitude of disciplines. For example, the exponential growth in data volume now presents an obstacle for high-throughput data mining in fields such as neuroscience and bioinformatics. To address this, we present a characterization of a MapReduce-based data-mining application on a general-purpose GPU (GPGPU). Using neuroscience as the application vehicle, the results of our multi-dimensional performance evaluation show that a “one-size-fits-all” approach maps poorly across different GPGPU cards. Rather, a high-performance implementation on the GPGPU should factor in 1) the problem size, 2) the type of GPU, 3) the type of algorithm, and 4) the data-access method when determining the type and level of parallelism. To guide the GPGPU programmer towards optimal performance within such a broad design space, we provide eight general performance characterizations of our data-mining application.
