Understanding the Memory Performance of Data-Mining Workloads on Small, Medium, and Large-Scale CMPs Using Hardware-Software Co-simulation

With the amount of data continuing to grow, extracting "data of interest" is becoming popular, pervasive, and more important than ever. Data mining, as this process is known, seeks to draw meaningful conclusions, extract knowledge, and acquire models from vast amounts of data. These compute-intensive data-mining applications, in which thread-level parallelism can be effectively exploited, are the design targets of future multi-core systems. As a result, future multi-core systems will be required to process terabyte-level workloads. To understand the memory-system performance of data-mining applications, this paper uses hardware-software co-simulation to explore the cache design space of several multi-threaded data-mining applications. Our study reveals that the workloads are memory intensive, have large working-set sizes, and exhibit good data locality. We find that large DRAM caches can be useful in addressing their large working-set sizes.
