High Performance Computing and I/O Architectures for Database and Knowledge Discovery: The System Design Perspective

Research in parallel database (DB) and data mining (DM) algorithms has grown significantly due to advancements in high performance computing (HPC) systems. Enabling technologies such as multi-core processors, object-based storage and high-bandwidth interconnects have helped propel innovations that address rapidly increasing demands in scientific and commercial computing. Large-scale applications involving data on the order of terabytes or beyond are not uncommon, and they require characterization of HPC system designs to address potential performance bottlenecks. This paper will attempt to design and characterize HPC architectures with novel micro-electromechanical systems (MEMS) based storage, on which parallel DB and DM algorithms are used for inference and knowledge discovery. A visualization system exploiting the parallel Message Passing Interface (MPI) and open-source libraries will be developed. Findings and multiple components from the proposed research may also extend to other scientific areas.
