Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage
Dhabaleswar K. Panda | Md. Wasi-ur-Rahman | Nusrat S. Islam | Xiaoyi Lu
[1] David E. Culler, et al. SEDA: an architecture for well-conditioned, scalable internet services, 2001, SOSP.
[2] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[3] Alan L. Cox, et al. The Hadoop distributed filesystem: Balancing portability and performance, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[4] Konstantin V. Shvachko, et al. HDFS Scalability: The Limits to Growth, 2010, login Usenix Mag.
[5] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[6] Rashid Tahir, et al. A Dynamic Caching Mechanism for Hadoop using Memcached, 2012.
[7] Michael J. Franklin, et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, 2012, NSDI.
[8] Srikanth Kandula, et al. PACMan: Coordinated Memory Caching for Parallel Jobs, 2012, NSDI.
[9] Dhabaleswar K. Panda, et al. High performance RDMA-based design of HDFS over InfiniBand, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Carlo Curino, et al. Apache Hadoop YARN: yet another resource negotiator, 2013, SoCC.
[11] Dhabaleswar K. Panda, et al. High-Performance Design of Hadoop RPC with RDMA over InfiniBand, 2013, 2013 42nd International Conference on Parallel Processing.
[12] Dhabaleswar K. Panda, et al. SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS, 2014, HPDC '14.
[13] Dhabaleswar K. Panda, et al. HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects, 2014, ICS '14.
[14] Dhabaleswar K. Panda, et al. In-memory I/O and replication for HDFS with Memcached: Early experiences, 2014, 2014 IEEE International Conference on Big Data (Big Data).
[15] Scott Shenker, et al. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks, 2014, SoCC.
[16] Dhabaleswar K. Panda, et al. Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store, 2015, 2015 44th International Conference on Parallel Processing.
[17] Dhabaleswar K. Panda, et al. High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[18] Dhabaleswar K. Panda, et al. Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters, 2015, 2015 IEEE International Conference on Big Data (Big Data).
[19] Dhabaleswar K. Panda, et al. Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[20] Dhabaleswar K. Panda, et al. A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.