Hadoop deployment and performance on Gordon data intensive supercomputer
The Hadoop framework is widely used for scalable distributed processing of large datasets. This extended abstract describes the deployment and optimization of Hadoop on the Gordon data-intensive supercomputer at the San Diego Supercomputer Center (SDSC), University of California San Diego, using the myHadoop software. It presents the system configuration, the storage and network options considered (1 GigE, IPoIB, and UDA), the tuning options explored, results from the TestDFSIO and TeraSort benchmarks, and bulk-copy tests with distcp.
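For context, the benchmarks and tools named above are standard parts of the Hadoop distribution. The following is a minimal sketch of how such runs are typically invoked on a Hadoop 1.x cluster; the jar names, file counts, data sizes, and HDFS paths are illustrative assumptions, not the specific parameters used in the study.

```shell
# Hypothetical benchmark invocations; exact jar names and paths vary by site.

# TestDFSIO: measure HDFS write and read throughput
# (file count and size here are examples, not the study's settings)
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 16 -fileSize 1000
hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read  -nrFiles 16 -fileSize 1000

# TeraSort: generate 100-byte rows, sort them, then validate the output
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar teragen 1000000000 /bench/tera-in
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar terasort /bench/tera-in /bench/tera-out
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar teravalidate /bench/tera-out /bench/tera-report

# distcp: bulk parallel copy between filesystems
hadoop distcp hdfs://namenode:8020/src hdfs://namenode:8020/dst
```

These commands require a running Hadoop cluster, so they are shown here as a configuration-level sketch rather than a runnable script.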