Paper information - CodHoop: A system for optimizing big data processing


The rise of the cloud and of distributed data-intensive (“Big Data”) applications puts pressure on data center networks because massive volumes of data must be moved. This paper proposes CodHoop, a system that employs network coding techniques, specifically index coding, to achieve a dynamically controlled reduction in communication volume. A motivating use case is presented, using Hadoop as a representative of this class of applications. In the proof-of-concept implementation, CodHoop shows an average advantage of 31% over a vanilla Hadoop implementation, which, depending on the use case, translates to 31% less energy use by the equipment, 31% more jobs running simultaneously, or a 31% decrease in job completion time.
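As a rough illustration of why index coding can cut communication volume, the sketch below shows the textbook two-receiver case: each receiver already holds the packet the other one wants, so a single XOR-coded broadcast can replace two separate transmissions. This is not CodHoop's implementation; the packet contents, variable names, and the two-receiver setup are assumptions chosen only for illustration, and the saving in this toy case says nothing about the 31% figure reported above.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical packets (illustrative only, not taken from the paper).
# Receiver 1 wants p1 and already holds p2 as side information.
# Receiver 2 wants p2 and already holds p1 as side information.
p1 = b"block-A1"
p2 = b"block-B7"

# The sender broadcasts one coded packet instead of sending p1 and p2 separately.
coded = xor_bytes(p1, p2)

# Each receiver cancels out the packet it already holds to recover the one it wants.
decoded_at_r1 = xor_bytes(coded, p2)
decoded_at_r2 = xor_bytes(coded, p1)

uncoded_transmissions = 2  # p1 and p2 sent one after the other
coded_transmissions = 1    # a single broadcast of p1 XOR p2
```

The coded packet is the same length as each original packet, so in this toy setting the broadcast carries half the bytes of the uncoded schedule. Real workloads would only benefit to the extent that receivers hold useful side information.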

David Malone | Zakia Asad | Mohammad Asad R. Chaudhry
