Evaluating the Combined Impact of Node Architecture and Cloud Workload Characteristics on Network Traffic and Performance/Cost

The combined impact of node architecture and workload characteristics on off-chip network traffic, together with a performance/cost analysis, has not previously been investigated in the context of emerging cloud applications. Motivated by this observation, this paper performs a thorough characterization of twelve cloud workloads using a full-system datacenter simulation infrastructure. We first study the inherent network characteristics of emerging cloud applications, including message inter-arrival times, packet sizes, inter-node communication overhead, self-similarity, and traffic volume. Then, we study the effect of hardware architectural metrics on network traffic. Our experimental analysis reveals that (1) the message arrival times and packet-size distributions vary across cloud applications, (2) the inter-arrival times exhibit a high degree of self-similarity as the number of nodes increases, (3) the node architecture can play a significant role in shaping the overall network traffic, and (4) the applications we study can be broadly classified in two ways: by whether they perform better in a scale-out or a scale-up configuration at the node level, and by whether they have long-duration, low-burst flows or short-duration, high-burst flows. Using the results of (3) and (4), the paper discusses the performance/cost trade-offs of the scale-out and scale-up approaches and proposes an analytical model that can be used to predict the communication and computation demand for different configurations. The difference in performance per dollar between two node architectures (with the same number of cores system-wide) can be as high as 154 percent, which shows the need to characterize cloud applications accurately so that cloud resources are not wasted by allocating the wrong architecture. The results of this study can be used for system modeling, capacity planning, and managing heterogeneous resources in large-scale system designs.
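
The abstract reports self-similarity in message inter-arrival times but does not say which estimator was used. As an illustration only, the sketch below estimates the Hurst exponent H with the aggregated-variance method, one common choice: the variance of block means of size m scales roughly as m^(2H-2), so H can be read off the slope of a log-log fit. The function name, the block sizes, and the synthetic exponential inter-arrival times are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst exponent H via the aggregated-variance method.

    For each block size m, the series is split into non-overlapping blocks,
    and the sample variance of the block means is computed. A least-squares
    fit of log(variance) against log(m) gives a slope of about 2H - 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to estimate a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((x - mu) ** 2 for x in means) / (n_blocks - 1)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")

    # Ordinary least-squares slope of log_var versus log_m.
    k = len(log_m)
    mean_x = sum(log_m) / k
    mean_y = sum(log_var) / k
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
             / sum((x - mean_x) ** 2 for x in log_m))
    return 1.0 + slope / 2.0


# Hypothetical input: independent exponential inter-arrival times (Poisson
# arrivals), which have no long-range dependence, so H should be near 0.5.
rng = random.Random(0)
gaps = [rng.expovariate(1.0) for _ in range(4096)]
h = aggregated_variance_hurst(gaps, [1, 2, 4, 8, 16, 32, 64])
print(f"estimated H = {h:.2f}")
```

Values of H close to 0.5 indicate no long-range dependence, while values approaching 1 indicate strongly self-similar traffic. Measured inter-arrival traces would replace the synthetic `gaps` list.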
