Workload and network-optimized computing systems

This paper describes a recent system-level trend toward combining massive on-chip parallelism with efficient hardware accelerators and integrated networking to enable new classes of applications and computing-system functionality. This transition is driven by semiconductor physics and by emerging network-application requirements. Compared with general-purpose designs that rely on historical frequency scaling in a serial computational model, workload- and network-optimized computing provides significant cost, performance, and power advantages. We highlight the advantages of on-chip network optimization, which enables efficient computation and new services at the network edge of the data center. We present the software and application development challenges and show a service-oriented architecture application example that characterizes the power and performance advantages of these systems. We also discuss a roadmap for next-generation systems that scale in proportion to future growth in networking bandwidth and employ 3-D chip integration methods for design flexibility and modularity.
