Mobile processors for energy-efficient web search

As cloud and utility computing spreads, computer architects must ensure continued capability growth for the data centers that comprise the cloud. Given megawatt scale power budgets, increasing data center capability requires increasing computing hardware energy efficiency. To increase the data center's capability for work, the work done per Joule must increase. We pursue this efficiency even as the nature of data center applications evolves. Unlike traditional enterprise workloads, which are typically memory or I/O bound, big data computation and analytics exhibit greater compute intensity. This article examines the efficiency of mobile processors as a means for data center capability. In particular, we compare and contrast the performance and efficiency of the Microsoft Bing search engine executing on the mobile-class Atom processor and the server-class Xeon processor. Bing implements statistical machine learning to dynamically rank pages, producing sophisticated search results but also increasing computational intensity. While mobile processors are energy-efficient, they exact a price for that efficiency. The Atom is 5× more energy-efficient than the Xeon when comparing queries per Joule. However, search queries on Atom encounter higher latencies, different page results, and diminished robustness for complex queries. Despite these challenges, quality-of-service is maintained for most, common queries. Moreover, as different computational phases of the search engine encounter different bottlenecks, we describe implications for future architectural enhancements, application tuning, and system architectures. After optimizing the Atom server platform, a large share of power and cost go toward processor capability. With optimized Atoms, more servers can fit in a given data center power budget. For a data center with 15MW critical load, Atom-based servers increase capability by 3.2× for Bing.

[1]  Sriram Sankar,et al.  Server Engineering Insights for Large-Scale Online Services , 2010, IEEE Micro.

[2]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2011, Commun. ACM.

[3]  David M. Brooks,et al.  Illustrative Design Space Studies with Microarchitectural Regression Models , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[4]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[5]  Vanish Talwar,et al.  Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems , 2008, IEEE Micro.

[6]  J. P. Grossman,et al.  Incorporating flexibility in Anton, a specialized machine for molecular dynamics simulation , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[7]  Norman P. Jouppi,et al.  Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.

[8]  Trevor N. Mudge,et al.  Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.

[9]  David Black-Schaffer,et al.  Efficient Embedded Computing , 2008, Computer.

[10]  Luiz André Barroso,et al.  The Price of Performance , 2005, ACM Queue.

[11]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12]  John L. Klepeis,et al.  Anton, a special-purpose machine for molecular dynamics simulation , 2007, ISCA '07.

[13]  Ozalp Babaoglu,et al.  ACM Transactions on Computer Systems , 2007 .

[14]  Leonid Oliker,et al.  Towards Ultra-High Resolution Models of Climate and Weather , 2008, Int. J. High Perform. Comput. Appl..

[15]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[16]  M. Annavaram,et al.  Energy per Instruction Trends in Intel ® Microprocessors , 2006 .

[17]  Kushagra Vaid,et al.  Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.

[18]  Sujata Banerjee,et al.  ElasticTree: Saving Energy in Data Center Networks , 2010, NSDI.

[19]  Ken Smits,et al.  Penryn: 45-nm next generation Intel® core™ 2 processor , 2007, 2007 IEEE Asian Solid-State Circuits Conference.

[20]  VaidKushagra,et al.  Mobile processors for energy-efficient web search , 2011 .

[21]  Hong Liu,et al.  Energy proportional datacenter networks , 2010, ISCA.

[22]  Onur Mutlu,et al.  Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.

[23]  Norman P. Jouppi,et al.  Enterprise IT trends and implications for architecture research , 2005, 11th International Symposium on High-Performance Computer Architecture.

[24]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[25]  Belliappa Kuttanna,et al.  A Sub-1W to 2W Low-Power IA Processor for Mobile Internet Devices and Ultra-Mobile PCs in 45nm Hi-Κ Metal Gate CMOS , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[26]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[27]  David G. Andersen,et al.  Energy-efficient cluster computing with FAWN: workloads and implications , 2010, e-Energy.

[28]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[29]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[30]  Kunle Olukotun,et al.  Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[31]  John L. Klepeis,et al.  High-throughput pairwise point interactions in Anton, a specialized machine for molecular dynamics simulation , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[32]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[33]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[34]  Edward T. Grochowski,et al.  Larrabee: A Many-Core x86 Architecture for Visual Computing , 2008, IEEE Micro.

[35]  Christopher Chute,et al.  The Diverse and Exploding Digital Universe , 2011 .

[36]  David E. Irwin,et al.  Ensemble-level Power Management for Dense Blade Servers , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[37]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.

[38]  Rajesh Kumar,et al.  A family of 45nm IA processors , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[39]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.