In the late 1990s, our research group at DEC was one of a growing number of teams advocating the CMP (chip multiprocessor) as an alternative to highly complex single-threaded CPUs. We were designing the Piranha system,1 which was a radical point in the CMP design space in that we used very simple cores (similar to the early RISC designs of the late ’80s) to provide a higher level of thread-level parallelism. Our main goal was to achieve the best commercial workload performance for a given silicon budget. Today, in developing Google’s computing infrastructure, our focus is broader than performance alone. The merits of a particular architecture are measured by answering the following question: Are you able to afford the computational capacity you need? The high computational demands inherent in most of Google’s services have led us to develop a deep understanding of the overall cost of computing and to look continually for hardware/software designs that optimize performance per unit of cost.
[1] Sarita V. Adve, et al. Performance of database workloads on shared-memory systems with out-of-order processors. ASPLOS VIII, 1998.
[2] Luiz André Barroso, et al. Piranha: a scalable architecture based on single-chip multiprocessing. Proceedings of the 27th International Symposium on Computer Architecture, 2000.
[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.
[4] Kunle Olukotun, et al. Niagara: a 32-way multithreaded Sparc processor. IEEE Micro, 2005.