Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction

In this paper, we assess the past, present, and future of mobile CPU design. We study how mobile CPU design trends have impacted the end user, hardware design, and the holistic mobile device. We analyze the evolution of ten cutting-edge mobile CPU designs released over the past seven years. Specifically, we report measured performance, power, energy, and user satisfaction trends across mobile CPU generations. A key contribution of our work is that we contextualize the mobile CPU's evolution in terms of user satisfaction, which has largely been absent from prior mobile hardware studies. To bridge the gap between mobile CPU design and user satisfaction, we construct and conduct a novel crowdsourcing study that spans over 25,000 survey participants using the Amazon Mechanical Turk service. Our methodology allows us to identify which mobile CPU design techniques provide the most benefit to the end user's quality of experience. Our results quantitatively demonstrate that CPUs play a crucial role in modern mobile systems-on-chip (SoCs). Over the last seven years, both single- and multicore performance improvements have contributed to end-user satisfaction by reducing user-critical application response latencies. Mobile CPUs aggressively adopted many power-hungry, desktop-oriented design techniques to reach these performance levels. Unlike other smartphone components (e.g., display and radio), whose peak power consumption has decreased over time, the mobile CPU's peak power consumption has steadily increased. As the limits of technology scaling restrict the ability of desktop-like scaling to continue for mobile CPUs, specialized accelerators appear to be a promising alternative that can help sustain the power, performance, and energy improvements that mobile computing necessitates. Such a paradigm shift will redefine the role of the CPU within future SoCs, which merits several design considerations based on our findings.
