Hardware-Software Co-Design: Not Just a Cliché

The age of the air-tight hardware abstraction is over. As the computing ecosystem moves beyond the predictable yearly advances of Moore's Law, appeals to familiarity and backwards compatibility will become less convincing: fundamental shifts in abstraction and design will look more enticing. It is time to embrace hardware-software co-design in earnest, to cooperate between programming languages and architecture to upend legacy constraints on computing. We describe our work on approximate computing, a new avenue spanning the system stack from applications and languages to microarchitectures. We reflect on the challenges and successes of approximation research and, with these lessons in mind, distill opportunities for future hardware-software co-design efforts.

[1]  Thomas F. Wenisch,et al.  Memory persistency , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[2]  John Wawrzynek,et al.  Chisel: Constructing hardware in a Scala embedded language , 2012, DAC Design Automation Conference 2012.

[3]  James R. Larus,et al.  Singularity: rethinking the software stack , 2007, OPSR.

[4]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[5]  Luis Ceze,et al.  Automatic discovery of performance and energy pitfalls in HTML and CSS , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).

[6]  Kathryn S. McKinley,et al.  Uncertain: a first-order type for uncertain data , 2014, ASPLOS.

[7]  Dan Grossman,et al.  Expressing and verifying probabilistic assertions , 2014, PLDI.

[8]  Nikolai Tillmann,et al.  TouchDevelop: programming cloud-connected mobile devices via touchscreen , 2011, Onward! 2011.

[9]  James R. Larus,et al.  Deconstructing process isolation , 2006, MSPC '06.

[10]  Arvind,et al.  What is Bluespec? , 2009, SIGD.

[11]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[12]  Michael Norrish,et al.  seL4: formal verification of an OS kernel , 2009, SOSP '09.

[13]  Martin Rinard,et al.  Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .

[14]  Song Liu,et al.  Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.

[15]  Henry Hoffmann,et al.  Patterns and statistical analysis for understanding reduced resource computing , 2010, OOPSLA.

[16]  Martin C. Rinard,et al.  Proving acceptability properties of relaxed nondeterministic approximate programs , 2012, PLDI.

[17]  Luis Ceze,et al.  General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[18]  Henry Hoffmann,et al.  A stream compiler for communication-exposed architectures , 2002, ASPLOS X.

[19]  P. Madhusudan,et al.  Thread contracts for safe parallelism , 2011, PPoPP '11.

[20]  Douglas L. Jones,et al.  Scalable stochastic processors , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[21]  Christoforos E. Kozyrakis,et al.  Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.

[22]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[23]  Kaushik Roy,et al.  Quality programmable vector processors for approximate computing , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[24]  Tom Ridge,et al.  The semantics of x86-CC multiprocessor machine code , 2009, POPL '09.

[25]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[26]  K. Sankaralingam,et al.  Exploring the Synergy of Emerging Workloads and Silicon Reliability Trends , 2009 .

[27]  Jacob Nelson,et al.  SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[28]  Kunle Olukotun,et al.  Delite , 2014, ACM Trans. Embed. Comput. Syst..

[29]  James R. Larus,et al.  Sealing OS processes to improve dependability and safety , 2007, EuroSys '07.

[30]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[31]  Jacob Nelson,et al.  Approximate storage in solid-state memories , 2013, MICRO-46.

[32]  Karthik Pattabiraman,et al.  Flicker: Saving Refresh-Power in Mobile Devices through Critical Data Partitioning , 2009 .

[33]  Martin C. Rinard,et al.  Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.

[34]  Sarita V. Adve,et al.  DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[35]  Rastislav Bodík,et al.  Chlorophyll : Synthesis-Aided Compiler for Low-Power Spatial Architectures by Phitchaya Mangpo Phothilimthana , 2015 .

[36]  Dan Grossman,et al.  Monitoring and Debugging the Quality of Results in Approximate Programs , 2015, ASPLOS.

[37]  Glenn Reinman,et al.  Improving Coverage and Reliability in Approximate Computing Using Application-Specific , Light-Weight Checks , 2014 .

[38]  David E. Shaw,et al.  Anton: A specialized ASIC for molecular dynamics , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[39]  Karthikeyan Sankaralingam,et al.  Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.

[40]  Krishna V. Palem,et al.  Ultra-Efficient (Embedded) SOC Architectures based on Probabilistic CMOS (PCMOS) Technology , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[41]  Sridhar Narayanan,et al.  TSOtool: a program for verifying memory systems using the memory consistency model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[42]  Emina Torlak,et al.  MemSAT: checking axiomatic specifications of memory models , 2010, PLDI '10.

[43]  Martin Odersky,et al.  Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.

[44]  Pat Hanrahan,et al.  Darkroom , 2014, ACM Trans. Graph..

[45]  Mario Badr,et al.  Load Value Approximation , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[46]  Subhasish Mitra,et al.  ERSA: Error Resilient System Architecture for probabilistic applications , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[47]  Henry Hoffmann,et al.  Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[48]  Frédo Durand,et al.  Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.

[49]  David Walker,et al.  Languages for software-defined networks , 2013, IEEE Communications Magazine.