Exploiting Staleness for Approximating Loads on CMPs
Mahmut T. Kandemir | Chita R. Das | Anand Sivasubramaniam | Prasanna Venkatesh Rengasamy
[1] Mario Badr,et al. Load Value Approximation , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[2] T. N. Vijaykumar,et al. Is SC + ILP = RC? , 1999, ISCA.
[3] Surendra Byna,et al. Exploiting the forgiving nature of applications for scalable parallel execution , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[4] Seunghak Lee,et al. Solving the Straggler Problem with Bounded Staleness , 2013, HotOS.
[5] Ion Stoica,et al. Probabilistically Bounded Staleness for Practical Partial Quorums , 2012, Proc. VLDB Endow.
[6] Somayeh Sardashti,et al. The gem5 simulator , 2011, SIGARCH Comput. Archit. News.
[7] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[8] Norman P. Jouppi,et al. CACTI 3.0: an integrated cache timing, power, and area model , 2001.
[9] Karthikeyan Sankaralingam,et al. Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.
[10] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.
[11] Milo M. K. Martin,et al. Token Coherence: decoupling performance and correctness , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[12] Zeyuan Allen Zhu,et al. Randomized accuracy-aware program transformations for efficient approximate computations , 2012, POPL '12.
[13] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Per Stenström,et al. Reducing the Write Traffic for a Hybrid Cache Protocol , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[15] Anna R. Karlin,et al. Competitive snoopy caching , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).
[16] Seunghak Lee,et al. Exploiting Bounded Staleness to Speed Up Big Data Analytics , 2014, USENIX Annual Technical Conference.
[17] Michel Dubois,et al. Memory access buffering in multiprocessors , 1998, ISCA '98.
[18] David J. Lilja,et al. Using stochastic computing to implement digital image processing algorithms , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).
[19] Henry Hoffmann,et al. Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.
[20] Luis Ceze,et al. Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.
[21] Josep Torrellas,et al. Distance-adaptive update protocols for scalable shared-memory multiprocessors , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[22] Krishna V. Palem,et al. Probabilistic CMOS Technology: A Survey and Future Directions , 2006, 2006 IFIP International Conference on Very Large Scale Integration.
[23] Kaushik Roy,et al. Analysis and characterization of inherent application resilience for approximate computing , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[24] M. Hill,et al. Weak ordering - a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[25] Brian R. Gaines,et al. Stochastic computing , 1967, AFIPS '67 (Spring).
[26] Daniel M. Roy,et al. Probabilistically Accurate Program Transformations , 2011, SAS.
[27] Jaehyuk Huh,et al. Coherence decoupling: making use of incoherence , 2004, ASPLOS XI.
[28] Per Stenström,et al. An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic , 1994, PARLE.
[29] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[30] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[31] Donald Yeung,et al. Application-Level Correctness and its Impact on Fault Tolerance , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[32] Woongki Baek,et al. Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.
[33] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput.
[34] Henry Hoffmann,et al. Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.
[35] Onur Mutlu,et al. Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks , 2014, ACM Trans. Archit. Code Optim.
[36] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[37] Kaushik Roy,et al. Dynamic effort scaling: Managing the quality-efficiency tradeoff , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[38] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[39] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990.
[40] Alberto Ros,et al. A Direct Coherence Protocol for Many-Core Chip Multiprocessors , 2010, IEEE Transactions on Parallel and Distributed Systems.