Fast Data Delivery for Many-Core Processors
Mehdi Modarressi | Mahmood Naderan-Tahan | Pejman Lotfi-Kamran | Abbas Mazloumi | Mohammad Bakhshalipour | Hamid Sarbazi-Azad | Farid Samandi
[1] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[2] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[3] Pejman Lotfi-Kamran,et al. An Efficient Temporal Data Prefetcher for L1 Caches , 2017, IEEE Computer Architecture Letters.
[4] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[5] Nan Jiang,et al. A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[6] Onur Mutlu,et al. Express Cube Topologies for on-Chip Interconnects , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[7] William J. Dally,et al. Flit-reservation flow control , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[8] Seth H. Pugsley,et al. Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] Babak Falsafi,et al. SHIFT: Shared history instruction fetch for lean-core server processors , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Thomas F. Wenisch,et al. Temporal streaming of shared memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[11] Hamid Sarbazi-Azad,et al. An Efficient Hybrid-Switched Network-on-Chip for Chip Multiprocessors , 2016, IEEE Transactions on Computers.
[12] Babak Falsafi,et al. NOC-Out: Microarchitecting a Scale-Out Processor , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[13] Rami G. Melhem,et al. Proactive circuit allocation in multiplane NoCs , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[14] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[15] Onur Mutlu,et al. A case for bufferless routing in on-chip networks , 2009, ISCA '09.
[16] George Michelogiannakis,et al. An analysis of on-chip interconnection networks for large-scale chip multiprocessors , 2010, TACO.
[17] Mehmet Kayaalp,et al. RIC: Relaxed Inclusion Caches for mitigating LLC side-channel attacks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[18] Brian Fahs,et al. Microarchitecture optimizations for exploiting memory-level parallelism , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[19] Christopher Hughes,et al. Speculative precomputation: long-range prefetching of delinquent loads , 2001, ISCA 2001.
[20] Hamid Sarbazi-Azad,et al. Domino Temporal Data Prefetcher , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[22] Ian H. Witten,et al. Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res.
[23] Hamid Sarbazi-Azad,et al. Near-Ideal Networks-on-Chip for Servers , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[24] Babak Falsafi,et al. Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[25] Pierre Michaud. Best-offset hardware prefetching , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[27] Niraj K. Jha,et al. Express virtual channels: towards the ideal interconnection fabric , 2007, ISCA '07.
[28] Natalie D. Enright Jerger,et al. The runahead network-on-chip , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[29] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[30] Pejman Lotfi-Kamran,et al. Cache Replacement Policy Based on Expected Hit Count , 2018, IEEE Computer Architecture Letters.
[31] Avinash Sodani,et al. Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[32] Onur Mutlu,et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[33] Mehdi Modarressi,et al. NOC characteristics of cloud applications , 2017, 2017 19th International Symposium on Computer Architecture and Digital Systems (CADS).
[34] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[35] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[36] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[37] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[38] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[39] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).