PRINS: Processing-in-Storage Acceleration of Machine Learning
[1] David G. Lowe,et al. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.
[2] S. Bhunia,et al. A Scalable Memory-Based Reconfigurable Computing Framework for Nanoscale Crossbar , 2012, IEEE Transactions on Nanotechnology.
[3] Hyunok Oh,et al. Collaborative processing of data-intensive algorithms with CPU, intelligent SSD, and GPU , 2016, SAC.
[4] Norman P. Jouppi,et al. FREE-p: Protecting non-volatile memory against both hard and soft errors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[5] Uri C. Weiser,et al. MAGIC—Memristor-Aided Logic , 2014, IEEE Transactions on Circuits and Systems II: Express Briefs.
[6] David J. DeWitt,et al. Query processing on smart SSDs: opportunities and challenges , 2013, SIGMOD '13.
[7] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[8] Fabien Alibart,et al. Hybrid CMOS/nanodevice circuits for high throughput pattern matching applications , 2011, 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).
[9] Yu Wang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[10] Dan Hammerstrom,et al. Methodology and Design of a Massively Parallel Memristive Stateful IMPLY Logic-Based Reconfigurable Architecture , 2016, IEEE Transactions on Nanotechnology.
[11] Thomas L. Sterling,et al. Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[12] Yue Zhao,et al. Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup , 2015, ICML.
[13] Francisco Herrera,et al. GPU-SME-kNN: Scalable and memory efficient kNN and lazy learning using GPUs , 2016, Inf. Sci..
[14] Steven Swanson,et al. Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.
[15] S. Wong,et al. Monolithic 3D Integrated Circuits , 2007, 2007 International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA).
[16] Dave Brown,et al. Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .
[17] G. Ghibaudo,et al. Understanding RRAM endurance, retention and window margin trade-off using experimental results and simulations , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).
[18] Jun Peng,et al. An Efficient KNN Algorithm Implemented on FPGA Based Heterogeneous Computing System Using OpenCL , 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.
[19] Nishil Talati,et al. Logic Design Within Memristive Memories Using Memristor-Aided loGIC (MAGIC) , 2016, IEEE Transactions on Nanotechnology.
[20] Svetlana Lazebnik,et al. Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.
[21] Eby G. Friedman,et al. AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.
[22] Hisashi Shima,et al. Resistive Random Access Memory (ReRAM) Based on Metal Oxides , 2010, Proceedings of the IEEE.
[23] Jason Weston,et al. #TagSpace: Semantic Embeddings from Hashtags , 2014, EMNLP.
[24] Subhasish Mitra,et al. Three-dimensional integration of nanotechnologies for computing and data storage on a single chip , 2017, Nature.
[25] W. C. Meilander,et al. Array processor supercomputers , 1989, Proc. IEEE.
[26] Engin Ipek,et al. A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] R. Williams,et al. Sub-nanosecond switching of a tantalum oxide memristor , 2011, Nanotechnology.
[28] Rajesh Gupta,et al. Minerva: Accelerating Data Analysis in Next-Generation SSDs , 2013, 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines.
[29] Ran Ginosar,et al. Resistive Associative Processor , 2015, IEEE Computer Architecture Letters.
[30] Lingli Wang,et al. High-performance K-means Implementation based on a Simplified Map-Reduce Architecture , 2016, arXiv:1610.05601.
[31] Eby G. Friedman,et al. Resistive Ternary Content Addressable Memory Systems for Data-Intensive Computing , 2015, IEEE Micro.
[32] Ran Ginosar,et al. A Resistive CAM Processing-in-Storage Architecture for DNA Sequence Alignment , 2017, IEEE Micro.
[33] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[34] Miriam Leeser,et al. Accelerating K-Means clustering with parallel implementations and GPU computing , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[35] Chanik Park,et al. Enabling cost-effective data processing with smart SSD , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).
[36] Masahide Matsumoto,et al. A 130.7-mm² 2-Layer 32-Gb ReRAM Memory Device in 24-nm Technology , 2014, IEEE Journal of Solid-State Circuits.
[37] Jean-Philippe Martin,et al. Dandelion: a compiler and runtime for heterogeneous systems , 2013, SOSP.
[38] Karin Strauss,et al. Use ECP, not ECC, for hard failures in resistive memories , 2010, ISCA.
[39] Michel Barlaud,et al. Fast k nearest neighbor search using GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[40] Jaewook Shin,et al. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[41] Shimeng Yu,et al. Metal–Oxide RRAM , 2012, Proceedings of the IEEE.
[42] Uri C. Weiser,et al. TEAM: ThrEshold Adaptive Memristor Model , 2013, IEEE Transactions on Circuits and Systems I: Regular Papers.
[43] Doohwan Oh,et al. XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD , 2013 .
[44] Peter Desnoyers,et al. Active Flash: Out-of-core data analytics on flash storage , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[45] Ran Ginosar,et al. Deduplication in resistive content addressable memory based solid state drive , 2016, 2016 26th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS).
[46] X. Miao,et al. Realization of Functional Complete Stateful Boolean Logic in Memristive Crossbar. , 2016, ACS applied materials & interfaces.
[47] Armin Alaghi,et al. Similarity Search on Automata Processors , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[48] J Joshua Yang,et al. Memristive devices for computing. , 2013, Nature nanotechnology.
[49] Matt J. Kusner,et al. From Word Embeddings To Document Distances , 2015, ICML.
[50] George A. Constantinides,et al. A Case for Work-stealing on FPGAs with OpenCL Atomics , 2016, FPGA.