Graph processing and machine learning architectures with emerging memory technologies: a survey
[1] Xuehai Qian,et al. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Rajeev Balasubramonian,et al. Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Jack J. Dongarra,et al. Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems , 2012, ICS '12.
[4] Meng-Fan Chang,et al. A 16Mb dual-mode ReRAM macro with sub-14ns computing-in-memory and memory functions enabled by self-write termination scheme , 2017, 2017 IEEE International Electron Devices Meeting (IEDM).
[5] Margaret Martonosi,et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Andrew G. Howard,et al. Some Improvements on Deep Convolutional Neural Network Based Image Classification , 2013, ICLR.
[7] Farnood Merrikh-Bayat,et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors , 2014, Nature.
[8] Iryna Gurevych,et al. Analysis of the Wikipedia Category Graph for NLP Applications , 2007 .
[9] Karin Strauss,et al. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware , 2015 .
[10] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Ivan Laptev,et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[12] Long Jin,et al. Understanding Graph Sampling Algorithms for Social Network Analysis , 2011, 2011 31st International Conference on Distributed Computing Systems Workshops.
[13] Pengyu Liu,et al. Large-Area WS2 Film with Big Single Domains Grown by Chemical Vapor Deposition , 2017, Nanoscale Research Letters.
[14] Chung-Wei Hsu,et al. Self-rectifying bipolar TaOx/TiO2 RRAM with superior endurance over 10^12 cycles for 3D high-density storage-class memory , 2013, 2013 Symposium on VLSI Technology.
[15] Xin Jin,et al. ASAP: Fast, Approximate Graph Pattern Mining at Scale , 2018, OSDI.
[16] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Xiao Liu,et al. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module , 2019, ArXiv.
[18] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..
[19] Marco S. Nobile,et al. Graphics processing units in bioinformatics, computational biology and systems biology , 2016, Briefings Bioinform..
[20] Shuchuan Lo,et al. WMR--A Graph-Based Algorithm for Friend Recommendation , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06).
[21] Ozcan Ozturk,et al. Energy Efficient Architecture for Graph Analytics Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[22] Wei Li,et al. Tux2: Distributed Graph Computation for Machine Learning , 2017, NSDI.
[23] Hisashi Shima,et al. Resistive Random Access Memory (ReRAM) Based on Metal Oxides , 2010, Proceedings of the IEEE.
[24] Wang Guo-yu. Study of network security evaluation based on attack graph model , 2007 .
[25] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[26] Karin Strauss,et al. Toward accelerating deep learning at scale using specialized hardware in the datacenter , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[27] Shimeng Yu,et al. Metal–Oxide RRAM , 2012, Proceedings of the IEEE.
[28] Bo Wu,et al. AutoMine: harmonizing high-level abstraction and high performance for graph mining , 2019, SOSP.
[29] William J. Dally,et al. Cost-Efficient Dragonfly Topology for Large-Scale Systems , 2009, IEEE Micro.
[30] Sarala M. Wimalaratne,et al. The Systems Biology Graphical Notation , 2009, Nature Biotechnology.
[31] Huan Liu,et al. Graph Mining Applications to Social Network Analysis , 2010, Managing and Mining Graph Data.
[32] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[33] Charbel Farhat,et al. Accelerated mesh sampling for the hyper reduction of nonlinear computational models , 2017 .
[34] David A. Patterson,et al. A new golden age for computer architecture , 2019, Commun. ACM.
[35] Yu Huang,et al. Spara: An Energy-Efficient ReRAM-Based Accelerator for Sparse Graph Analytics Applications , 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[36] Chun Chen,et al. Personalized tag recommendation using graph-based ranking on multi-type interrelated objects , 2009, SIGIR.
[37] Katrin Kirchhoff,et al. Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP , 2007, NAACL.
[38] Luca Maria Gambardella,et al. Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.
[39] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[40] O. Krestinskaya,et al. Memristive GAN in Analog , 2020, Scientific Reports.
[41] Luca Benini,et al. ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator , 2020, 2021 IEEE International Symposium on Circuits and Systems (ISCAS).
[42] Tinoosh Mohsenin,et al. BiNMAC: Binarized neural Network Manycore ACcelerator , 2018, ACM Great Lakes Symposium on VLSI.
[43] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Satu Elisa Schaeffer,et al. Survey Graph clustering , 2007 .
[45] Yann LeCun,et al. Convolutional neural networks applied to house numbers digit classification , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).
[46] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[47] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[48] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Zhongyuan Yu,et al. Infrared Plasmonic Refractive Index Sensor with Ultra-High Figure of Merit Based on the Optimized All-Metal Grating , 2017, Nanoscale Research Letters.
[50] Meikang Qiu,et al. Security-aware optimization for ubiquitous computing systems with SEAT graph approach , 2013, J. Comput. Syst. Sci..
[51] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[52] Michael Ferdman,et al. Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[53] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[54] Qing Wu,et al. Hardware realization of BSB recall function using memristor crossbar arrays , 2012, DAC Design Automation Conference 2012.
[55] William J. Dally,et al. Flattened butterfly: a cost-efficient topology for high-radix networks , 2007, ISCA '07.
[56] Eunhyeok Park,et al. Weighted-Entropy-Based Quantization for Deep Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] D. McAlpine,et al. Hidden hearing loss selectively impairs neural adaptation to loud sound environments , 2018, Nature Communications.
[58] K. Pingali,et al. Pangolin , 2019, Proc. VLDB Endow..
[59] Rajeev Balasubramonian,et al. Newton: Gravitating Towards the Physical Limits of Crossbar Acceleration , 2018, IEEE Micro.
[60] Xuehai Qian,et al. AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[61] Satu Elisa Schaeffer,et al. Graph Clustering , 2017, Encyclopedia of Machine Learning and Data Mining.
[62] A. M. Stankovic,et al. Graph oriented algorithm for the steady-state security enhancement in distribution networks , 1989 .
[63] Dan Williams,et al. Platform Storage Performance With 3D XPoint Technology , 2017, Proceedings of the IEEE.
[64] Duane Mills,et al. 19.7 A 16Gb ReRAM with 200MB/s write and 1GB/s read in 27nm technology , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[65] J. Demmel,et al. Solving Sparse Linear Systems with Sparse Backward Error , 2015 .
[66] Runze Han,et al. Demonstration of Logic Operations in High-Performance RRAM Crossbar Array Fabricated by Atomic Layer Deposition Technique , 2017, Nanoscale Research Letters.
[67] Christoforos E. Kozyrakis,et al. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[68] Hal Daumé,et al. Fast Large-Scale Approximate Graph Construction for NLP , 2012, EMNLP.
[69] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[70] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[71] Mehrzad Samadi,et al. Memory-centric system interconnect design with hybrid memory cubes , 2013, PACT 2013.
[72] Jung Ho Ahn,et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[73] Chao Di,et al. U1 snRNP regulates cancer cell migration and invasion in vitro , 2020, Nature Communications.
[74] Andrew S. Cassidy,et al. A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.
[75] Naren Ramakrishnan,et al. Studying Recommendation Algorithms by Graph Analysis , 2003, Journal of Intelligent Information Systems.
[76] Qing Wu,et al. Efficient and self-adaptive in-situ learning in multilayer memristor neural networks , 2018, Nature Communications.
[77] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[78] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[79] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[80] Yangqing Jia,et al. Deep Convolutional Ranking for Multilabel Image Annotation , 2013, ICLR.
[81] M. Mitchell Waldrop,et al. The chips are down for Moore’s law , 2016, Nature.
[82] Hao Jiang,et al. RENO: A high-efficient reconfigurable neuromorphic computing accelerator design , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[83] Wei Niu. PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning , 2020 .
[84] Tao Zhang,et al. Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[85] Yiran Chen,et al. ReBNN: in-situ acceleration of binarized neural networks in ReRAM using complementary resistive cell , 2019, CCF Transactions on High Performance Computing.
[86] Catherine Graves,et al. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[87] Yu Wang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[88] Yanzhi Wang,et al. PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning , 2020, ASPLOS.
[89] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[90] Zhengya Zhang,et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations , 2019, Nature Electronics.
[91] Joseph M. Hellerstein,et al. Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..
[92] Anton J. Enright,et al. BioLayout-an automatic graph layout algorithm for similarity visualization , 2001, Bioinform..
[93] F. Jensen. Introduction to Computational Chemistry , 1998 .
[94] Jaejin Lee,et al. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[95] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[96] Dmitri B. Strukov,et al. Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits , 2017, Nature Communications.
[97] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[98] François Fouss,et al. Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007, IEEE Transactions on Knowledge and Data Engineering.
[99] A. Thomas,et al. Memristor-based neural networks , 2013 .
[100] Keval Vora,et al. Peregrine: a pattern-aware graph mining system , 2020, EuroSys.
[101] Vijayalakshmi Srinivasan,et al. Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[102] Luca Benini,et al. XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[103] Huaping Zhao,et al. Nanoelectrode design from microminiaturized honeycomb monolith with ultrathin and stiff nanoscaffold for high-energy micro-supercapacitors , 2020, Nature Communications.
[104] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[105] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[106] Wenguang Chen,et al. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.
[107] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[108] Benno Schwikowski,et al. Graph-based methods for analysing networks in cell biology , 2006, Briefings Bioinform..
[109] Hadi Esmaeilzadeh,et al. TABLA: A unified template-based framework for accelerating statistical machine learning , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[110] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[111] Kinam Kim,et al. A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta2O(5-x)/TaO(2-x) bilayer structures. , 2011, Nature materials.
[112] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[113] Dmitri B. Strukov,et al. Towards the Development of Analog Neuromorphic Chip Prototype with 2.4M Integrated Memristors , 2019, 2019 IEEE International Symposium on Circuits and Systems (ISCAS).
[114] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[115] Kai Wang,et al. RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine , 2018, OSDI.
[116] Michael Ferdman,et al. Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[117] Mohammed J. Zaki,et al. Arabesque: a system for distributed graph mining , 2015, SOSP.
[118] Arie E. Kaufman,et al. GPU Cluster for High Performance Computing , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[119] Jung-Hwan Moon,et al. A self-rectifying TaOy/nanoporous TaOx memristor synaptic array for learning and energy-efficient neuromorphic systems , 2018, NPG Asia Materials.
[120] Meng-Fan Chang,et al. 17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[121] Yiran Chen,et al. GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[122] Masahide Matsumoto,et al. A 130.7-mm² 2-Layer 32-Gb ReRAM Memory Device in 24-nm Technology , 2014, IEEE Journal of Solid-State Circuits.
[123] Yanzhi Wang,et al. GraphQ: Scalable PIM-Based Graph Processing , 2019, MICRO.
[124] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[125] Luca Benini,et al. XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks , 2018, 2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS).
[126] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[127] Bo Hong,et al. Neural signal analysis with memristor arrays towards high-efficiency brain–machine interfaces , 2020, Nature Communications.
[128] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[129] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[130] Yu Wang,et al. MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[131] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[132] Dmitri B. Strukov,et al. 3D ReRAM arrays and crossbars: Fabrication, characterization and applications , 2017, 2017 IEEE 17th International Conference on Nanotechnology (IEEE-NANO).
[133] P. Harrison. Quantum wells, wires, and dots : theoretical and computational physics , 2016 .
[134] Jiayu Li,et al. ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers , 2018, ASPLOS.
[135] William M. Campbell,et al. Social Network Analysis with Content and Graphs , 2013 .
[136] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[137] Depei Qian,et al. SympleGraph: distributed graph processing with precise loop-carried dependency guarantee , 2020, PLDI.
[138] Onur Mutlu,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015 .
[139] Asit K. Mishra,et al. From High-Level Deep Network Models to FPGA Acceleration , 2016 .