Paper information - TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
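
To make the abstract's central idea concrete, the following is a minimal sketch of a dataflow graph in which computation, shared state, and a state-mutating operation are all ordinary graph nodes, each tagged with a device. It is a toy model in plain Python, not TensorFlow code: `Variable`, `Node`, `run`, and `placement` are hypothetical names introduced here, the device strings are illustrative labels only, and the quadratic loss and step size are assumptions chosen to keep the arithmetic simple.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Toy model only: every name below is hypothetical and is NOT the TensorFlow API.


class Variable:
    """Mutable shared state that lives in the graph alongside computation."""

    def __init__(self, value: float, device: str):
        self.value = value
        self.device = device


@dataclass
class Node:
    """A graph vertex: an operation, its input vertices, and a device label."""
    name: str
    fn: Callable[..., float]
    inputs: List["Node"] = field(default_factory=list)
    device: str = "/cpu:0"


def run(target: Node, cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate `target`, evaluating each input at most once per call."""
    cache = {} if cache is None else cache
    if target.name not in cache:
        args = [run(dep, cache) for dep in target.inputs]
        cache[target.name] = target.fn(*args)
    return cache[target.name]


def placement(nodes: List[Node]) -> Dict[str, List[str]]:
    """Group node names by device label (a stand-in for graph partitioning)."""
    by_device: Dict[str, List[str]] = {}
    for node in nodes:
        by_device.setdefault(node.device, []).append(node.name)
    return by_device


# Shared state: one weight, labeled as living on a parameter-holding task.
w = Variable(1.0, device="/job:ps/cpu:0")


def assign_sub(gradient: float) -> float:
    """State-mutating op: apply one gradient step (assumed step size 0.05)."""
    w.value -= 0.05 * gradient
    return w.value


# Reading the state, computing a gradient, and updating the state are all nodes.
read_w = Node("read_w", lambda: w.value, device=w.device)
x = Node("x", lambda: 3.0, device="/job:worker/gpu:0")
# Gradient of the assumed loss (w * x - 6)^2 with respect to w.
grad = Node("grad", lambda wv, xv: 2.0 * (wv * xv - 6.0) * xv,
            [read_w, x], device="/job:worker/gpu:0")
update = Node("update", assign_sub, [grad], device=w.device)

# Each run re-reads w, so the mutation from one step is visible to the next.
for _ in range(5):
    run(update)

print(placement([read_w, x, grad, update]))
print(round(w.value, 4))
```

Because the update rule is itself a node rather than something built into the runtime, swapping in a different training algorithm in this sketch only means replacing `assign_sub` and rewiring the graph, which mirrors the flexibility the abstract contrasts with "parameter server" designs.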
