暂无分享,去创建一个
[1] Dimitri P. Bertsekas,et al. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey , 2015, ArXiv.
[2] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[3] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[4] Jen-Tzung Chien,et al. Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances , 2012, IEEE Signal Processing Magazine.
[5] Patrick L. Combettes,et al. Proximal Splitting Methods in Signal Processing , 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
[6] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..
[7] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[8] Dong Yu,et al. Pipelined Back-Propagation for Context-Dependent Deep Neural Networks , 2012, INTERSPEECH.
[9] Stephen J. Wright,et al. Numerical Optimization (Springer Series in Operations Research and Financial Engineering) , 2000 .
[10] Miguel Á. Carreira-Perpiñán,et al. Optimizing affinity-based binary hashing using auxiliary coordinates , 2016, NIPS.
[11] D. Rubin,et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .
[12] Svetlana Lazebnik,et al. Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.
[13] Stephen J. Wright. Coordinate descent algorithms , 2015, Mathematical Programming.
[14] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[15] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[16] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[17] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[18] Peter Richtárik,et al. Distributed Coordinate Descent Method for Learning with Big Data , 2013, J. Mach. Learn. Res..
[19] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[20] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[21] Ameet Talwalkar,et al. Large-scale SVD and manifold learning , 2013, J. Mach. Learn. Res..
[22] Robert B. Ross,et al. Using MPI-2: Advanced Features of the Message Passing Interface , 2003, CLUSTER.
[23] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[24] Miguel Á. Carreira-Perpiñán,et al. Joint optimization of mapping and classifier using auxiliary coordinates , 2014 .
[25] Miguel Á. Carreira-Perpiñán,et al. The Variational Nystrom method for large-scale spectral problems , 2016, ICML.
[26] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Peter J. Haas,et al. Large-scale matrix factorization with distributed stochastic gradient descent , 2011, KDD.
[28] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[29] Ron Kohavi,et al. Wrappers for Feature Subset Selection , 1997, Artif. Intell..
[30] Matthijs Douze,et al. Searching in one billion vectors: Re-rank with source coding , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Petros Drineas,et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning , 2005, J. Mach. Learn. Res..
[32] Adolfy Hoisie,et al. Performance Optimization of Numerically Intensive Codes , 1987 .
[33] D. Howard,et al. Speech and audio signal processing: processing and perception of speech and music [Book Review] , 2000 .
[34] Thomas Serre,et al. Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Gideon S. Mann,et al. Distributed Training Strategies for the Structured Perceptron , 2010, NAACL.
[36] Rong Zheng,et al. Asynchronous stochastic gradient descent for DNN training , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[37] Message Passing Interface Forum. MPI: A message - passing interface standard , 1994 .
[38] A Orman,et al. Optimization of Stochastic Models: The Interface Between Simulation and Optimization , 2012, J. Oper. Res. Soc..
[39] Ameet Talwalkar,et al. Large-scale manifold learning , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[40] Volkan Cevher,et al. Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics , 2014, IEEE Signal Processing Magazine.
[41] Miguel Á. Carreira-Perpiñán,et al. Entropic Affinities: Properties and Efficient Numerical Computation , 2013, ICML.
[42] Samuel H. Fuller,et al. The Future of Computing Performance: Game Over or Next Level? , 2014 .
[43] Matthias W. Seeger,et al. Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.
[44] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[45] Nicolas Le Roux,et al. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.
[46] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[47] Marc Snir,et al. GETTING UP TO SPEED THE FUTURE OF SUPERCOMPUTING , 2004 .
[48] Joseph K. Bradley,et al. Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.
[49] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[50] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[51] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[52] Yaoliang Yu,et al. Petuum: A New Platform for Distributed Machine Learning on Big Data , 2013, IEEE Transactions on Big Data.
[53] Miguel Á. Carreira-Perpiñán,et al. Hashing with binary autoencoders , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.
[55] Christina Freytag,et al. Using Mpi Portable Parallel Programming With The Message Passing Interface , 2016 .
[56] Miguel Á. Carreira-Perpiñán,et al. A fast, universal algorithm to learn parametric nonlinear embeddings , 2015, NIPS.
[57] Tim Hesterberg,et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.
[58] Kristen Grauman,et al. Learning Binary Hash Codes for Large-Scale Image Search , 2013, Machine Learning for Computer Vision.
[59] Miguel Á. Carreira-Perpiñán,et al. Distributed optimization of deeply nested systems , 2012, AISTATS.
[60] Stephen J. Wright,et al. Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties , 2014, SIAM J. Optim..
[61] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[62] Alexander G. Gray,et al. Stochastic Alternating Direction Method of Multipliers , 2013, ICML.
[63] Carlos Guestrin,et al. Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[64] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[65] John N. Tsitsiklis,et al. Gradient Convergence in Gradient methods with Errors , 1999, SIAM J. Optim..
[66] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.