Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
Ashok Vardhan Makkuva | Sewoong Oh | Sreeram Kannan | Pramod Viswanath
[1] Martin J. Wainwright,et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis , 2014, ArXiv.
[2] Yuanzhi Li,et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation , 2017, NIPS.
[3] Trevor Darrell,et al. Deep Mixture of Experts via Shallow Embedding , 2018, UAI.
[4] Anima Anandkumar,et al. Provable Tensor Methods for Learning Mixtures of Classifiers , 2014, ArXiv.
[5] Robert A. Jacobs,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.
[6] J. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics , 1977 .
[7] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[8] Vatsal Sharan,et al. Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use , 2017, ICML.
[9] Martin J. Wainwright,et al. Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences , 2016, NIPS.
[10] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res.
[11] Constantine Caramanis,et al. Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization , 2016, ArXiv.
[12] Yi-Cheng Liu,et al. Using mixture design and neural networks to build stock selection decision support systems , 2017, Neural Computing and Applications.
[13] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[14] Reza Ebrahimpour,et al. Mixture of experts: a literature survey , 2014, Artificial Intelligence Review.
[15] Prateek Jain,et al. Learning Mixtures of Discrete Product Distributions using Spectral Decompositions , 2013, COLT.
[16] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[17] Matthias Bethge,et al. Generative Image Modeling Using Spatial LSTMs , 2015, NIPS.
[18] Arian Maleki,et al. Global Analysis of Expectation Maximization for Mixtures of Two Gaussians , 2016, NIPS.
[19] Tengyu Ma,et al. Polynomial-Time Tensor Decompositions with Sum-of-Squares , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).
[20] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[21] C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , 1972 .
[22] Constantine Caramanis,et al. A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise , 2013, ArXiv.
[23] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[24] Anima Anandkumar,et al. A Method of Moments for Mixture Models and Hidden Markov Models , 2012, COLT.
[25] Michael I. Jordan,et al. Convergence results for the EM approach to mixtures of experts architectures , 1995, Neural Networks.
[26] K. Pearson. Contributions to the Mathematical Theory of Evolution , 1894 .
[27] Inderjit S. Dhillon,et al. Mixed Linear Regression with Multiple Components , 2016, NIPS.
[28] I-Cheng Yeh,et al. Modeling of strength of high-performance concrete using artificial neural networks , 1998 .
[29] Christos Tzamos,et al. Ten Steps of EM Suffice for Mixtures of Two Gaussians , 2016, COLT.
[30] Volker Tresp,et al. Mixtures of Gaussian Processes , 2000, NIPS.
[31] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[32] Joseph N. Wilson,et al. Twenty Years of Mixture of Experts , 2012, IEEE Transactions on Neural Networks and Learning Systems.
[33] Marc Peter Deisenroth,et al. Hierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process Regression , 2014, ArXiv.
[34] Anima Anandkumar,et al. Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates , 2014, ArXiv.
[35] Xiao Sun,et al. Human-Machine Conversation Based on Hybrid Neural Network , 2017, 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC).
[36] Marc'Aurelio Ranzato,et al. Hard Mixtures of Experts for Large Scale Weakly Supervised Vision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Anima Anandkumar,et al. Score Function Features for Discriminative Learning: Matrix and Tensor Framework , 2014, ArXiv.
[38] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[39] Thomas F. Brooks,et al. Airfoil self-noise and prediction , 1989 .
[40] Tengyu Ma,et al. Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.
[41] Anima Anandkumar,et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods , 2017 .
[42] Xi Chen,et al. Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing , 2014, J. Mach. Learn. Res..
[43] Samy Bengio,et al. A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.
[44] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[45] Anima Anandkumar,et al. A Tensor Spectral Approach to Learning Mixed Membership Community Models , 2013, COLT.
[46] Percy Liang,et al. Spectral Experts for Estimating Mixtures of Linear Regressions , 2013, ICML.
[47] Sham M. Kakade,et al. A spectral algorithm for learning Hidden Markov Models , 2008, J. Comput. Syst. Sci.
[48] Jean-Michel Renders,et al. LSTM-Based Mixture-of-Experts for Knowledge-Aware Dialogues , 2016, Rep4NLP@ACL.
[49] Stratis Ioannidis,et al. Learning Mixtures of Linear Classifiers , 2014, ICML.
[50] Xinyang Yi,et al. Alternating Minimization for Mixed Linear Regression , 2014, ICML.
[51] Inderjit S. Dhillon,et al. Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.