Relative Fisher Information and Natural Gradient for Learning Large Modular Models
Frank Nielsen | Ke Sun
[1] M. Fréchet. Sur l'extension de certaines évaluations statistiques au cas de petits échantillons , 1943 .
[2] H. Cramér. Mathematical methods of statistics , 1947 .
[3] V. S. Huzurbazar. Probability distributions and orthogonal parameters , 1950, Mathematical Proceedings of the Cambridge Philosophical Society.
[4] B. Efron,et al. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information , 1978 .
[5] L. Cobb,et al. Estimation and Moment Recursion Relations for Multimodal Distributions of the Exponential Family , 1983 .
[6] D. Cox,et al. Parameter Orthogonality and Approximate Conditional Inference , 1987 .
[7] C. R. Rao,et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters , 1992 .
[8] Takio Kurita,et al. Iterative weighted least squares algorithms for neural networks classifiers , 1992, New Generation Computing.
[9] Shun-ichi Amari,et al. Information geometry of the EM and em algorithms for neural networks , 1995, Neural Networks.
[10] J. Jost. Riemannian geometry and geometric analysis , 1995 .
[11] Shun-ichi Amari,et al. Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient , 1996, NIPS.
[12] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[13] Shun-ichi Amari,et al. Methods of information geometry , 2000 .
[14] N. Čencov. Statistical Decision Rules and Optimal Inference , 2000 .
[15] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[16] Nicolas Le Roux,et al. Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.
[17] J. Vickers,et al. Block diagonalization of four-dimensional metrics , 2008, 0809.3327.
[18] Sumio Watanabe. Algebraic geometry and statistical learning theory , 2009 .
[19] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[20] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[21] Virginia Vassilevska Williams,et al. Multiplying matrices faster than Coppersmith-Winograd , 2012, STOC '12.
[22] Klaus-Robert Müller,et al. Deep Boltzmann Machines and the Centering Trick , 2012, Neural Networks: Tricks of the Trade.
[23] Tapani Raiko,et al. Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.
[24] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons , 2013, ArXiv.
[25] Silvere Bonnabel,et al. Stochastic Gradient Descent on Riemannian Manifolds , 2011, IEEE Transactions on Automatic Control.
[26] Yann Ollivier,et al. Riemannian metrics for neural networks , 2013, ArXiv.
[27] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.
[28] Frank Nielsen,et al. Cramér-Rao Lower Bound and Information Geometry , 2013, ArXiv.
[29] Andrea Montanari,et al. Computational Implications of Reducing Data to Sufficient Statistics , 2014, ArXiv.
[30] Philip Thomas,et al. GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results , 2014, ICML.
[31] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[32] James Martens,et al. New perspectives on the natural gradient method , 2014, ArXiv.
[33] Ke Sun,et al. An Information Geometry of Statistical Manifold Learning , 2014, ICML.
[34] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[35] Pablo Zegers,et al. Fisher Information Properties , 2015, Entropy.
[36] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[37] Sayan Mukherjee,et al. The Information Geometry of Mirror Descent , 2013, IEEE Transactions on Information Theory.
[38] J. Lafferty,et al. Riemannian Geometry and Statistical Machine Learning , 2015 .
[39] Razvan Pascanu,et al. Natural Neural Networks , 2015, NIPS.
[40] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[41] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[42] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[43] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Shun-ichi Amari,et al. Information Geometry and Its Applications , 2016 .
[45] Bruno Castro da Silva,et al. Energetic Natural Gradient Descent , 2016, ICML.
[46] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[47] Yann Ollivier,et al. Practical Riemannian Neural Networks , 2016, ArXiv.
[48] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[49] Peter Norvig,et al. Deep Learning with Dynamic Computation Graphs , 2017, ICLR.
[50] François-Xavier Vialard,et al. An Interpolating Distance Between Optimal Transport and Fisher–Rao Metrics , 2010, Foundations of Computational Mathematics.
[51] James G. Dowty,et al. Chentsov’s theorem for exponential families , 2017, Information Geometry.