Towards a Deeper Understanding of Global Covariance Pooling in Deep Learning: An Optimization Perspective