Training Deep Neural Networks with Partially Adaptive Momentum