Accelerating CNN Training by Pruning Activation Gradients