Inference-aware convolutional neural network pruning