Abstract Unstructured deep neural network (DNN) pruning has been widely studied. However, previous schemes focus only on compressing the model's memory footprint, which leads to a relatively low reduction ratio in computational workload. This study demonstrates that the main cause is the inconsistent distribution of memory footprint and workload across the layers of a DNN model. Based on this observation, we formulate the network pruning flow as a multi-objective optimization problem and design an improved genetic algorithm that efficiently explores the whole pruning structure space with both pruning goals equally constrained, in order to find a solution that balances the DNN's model size against its workload. Experiments show that the proposed scheme achieves up to 34% further reduction in the model's computational workload compared to the state-of-the-art pruning schemes [11, 33] for ResNet50 on the ILSVRC-2012 dataset. We have also deployed the pruned ResNet50 models on a dedicated DNN accelerator. The measured data show a 6× reduction in inference time compared to an FPGA accelerator running the dense CNN model quantized to INT8 format, and a 2.27× improvement in power efficiency over 2080Ti GPU-based implementations.
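
The mismatch between per-layer memory footprint and workload can be illustrated with a minimal sketch. It is not the analysis used in this study: the helper names `conv_cost` and `shares` are hypothetical, bias terms are ignored, and only two layers with the standard ResNet50 shapes (the first 7×7 convolution and the final fully connected layer) are compared, so the shares are relative to these two layers alone rather than to the whole network.

```python
def conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, multiply-accumulates) of a k x k convolution.

    A fully connected layer is treated as a 1x1 convolution with a 1x1 output.
    Bias terms are ignored (simplifying assumption).
    """
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is reused at each output pixel
    return params, macs


def shares(layers):
    """Map each layer name to its (parameter share, MAC share) within `layers`."""
    total_params = sum(p for p, _ in layers.values())
    total_macs = sum(m for _, m in layers.values())
    return {
        name: (p / total_params, m / total_macs)
        for name, (p, m) in layers.items()
    }


# Two illustrative layers with standard ResNet50 shapes (224x224 input).
layers = {
    "conv1": conv_cost(7, 3, 64, 112, 112),  # 7x7, 3 -> 64, 112x112 output
    "fc": conv_cost(1, 2048, 1000, 1, 1),    # 2048 -> 1000 classifier
}

for name, (p_share, m_share) in shares(layers).items():
    print(f"{name}: {p_share:.1%} of parameters, {m_share:.1%} of MACs")
```

In this two-layer comparison the classifier holds almost all of the parameters while the first convolution accounts for almost all of the multiply-accumulates, so pruning guided by parameter count alone would remove weights mostly where they save little computation.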