Modern Artificial Intelligence (AI) systems deploy Convolutional Neural Networks (CNNs) because they offer very high accuracy. The computational complexity of CNNs necessitates hardware acceleration, especially in mobile phones and other handheld devices, which operate under stringent power and area budgets. In this paper, we propose a systolic-dataflow-based accelerator architecture with the aim of improving energy efficiency as well as scalability. We demonstrate that a prototype of our proposed accelerator comprising 4096 Multiply-Accumulate (MAC) units achieves a power efficiency of 8.89–9.54 TOPS/W when executing several state-of-the-art CNN models. We also observe that the accelerator's throughput scales almost linearly with the power dissipated, which we attribute to the minimal non-compute overhead and the resulting low power consumption. Our proposed accelerator outperforms state-of-the-art accelerators in power efficiency by factors of 2.6× to 3.8× when executing various layers of the Inception V3 model.