Block Convolution: Toward Memory-Efficient Inference of Large-Scale CNNs on FPGA