Graph convolutional networks (GCNs) have revolutionized many big data applications. However, accelerating GCN inference is still challenging due to (1) massive external memory traffic and irregular memory access, (2) workload imbalance caused by the skewed degree distribution, and (3) intra-stage load imbalance between the feature aggregation and feature transformation steps. To address these challenges, we propose a framework to optimize GCN inference on FPGA. First, we propose a novel Partition-Centric Feature Aggregation (PCFA) scheme to increase data locality and reduce the number of random memory accesses in the feature aggregation step. Second, we propose a novel hardware architecture that enables pipelined execution of the two heterogeneous computation steps, together with a low-overhead task scheduling strategy that achieves stall-free execution of the two steps. Third, we provide a complete GCN acceleration framework on FPGA and define key parameters that users can tune to adjust the throughput. The model-specific operators can be customized to support a wide range of GCN models. Using our framework, we design accelerators on a state-of-the-art FPGA and evaluate them on widely used datasets. Experimental results show that the accelerators produced by our framework achieve significant speedup compared with state-of-the-art implementations on CPU (≈100x), GPU (≈30x), and FPGA (4.5-32x).