We present TAICHI, a general in-memory computing accelerator design for deep neural networks. TAICHI is based on RRAM crossbar arrays that are heterogeneously integrated with local arithmetic units and global co-processors, which allows the system to map different models efficiently while maintaining high energy efficiency and throughput. A hierarchical mesh network-on-chip handles communication among the clusters in TAICHI, balancing reconfigurability and efficiency. We discuss the detailed deployment of the different circuit components and estimate the system performance at several technology nodes. The heterogeneous design also allows the system to accommodate models larger than the on-chip storage capacity.