FLASH: Towards a High-performance Hardware Acceleration Architecture for Cross-silo Federated Learning