A 40-nm Low-power SRAM with Multi-stage Replica-Bitline Scheme for Reducing Timing Variation