A Checkpointing Strategy for Scalable Recovery on Distributed Parallel Systems