When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization