Identifying and Mitigating Flaws of Deep Perceptual Similarity Metrics