Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design