Learning Disentangled Representation for Fine-Grained Visual Categorization