The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models