Towards a Categorization of Natural Language Variability in Data for Spoken Dialog Systems