Comparative experiments on task classification for spoken language understanding using Naive Bayes classifier

We present a series of comparative experiments on statistical classifiers for task classification. The experiments address three questions: how different features affect classifier performance, how robust classifiers built on different features are to data variability, and how the size of the training data affects performance. For Chinese input sentences, three linguistic units can serve as features: Chinese characters, Chinese words, and semantic constituents. The advantages and disadvantages of each are analyzed in detail. A controlled study using Naive Bayes classifiers examines the impact of the different features on classifier performance. The classifiers are evaluated separately on clean and noisy test data to investigate their robustness, and learning curves for each feature type show the effect of training data size.
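
As a rough illustration of the kind of comparison described above, the following minimal sketch trains one multinomial Naive Bayes classifier on character features and one on word features. It is not the authors' implementation: the class `NaiveBayes`, the two feature functions, the add-one smoothing, and the four toy sentences with their task labels are all assumptions made for this example. Word features assume the input has already been segmented with spaces, and semantic constituents are omitted because they would require a parser.

```python
import math
from collections import Counter, defaultdict


def char_features(sentence):
    # One feature per Chinese character; whitespace is ignored.
    return [ch for ch in sentence if not ch.isspace()]


def word_features(sentence):
    # Assumes the sentence is pre-segmented, with words separated by spaces.
    return sentence.split()


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, featurize, alpha=1.0):
        self.featurize = featurize
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for feat in self.featurize(sentence):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for feat in self.featurize(sentence):
                if feat not in self.vocab:
                    continue  # features never seen in training are skipped
                score += math.log((self.feature_counts[label][feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (pre-segmented); labels are illustrative task names.
train_sentences = ["查 北京 天气", "明天 上海 天气 怎样", "订 去 北京 的 机票", "订 一张 上海 机票"]
train_labels = ["weather", "weather", "flight", "flight"]

char_nb = NaiveBayes(char_features).fit(train_sentences, train_labels)
word_nb = NaiveBayes(word_features).fit(train_sentences, train_labels)

print(char_nb.predict("北京 明天 天气"), word_nb.predict("北京 明天 天气"))
```

Only the feature function changes between the two classifiers, which mirrors the controlled setup the abstract describes. The same two objects could be scored on clean and noisy test sets, or retrained on growing subsets of the data, to sketch the robustness and learning-curve comparisons.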
