Training Data Size Requirements for Topic Classification in a Speech-Oriented Guidance System

In this work, we address the topic classification of Japanese utterances received by a speech-oriented guidance system operating in a real environment. Implementing this kind of system requires collecting and manually labeling actual user utterances, which is a costly process. We are therefore interested in evaluating how the amount of training data influences topic classification. To this end, we compared the performance of a Support Vector Machine and a Maximum Entropy classifier trained on data sets of different sizes. We used actual data collected from adults and children by the speech-oriented guidance system Takemaru-kun, and also evaluated the effect of automatic speech recognition (ASR) errors on classification performance. To deal with the shortness of the utterances, we propose using characters as features, which is possible in Japanese because of kanji, ideograms derived from Chinese characters that convey meaning as well as sound. Experimental results show that reducing the training data to 25% of its original size decreases average performance on ASR results by 4.6% for utterances from adults and by 2.8% for utterances from children, and that using characters instead of words as features improves classification performance from 92.2% to 94.1% for adults and from 87.2% to 88.3% for children.
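
As an illustration of the two ideas above, character-based features and reduced training data, the following is a minimal Python sketch. It is not the authors' implementation: the helper names `char_ngrams` and `subsample`, the use of n-grams up to length 2, the random-sampling strategy, and the example utterances and topic labels are all assumptions made for illustration. The abstract only states that characters were used as features and that the training data was reduced to 25%.

```python
from collections import Counter
import random


def char_ngrams(utterance, n_max=2):
    """Count character n-grams (n = 1..n_max) in an utterance.

    Working on characters needs no word segmentation, and each kanji
    already carries meaning, which helps with very short utterances.
    The n-gram order (n_max) is an assumption, not taken from the paper.
    """
    chars = [c for c in utterance if not c.isspace()]
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(chars) - n + 1):
            feats["".join(chars[i:i + n])] += 1
    return feats


def subsample(examples, fraction, seed=0):
    """Return a random subset holding `fraction` of the training examples."""
    rng = random.Random(seed)
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)


# Hypothetical labeled utterances (text, topic); not data from Takemaru-kun.
train = [
    ("図書館はどこですか", "facility"),
    ("今日の天気は", "weather"),
    ("トイレはどこ", "facility"),
    ("明日は雨ですか", "weather"),
]

# Keep 25% of the training set, then extract features for each utterance.
reduced = subsample(train, 0.25)
features = [(char_ngrams(text), topic) for text, topic in reduced]
```

The resulting feature counts could then be passed to any classifier that accepts sparse bag-of-features input, such as an SVM or a Maximum Entropy model.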
