On the Use of Linguistic Features in an Automatic System for Speech Analytics of Telephone Conversations

This paper describes research on the analysis of human/human conversations in a call centre. The purpose of the research is to provide a short report of each conversation containing information useful for monitoring call centre efficiency. Data from real users talking with agents over the telephone are processed by an automatic speech recognition (ASR) system. Reports are grouped into classes by the agents according to a predefined taxonomy. A training set of manually transcribed data is used to train both the extraction of features relevant to the application and the classification of the conversations. Three feature types are compared and evaluated on a separate test set: all the words of the application vocabulary, automatically selected keywords, and automatically learned sentence chunks containing semantic classes of words. The results show a significant increase in performance when chunks are used, even in comparison with bags of words obtained with a boosting algorithm.
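
To make the difference between the three feature types concrete, the following minimal Python sketch extracts each of them from one transcribed utterance. It is illustrative only and is not the system evaluated here: the keyword list, the semantic classes, and all function names are hypothetical, and the chunks are approximated by fixed-length word n-grams in which class members are replaced by their class label, whereas the paper learns keywords and chunks automatically from the training set.

```python
from collections import Counter

# Hypothetical semantic classes (assumption): surface word -> class label.
SEMANTIC_CLASSES = {
    "monday": "DAY", "tuesday": "DAY",
    "bus": "TRANSPORT", "metro": "TRANSPORT",
}

# Hypothetical keyword list (assumption); stands in for automatic selection.
KEYWORDS = {"lost", "ticket", "refund"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Feature type 1: counts of every word in the utterance."""
    return Counter(tokens)


def keyword_features(tokens, keywords=KEYWORDS):
    """Feature type 2: counts restricted to the selected keywords."""
    return Counter(t for t in tokens if t in keywords)


def chunk_features(tokens, n=2, classes=SEMANTIC_CLASSES):
    """Feature type 3 (approximation): n-grams over class-mapped tokens,
    kept only when they contain at least one semantic class label."""
    labels = set(classes.values())
    mapped = [classes.get(t, t) for t in tokens]
    ngrams = zip(*(mapped[i:] for i in range(n)))
    return Counter(" ".join(g) for g in ngrams if any(w in labels for w in g))


tokens = tokenize("I lost my ticket on the bus Monday")
print(bag_of_words(tokens))
print(keyword_features(tokens))
print(chunk_features(tokens))
```

Each function returns a count vector that could be passed to any classifier. Replacing words by class labels lets one chunk feature cover many surface forms, which is one plausible reason chunk features generalise better than isolated words.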