A submodular optimization approach to sentence set selection

A new method is presented for selecting a sentence set with a desired phoneme distribution. Selecting a sentence set for speech corpus recording is a fundamental step in speech processing research, and the design of phonetically balanced sentence sets has been studied extensively. One popular approach is to select sentences so that the phoneme distribution of the set comes close to a given (desired) distribution. Several methods have been proposed in the literature to realize this approach. However, they were designed heuristically and offer no guarantee of optimality. In this paper, we propose a near-optimal method for selecting sentence sets under this approach. We first define our objective function and show that it is submodular. Then, drawing on submodular optimization theory, we show that a greedy algorithm is near-optimal for this problem. We also show that exploiting the submodularity of the objective function yields a significant speedup. Experimental results on Japanese phonetically balanced sentence set selection demonstrate the effectiveness of the proposed method.
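The abstract does not state the objective function, so the Python sketch below uses a hypothetical one to show how greedy selection and its submodularity-based speedup work. The assumed objective is f(S) = Σ_p min(c_p(S), t_p). Here c_p(S) is the count of phoneme p in the selected sentences, and t_p is a target count derived from the desired distribution. This truncated-count function is monotone and submodular, but it is an illustration and not necessarily the paper's function. The sketch also assumes a simple budget of k sentences. It shows two variants:

- plain greedy;
- the standard "lazy" (accelerated) greedy variant, one common way to exploit submodularity for speed. Whether the paper uses exactly this variant is an assumption.

For a monotone submodular objective under a cardinality constraint, greedy selection is known to reach at least (1 − 1/e) of the optimal value.

```python
import heapq
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical setup (not from the paper): each sentence is a list of phoneme
# labels, and `target` maps each phoneme to a desired count derived from the
# desired distribution. The budget is a simple cap of k sentences.


def coverage(counts: Counter, target: Dict[str, float]) -> float:
    """Assumed objective f(S) = sum_p min(c_p(S), t_p); monotone submodular."""
    return sum(min(counts[p], t) for p, t in target.items())


def gain(counts: Counter, sent: Sequence[str], target: Dict[str, float]) -> float:
    """Marginal gain f(S + {sent}) - f(S), where `counts` holds c_p(S)."""
    return coverage(counts + Counter(sent), target) - coverage(counts, target)


def greedy(sents: List[Sequence[str]], target: Dict[str, float], k: int) -> List[int]:
    """Plain greedy: re-evaluate every remaining sentence in every round."""
    counts: Counter = Counter()
    chosen: List[int] = []
    remaining = set(range(len(sents)))
    while remaining and len(chosen) < k:
        # Largest marginal gain wins; ties go to the smallest index.
        best = max(remaining, key=lambda i: (gain(counts, sents[i], target), -i))
        chosen.append(best)
        remaining.remove(best)
        counts.update(sents[best])
    return chosen


def lazy_greedy(sents: List[Sequence[str]], target: Dict[str, float], k: int) -> List[int]:
    """Lazy greedy: stale gains are upper bounds, so most need no recomputation."""
    counts: Counter = Counter()
    chosen: List[int] = []
    # Min-heap keyed by (-upper bound on gain, index); initial bounds are exact.
    heap = [(-gain(counts, s, target), i) for i, s in enumerate(sents)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        _, i = heapq.heappop(heap)
        key = (-gain(counts, sents[i], target), i)  # refresh this candidate only
        if not heap or key <= heap[0]:
            # Its fresh gain still beats every other candidate's upper bound.
            chosen.append(i)
            counts.update(sents[i])
        else:
            heapq.heappush(heap, key)
    return chosen


sents = [["a", "a", "b"], ["b", "c"], ["c", "c", "c"], ["a", "d"]]
target = {"a": 2, "b": 1, "c": 2, "d": 1}
print(greedy(sents, target, 3), lazy_greedy(sents, target, 3))  # prints [0, 2, 3] [0, 2, 3]
```

Both variants return the same selection because ties are broken by sentence index in each. The lazy variant saves work for a specific reason: by submodularity, a gain computed in an earlier round is an upper bound on the current gain. As a result, most candidates are never re-evaluated once a refreshed gain beats the top remaining bound.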