The Semi-Supervised Switchboard Transcription Project

In previous work, we proposed a new graph-based semi-supervised learning (SSL) algorithm and showed that it outperforms other state-of-the-art SSL approaches for classifying documents and web pages. Here we scale the algorithm to very large data sets using a multi-threaded implementation. We treat the phonetically annotated portion of the Switchboard Transcription Project (STP) as labeled data and automatically annotate the Switchboard I (SWB-I) training set at the phonetic level. We show that our approach outperforms state-of-the-art SSL algorithms as well as a state-of-the-art strictly supervised classifier. As a result, we have STP-style annotations of the entire SWB-I training set, which we refer to as semi-supervised STP (S3TP).
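The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea behind graph-based SSL, written in Python with NumPy. It substitutes classic iterative label propagation (Zhu and Ghahramani, 2002), which is not the new algorithm proposed here. Labeled nodes are clamped to their one-hot labels, and each unlabeled node repeatedly takes the degree-weighted average of its neighbours' class distributions. Splitting the rows into chunks for a thread pool only illustrates why such updates parallelize well: within one iteration, every node's update reads the previous iterate and is independent of the others. This is an assumption, not a description of the authors' multi-threaded implementation, and the function name `label_propagation` and its parameters are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def label_propagation(W, Y_labeled, labeled_idx, n_iter=100, n_threads=4):
    """Hypothetical sketch: iterative label propagation on a dense graph.

    W           -- (n, n) symmetric, non-negative affinity matrix
    Y_labeled   -- (l, c) one-hot label matrix for the labeled nodes
    labeled_idx -- indices of the labeled nodes (rows of W)
    Returns an (n, c) matrix whose rows are class distributions.
    """
    n, c = W.shape[0], Y_labeled.shape[1]
    F = np.full((n, c), 1.0 / c)          # uniform start for unlabeled nodes
    F[labeled_idx] = Y_labeled            # labeled nodes start at their labels
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                   # isolated nodes end up with all-zero rows
    chunks = np.array_split(np.arange(n), n_threads)

    def update(rows):
        # Jacobi-style update: each node in this chunk becomes the
        # degree-normalized weighted average of its neighbours' current rows.
        return rows, (W[rows] @ F) / deg[rows]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(n_iter):
            F_new = np.empty_like(F)
            for rows, vals in pool.map(update, chunks):
                F_new[rows] = vals
            F_new[labeled_idx] = Y_labeled  # clamp labeled nodes every iteration
            F = F_new
    return F
```

On a five-node chain with node 0 labeled as class 0 and node 4 as class 1, the class-1 scores converge to the harmonic solution 0, 0.25, 0.5, 0.75, 1. NumPy matrix products usually release the GIL, which is what lets the threads overlap. A system at Switchboard scale would also need a sparse graph rather than the dense matrix assumed here.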
