Tongue contour extraction from ultrasound images based on deep neural network

Studying tongue motion during speech using ultrasound is a standard procedure, but automatic ultrasound image labelling remains a challenge, as standard tongue shape extraction methods typically require human intervention. This article presents a method based on deep neural networks to automatically extract tongue contour from ultrasound images on a speech dataset. We use a deep autoencoder trained to learn the relationship between an image and its related contour, so that the model is able to automatically reconstruct contours from the ultrasound image alone. In this paper, we use an automatic labelling algorithm instead of time-consuming hand-labelling during the training process, and estimate the performances of both automatic labelling and contour extraction as compared to hand-labelling. Observed results show quality scores comparable to the state of the art.

[1]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[2]  Yoshua Bengio,et al.  Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..

[3]  Demetri Terzopoulos,et al.  Snakes: Active contour models , 2004, International Journal of Computer Vision.

[4]  M. Stone A guide to analysing tongue motion from ultrasound images , 2005, Clinical linguistics & phonetics.

[5]  Ian R. Fasel,et al.  Deep Belief Networks for Real-Time Extraction of Tongue Contours from Ultrasound During Speech , 2010, 2010 20th International Conference on Pattern Recognition.

[6]  Chandra Kambhamettu,et al.  Automatic extraction and tracking of the tongue contours , 1999, IEEE Transactions on Medical Imaging.

[7]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[8]  Jun Cai,et al.  Vocal tract imaging system for post-laryngectomy voice replacement , 2013, 2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC).

[9]  Dong Yu,et al.  Deep Learning and Its Applications to Signal and Information Processing , 2011 .

[10]  C. Kambhamettu,et al.  Automatic contour tracking in ultrasound images , 2005, Clinical linguistics & phonetics.

[11]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[12]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.