Ultrasound Tongue Contour Extraction using Dilated Convolutional Neural Network

One application of medical ultrasound imaging is to visualize and characterize the shape and motion of the human tongue in order to study healthy or impaired speech production. Because ultrasound images are noisy and low in contrast, users need knowledge of tongue anatomy and of ultrasound data interpretation to recognize tongue gestures. Moreover, quantitative analysis of tongue motion requires the tongue contour to be extracted, tracked and visualized automatically. This paper presents two novel deep neural networks that combine the global prediction ability of encoder-decoder fully convolutional networks with the full-resolution feature extraction of dilated convolutions. Evaluations on datasets from different ultrasound machines showed that the proposed models achieve high accuracy and robustness.
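
As a rough illustration of why dilated convolutions can extract features at full resolution, the sketch below implements a zero-padded 1D dilated convolution in NumPy. It is not the architecture proposed in the paper: the function names `dilated_conv1d` and `receptive_field` are hypothetical, the signal is 1D rather than a 2D ultrasound frame, and the kernel weights are arbitrary.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 'same' 1D dilated convolution (cross-correlation form).

    Hypothetical helper for illustration; the output has the same length
    as the input, so no resolution is lost.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = dilation * (len(kernel) - 1)   # distance between first and last tap
    pad_left = span // 2
    pad_right = span - pad_left
    padded = np.pad(signal, (pad_left, pad_right))
    out = np.zeros_like(signal)
    for i in range(len(signal)):
        # Sample the input every `dilation` steps instead of contiguously.
        taps = padded[i : i + span + 1 : dilation]
        out[i] = np.dot(taps, kernel)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf

if __name__ == "__main__":
    x = np.zeros(16)
    x[8] = 1.0                            # unit impulse
    y = dilated_conv1d(x, [1.0, 2.0, 3.0], dilation=2)
    print(y.shape == x.shape)             # output keeps the input length
    print(receptive_field(3, [1, 2, 4]))  # three stacked 3-tap layers
```

Stacking three 3-tap layers with dilations 1, 2 and 4 covers 15 input samples without any pooling or striding, whereas an encoder-decoder network reaches a comparable context by downsampling and then upsampling.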
