A SegNet Based Image Enhancement Technique for Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video

In this paper, we propose a new technique for segmenting the Air-Tissue Boundaries (ATBs) in the upper airway of the vocal tract, in the midsagittal plane of real-time Magnetic Resonance Imaging (rtMRI) videos. The proposed technique uses the segmentation using Fisher-discriminant measure (SFDM) scheme. We introduce an image enhancement technique that applies semantic segmentation to preprocess the rtMRI frames before ATB prediction, using a deep convolutional encoder-decoder architecture (SegNet) for the semantic segmentation. We examine the significance of preprocessing before ATB prediction by implementing the SFDM approach with different preprocessing techniques. Experiments with 5779 rtMRI video frames from four subjects demonstrate that semantic segmentation based image enhancement of the rtMRI frames improves the performance of the SFDM approach compared with the other preprocessing approaches. Experimental results also show that the proposed approach yields 8.6% less error in ATB prediction than a semi-supervised grid-based baseline segmentation approach.
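To give a feel for why contrast enhancement could help a Fisher-discriminant based scheme, the sketch below computes the textbook two-class Fisher criterion (squared difference of means divided by the sum of variances) for pixel intensities on either side of a candidate boundary. This is only an illustration under stated assumptions: the abstract does not define the exact measure used in SFDM, and the function name `fisher_discriminant` and the sample intensity values are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_discriminant(group_a, group_b, eps=1e-12):
    """Textbook two-class Fisher criterion (an assumption, not the paper's exact measure).

    Returns the squared gap between the group means divided by the sum of
    the group variances; eps avoids division by zero for constant groups.
    """
    mean_gap = fmean(group_a) - fmean(group_b)
    spread = pvariance(group_a) + pvariance(group_b)
    return (mean_gap ** 2) / (spread + eps)


# Hypothetical normalized pixel intensities on the two sides of a candidate boundary.
air = [0.05, 0.08, 0.04, 0.07]             # dark airway pixels
tissue = [0.62, 0.70, 0.66, 0.58]          # bright, well-separated tissue pixels
blurred_tissue = [0.20, 0.45, 0.10, 0.60]  # low-contrast, noisy tissue pixels

sharp_score = fisher_discriminant(air, tissue)
blurred_score = fisher_discriminant(air, blurred_tissue)
```

With these made-up values, the well-separated pair scores higher than the low-contrast pair, which is the intuition behind enhancing the frames before boundary prediction: a cleaner air/tissue separation makes the boundary easier to discriminate.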
