Lip Image Segmentation Based on a Fuzzy Convolutional Neural Network

Research has shown that the human lips and their movements are a rich source of information about speech content and speaker identity. Lip image segmentation is a fundamental step in many lip-reading and visual speaker authentication systems, so its accuracy is of vital importance. Because of variations in lip color and lighting conditions, and especially the complex appearance of an open mouth, accurate lip region segmentation remains a challenging task. To address this problem, this article proposes a new fuzzy deep neural network whose architecture integrates fuzzy units with traditional convolutional units. The convolutional units extract discriminative features at different scales to provide comprehensive information for pixel-level lip segmentation. The fuzzy logic modules handle various kinds of uncertainty and provide a more robust segmentation result. An end-to-end training scheme is then used to learn the optimal parameters of both the fuzzy and the convolutional units. A dataset containing more than 48 000 images of various speakers, captured under different lighting conditions, was used to evaluate lip segmentation performance. According to the experimental results, the proposed method achieves state-of-the-art performance when compared with other algorithms.
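
As a rough illustration of how a fuzzy unit might sit on top of convolutional features, the sketch below maps each pixel's feature vector to fuzzy membership degrees for a set of classes. It is not the network proposed in this article: the Gaussian membership function, the product (AND) rule across channels, and the names `gaussian_membership`, `fuzzy_segment`, `centers`, and `sigmas` are all assumptions chosen for illustration. In an end-to-end setting the centers and widths would be learned jointly with the convolutional weights; here they are fixed arrays.

```python
import numpy as np

def gaussian_membership(features, centers, sigmas):
    """Fuzzy membership of each pixel in each class (illustrative assumption).

    features: (H, W, C) array, e.g. the output of a convolutional layer.
    centers:  (K, C) array of hypothetical per-class membership centers.
    sigmas:   (K, C) array of hypothetical per-class membership widths (> 0).
    Returns an (H, W, K) array of membership degrees in (0, 1].
    """
    # Broadcast to (H, W, K, C): distance of every pixel to every class center.
    diff = features[:, :, None, :] - centers[None, None, :, :]
    z = (diff / sigmas[None, None, :, :]) ** 2
    # Product of per-channel Gaussian memberships equals exp of the summed exponents.
    return np.exp(-0.5 * z.sum(axis=-1))

def fuzzy_segment(features, centers, sigmas):
    """Normalize memberships so they sum to 1 per pixel (a soft segmentation)."""
    mu = gaussian_membership(features, centers, sigmas)
    return mu / (mu.sum(axis=-1, keepdims=True) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 4, 3))           # stand-in for conv features
    centers = np.array([[0.0, 0.0, 0.0],          # hypothetical "background" class
                        [1.0, 1.0, 1.0]])         # hypothetical "lip" class
    sigmas = np.ones_like(centers)
    soft = fuzzy_segment(feats, centers, sigmas)
    labels = soft.argmax(axis=-1)                 # hard label map, shape (4, 4)
    print(soft.shape, labels.shape)
```

The soft output keeps a degree of membership per class for every pixel, which is one common way fuzzy models represent uncertainty near ambiguous regions such as lip boundaries; a hard label map is obtained only at the end by taking the largest membership.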
