Semantic Communication Systems for Speech Transmission

Semantic communications could significantly improve transmission efficiency by exploiting semantic information. In this paper, we aim to recover transmitted speech signals in semantic communication systems by minimizing the error at the semantic level rather than at the bit or symbol level. In particular, we design a deep learning (DL)-enabled semantic communication system for speech signals, named DeepSC-S. To improve the recovery accuracy of speech signals, especially of the essential information, DeepSC-S is built on an attention mechanism that uses a squeeze-and-excitation (SE) network. The motivation behind the attention mechanism is to identify the essential speech information by assigning it higher weights when training the neural network. Moreover, to adapt the proposed DeepSC-S to dynamic channel environments, we find a general model that copes with various channel conditions without retraining. Furthermore, we investigate DeepSC-S in telephone systems as well as multimedia transmission systems to verify the model adaptation in practice. The simulation results demonstrate that the proposed DeepSC-S outperforms traditional communication systems in both cases in terms of speech signal metrics, such as signal-to-distortion ratio and perceptual evaluation of speech quality. In addition, DeepSC-S is more robust to channel variations, especially in the low signal-to-noise ratio (SNR) regime.
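
The channel reweighting performed by a squeeze-and-excitation block can be illustrated with a minimal NumPy sketch. This is not the DeepSC-S implementation: the function name `se_block`, the tensor layout (channels by time frames), the reduction ratio, and the random weights are all assumptions chosen for illustration. In a trained network the two weight matrices would be learned jointly with the rest of the model.

```python
import numpy as np

def se_block(features, w1, b1, w2, b2):
    """Hypothetical squeeze-and-excitation step on a (channels, frames) feature map.

    Returns the reweighted features and the per-channel attention weights.
    """
    # Squeeze: average over the time axis to get one descriptor per channel.
    squeezed = features.mean(axis=1)                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then sigmoid gate per channel.
    hidden = np.maximum(0.0, w1 @ squeezed + b1)          # shape (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # shape (C,), each in (0, 1)
    # Scale: channels with larger weights contribute more to later layers.
    return features * weights[:, None], weights

# Illustrative sizes and random parameters (assumptions, not values from the paper).
rng = np.random.default_rng(0)
channels, frames, reduction = 8, 16, 4
features = rng.standard_normal((channels, frames))
w1 = 0.1 * rng.standard_normal((channels // reduction, channels))
b1 = np.zeros(channels // reduction)
w2 = 0.1 * rng.standard_normal((channels, channels // reduction))
b2 = np.zeros(channels)

scaled, weights = se_block(features, w1, b1, w2, b2)
print(weights.round(3))
```

Each channel is multiplied by a single gate value between 0 and 1, so channels that the excitation layers judge to carry essential information are attenuated less than the others.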
