Personalizing Recurrent-Neural-Network-Based Language Model by Social Network

With the popularity of mobile devices, personalized speech recognizers have become both more attainable and highly attractive. Since each mobile device is used primarily by a single user, a recognizer can be personalized to closely match the characteristics of that user. Although acoustic model personalization has been investigated for decades, much less work has been reported on personalizing language models, presumably because of the difficulty of collecting sufficient personalized corpora. In this paper, we propose a general framework for personalizing recurrent-neural-network-based language models (RNNLMs) using data collected from social networks, including the posts of many individual users and the friend relationships among them. We consider two major directions: model-based and feature-based RNNLM personalization. In the model-based approach, the RNNLM parameters are fine-tuned to an individual user's wording patterns by incorporating social texts posted by the target user and his or her friends. In the feature-based approach, the RNNLM parameters are shared across all users, and the RNNLM input features are instead augmented with personalized information. Both approaches not only drastically reduce model perplexity, but also moderately reduce word error rates in $n$-best rescoring tests.
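The feature-based direction can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a plain Elman-style RNN with a sigmoid hidden layer and a softmax output, and it uses untrained random weights. The names `rnn_step`, `init_params`, and `sentence_logprob`, the toy dimensions, and the contents of the user feature vectors are all hypothetical; the abstract does not specify what the personalized features are. The only point shown is that one shared set of weights yields user-dependent word probabilities once a user feature vector is appended to the word input.

```python
import math
import random


def init_params(vocab, n_feat, n_hidden, seed=0):
    """Random (untrained) weights shared by all users; illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Word-input, user-feature, recurrent, and output weight matrices.
    return mat(n_hidden, vocab), mat(n_hidden, n_feat), mat(n_hidden, n_hidden), mat(vocab, n_hidden)


def rnn_step(word_id, user_feat, hidden, params):
    """One RNN step whose input is [one-hot word ; user feature vector]."""
    w_in, w_feat, w_rec, w_out = params
    n_hidden = len(hidden)
    new_hidden = []
    for j in range(n_hidden):
        act = w_in[j][word_id]  # a one-hot word input selects one column
        act += sum(w_feat[j][k] * user_feat[k] for k in range(len(user_feat)))
        act += sum(w_rec[j][k] * hidden[k] for k in range(n_hidden))
        new_hidden.append(1.0 / (1.0 + math.exp(-act)))  # sigmoid
    logits = [sum(row[j] * new_hidden[j] for j in range(n_hidden)) for row in w_out]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
    total = sum(exps)
    return new_hidden, [e / total for e in exps]


def sentence_logprob(word_ids, user_feat, params, n_hidden):
    """Sum of log P(next word | history, user features) over a word-id sequence."""
    hidden = [0.0] * n_hidden
    logp = 0.0
    for prev, nxt in zip(word_ids, word_ids[1:]):
        hidden, probs = rnn_step(prev, user_feat, hidden, params)
        logp += math.log(probs[nxt])
    return logp


# Toy usage: same weights, two hypothetical users, one sentence of word ids.
N_HIDDEN = 4
params = init_params(vocab=5, n_feat=3, n_hidden=N_HIDDEN)
user_a = [1.0, 0.0, 0.0]  # hypothetical user feature vectors
user_b = [0.0, 0.0, 1.0]
sentence = [0, 2, 1, 4]
logp_a = sentence_logprob(sentence, user_a, params, N_HIDDEN)
logp_b = sentence_logprob(sentence, user_b, params, N_HIDDEN)
print(logp_a, logp_b)
```

In an $n$-best rescoring setting, a per-user score of this kind would be computed for each hypothesis and combined with the recognizer's other scores. The model-based direction would instead update the weights themselves on the target user's texts.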
