Statistical Language and Speech Processing: 8th International Conference, SLSP 2020, Cardiff, UK, October 14–16, 2020, Proceedings

Automatic speech recognition (ASR) can be deployed in a previously unknown language in less than 24 hours, given just three resources: an acoustic model trained on other languages, a set of language-model training data, and a grapheme-to-phoneme (G2P) transducer to connect them. The LanguageNet G2Ps were created with the goal of being small, fast, and easy to port to a previously unseen language. Training data come from pronunciation lexicons when available; when the target language has no pronunciation lexicon, data are generated from minimal resources, either a Wikipedia description of the target language or a one-hour interview with a native speaker of the language. Using such methods, the LanguageNet G2Ps now include simple models in nearly 150 languages, with trained finite-state transducers in 122 languages, 59 of which are sufficiently well-resourced to permit measurement of their phone error rates. This paper proposes a measure of the distance between the G2Ps in different languages, and demonstrates that agglomerative clustering of the LanguageNet languages bears some resemblance to a phylogeographic language family tree. The LanguageNet G2Ps proposed in this paper have already been applied in three cross-language ASRs, using both hybrid and end-to-end neural architectures, and further experiments are ongoing.
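The phone error rate mentioned above is commonly computed as the total edit distance between predicted and reference phone sequences, divided by the total number of reference phones. The following is a minimal sketch under that assumption; the abstract does not state the exact scoring procedure used for the LanguageNet G2Ps, and the function names and toy phone sequences here are hypothetical illustrations, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of symbols)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(pairs):
    """Total edit errors over total reference phones, for (ref, hyp) pairs."""
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("reference sequences contain no phones")
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return errors / total


# Hypothetical toy lexicon entries: (reference phones, G2P output phones).
toy_pairs = [
    (["k", "ae", "t"], ["k", "a", "t"]),    # one substitution
    (["d", "ao", "g"], ["d", "ao", "g"]),   # exact match
]
toy_per = phone_error_rate(toy_pairs)  # 1 error over 6 reference phones
```

Pooling errors over all entries before dividing, as done here, weights long words more heavily than averaging per-word rates would; either convention appears in practice, so the choice is an assumption of this sketch.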


[263]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[264]  Björn Schuller,et al.  Sequence to Sequence Autoencoders for Unsupervised Representation Learning from Audio , 2017, DCASE.

[265]  Xie Yanlu,et al.  Automatic detection of rhythmic patterns in native and L2 speech: Chinese, Japanese, and Japanese L2 Chinese , 2016, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP).

[266]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[267]  J. Trueswell,et al.  The role of discourse context in the processing of a flexible word-order language , 2004, Cognition.

[268]  Rich Caruana,et al.  Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.

[269]  Joaquín Romero,et al.  The improvement of Spanish/Catalan EFL students' prosody by means of explicit rhythm instruction , 2018, International Symposium on Applied Phonetics (ISAPh 2018).

[270]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[271]  Xudong Lin,et al.  Deep Variational Metric Learning , 2018, ECCV.

[272]  Fred J. Damerau,et al.  A technique for computer detection and correction of spelling errors , 1964, CACM.

[273]  William M. Fisher.  A statistical text-to-phone function using ngrams and rules , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99.

[274]  Jian Su,et al.  A Phrase-Based Statistical Model for SMS Text Normalization , 2006, ACL.

[275]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[276]  Doaa Mohey El Din Mohamed Hussein,et al.  A survey on sentiment analysis challenges , 2016, Journal of King Saud University - Engineering Sciences.

[277]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[278]  German Rigau,et al.  Robust multilingual Named Entity Recognition with shallow semi-supervised features , 2016, Artif. Intell..

[279]  Sebastian Stabinger,et al.  Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification , 2020, LREC.

[280]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[281]  Fernando Pereira,et al.  Non-Projective Dependency Parsing using Spanning Tree Algorithms , 2005, HLT.

[282]  Marta Esther Vicente,et al.  Statistical language modelling for automatic story generation , 2018, J. Intell. Fuzzy Syst..

[283]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[284]  Roland Vollgraf,et al.  Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.

[285]  Kenneth Ward Church,et al.  Probability scoring for spelling correction , 1991 .

[286]  Sinan Aral,et al.  The spread of true and false news online , 2018, Science.

[287]  Yuxiang Wu,et al.  Learning to Extract Coherent Summary via Deep Reinforcement Learning , 2018, AAAI.

[288]  Muhammad Ghulam,et al.  Voice pathology detection using interlaced derivative pattern on glottal source excitation , 2017, Biomed. Signal Process. Control..

[289]  Moira Yip,et al.  English vowel epenthesis , 1987 .

[290]  Andreas Stolcke,et al.  Using MLP features in SRI's conversational speech recognition system , 2005, INTERSPEECH.