Normalization and parsing algorithms for uncertain input

The automatic analysis (parsing) of natural language is an important ingredient of many natural language processing applications (search engines, automatic translation, speech processing, etc.), as it is the first step towards interpretation. For standard texts, such as well-edited news articles, current parsers perform very well. For user-generated content such as tweets, however, parser performance drops dramatically. In this research, we attempt to improve the automatic analysis of spontaneous language by translating it to 'normal' language, a task also referred to as 'normalization'. For example, the sentence "new pix comming tomorroe" is translated to "new pictures coming tomorrow". A variety of phenomena occur in this example: 'pix' is a replacement based on pronunciation, whereas 'comming' is probably a typo. Based on the observation that the normalization problem actually consists of multiple sub-problems, we developed a modular normalization model: MoNoise. This model reaches new state-of-the-art performance on a variety of languages. Normalizing social media texts leads to a performance increase for syntactic parsers. In the basic setup, we use only the single best normalization candidate for each word, which might lead to error propagation. Hence, we introduce two novel methods that let the parser take multiple normalization candidates per position into account, leading to further improvements in parser performance.
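The idea of modular candidate generation followed by single-best selection can be illustrated with a minimal sketch. This is not MoNoise: the vocabulary `VOCAB`, the lookup table `LOOKUP`, the 0.7 similarity threshold, and the function names are all hypothetical, and string similarity from Python's `difflib` stands in for the learned ranking a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical toy resources; a real system would derive these from data.
VOCAB = {"new", "pictures", "coming", "come", "tomorrow", "tomorrows"}
LOOKUP = {"pix": "pictures"}  # pronunciation-based replacements


def candidates(word):
    """Return (candidate, score) pairs from two simple modules, best first."""
    scored = {}
    # Module 1: known replacements (also keeps in-vocabulary words as they are).
    if word in VOCAB:
        scored[word] = 1.0
    if word in LOOKUP:
        scored[LOOKUP[word]] = 1.0
    # Module 2: spelling similarity against the vocabulary (assumed threshold).
    for entry in VOCAB:
        similarity = SequenceMatcher(None, word, entry).ratio()
        if similarity >= 0.7:
            scored[entry] = max(scored.get(entry, 0.0), similarity)
    # Fall back to the original word when no module proposes anything.
    if not scored:
        scored[word] = 0.0
    return sorted(scored.items(), key=lambda pair: (-pair[1], pair[0]))


def normalize(sentence):
    """Basic setup: keep only the single best candidate for each word."""
    return " ".join(candidates(word)[0][0] for word in sentence.split())


print(normalize("new pix comming tomorroe"))
print(candidates("tomorroe"))
```

With these toy resources, `normalize` returns "new pictures coming tomorrow", while `candidates("tomorroe")` also keeps "tomorrows" as a lower-ranked alternative. Ranked lists of this kind are what a parser could consult when it takes more than one normalization candidate per position into account.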
