A Neurophysiologically-Inspired Statistical Language Model

We describe a statistical language model with components inspired by electrophysiological activity in the brain. These components correspond to important language-relevant event-related potentials measured using electroencephalography. We relate neural signals involved in local and long-distance grammatical processing, as well as in local and long-distance lexical processing, to statistical language models that are scalable, cross-linguistic, and incremental. We develop a novel language model component, inspired by the long-distance lexical event-related potential (the N400), that unifies n-gram, skip, and trigger language models into a generalized model. We evaluate this model in textual and speech recognition experiments, showing consistent improvements over 4-gram modified Kneser-Ney language models (Chen and Goodman, 1998) on large-scale textual datasets in English, Arabic, Croatian, and Hungarian.
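The abstract says the N400-inspired component unifies n-gram, skip, and trigger models but does not give its form here. One way to see why such a unification is possible: each of these models conditions the next word on a preceding word at some distance, and they differ mainly in which distances count. The Python sketch below illustrates that framing under explicit assumptions and is not the authors' model. The names `pair_counts`, `WindowComponent`, and `mixture_prob` are hypothetical, as are the choices of unsmoothed maximum-likelihood estimates, averaging over the history words inside a window, and linear interpolation with a uniform floor. No modified Kneser-Ney smoothing is applied.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch, not the paper's model: a component is defined by a
# distance window (d_min, d_max) in which a preceding word may predict the
# current word.
#   bigram:                 (1, 1)
#   skip-bigram, gap k:     (k + 1, k + 1)
#   trigger, range W:       (1, W)
Window = Tuple[int, int]


def pair_counts(tokens: Sequence[str], window: Window) -> Counter:
    """Count (history_word, target_word) pairs whose distance lies in window.

    A history word is counted at most once per target position, so a
    trigger fires once however often it recurs inside the window.
    """
    d_min, d_max = window
    counts: Counter = Counter()
    for i, target in enumerate(tokens):
        seen = set()
        for d in range(d_min, d_max + 1):
            if i - d < 0:
                break  # larger distances would reach even further before the start
            seen.add(tokens[i - d])
        for h in seen:
            counts[(h, target)] += 1
    return counts


class WindowComponent:
    """Unsmoothed maximum-likelihood P(target | h) for history words h in a window."""

    def __init__(self, tokens: Sequence[str], window: Window) -> None:
        self.window = window
        self.pairs = pair_counts(tokens, window)
        self.totals: Counter = Counter()
        for (h, _), c in self.pairs.items():
            self.totals[h] += c

    def prob(self, history: Sequence[str], target: str) -> Optional[float]:
        """Average P(target | h) over usable words h in the window; None if none."""
        d_min, d_max = self.window
        hs = {history[-d] for d in range(d_min, d_max + 1) if d <= len(history)}
        usable = [h for h in hs if self.totals[h] > 0]
        if not usable:
            return None
        return sum(self.pairs[(h, target)] / self.totals[h] for h in usable) / len(usable)


def mixture_prob(components: List[WindowComponent], weights: List[float],
                 history: Sequence[str], target: str, vocab_size: int) -> float:
    """Linear interpolation: weights[0] is a uniform floor, weights[1:] go to components.

    A component with no usable history falls back to uniform, so the result
    sums to 1 over the vocabulary whenever the weights sum to 1.
    """
    uniform = 1.0 / vocab_size
    p = weights[0] * uniform
    for comp, w in zip(components, weights[1:]):
        q = comp.prob(history, target)
        p += w * (uniform if q is None else q)
    return p


if __name__ == "__main__":
    tokens = "a b a c".split()
    comps = [WindowComponent(tokens, w) for w in [(1, 1), (2, 2), (1, 3)]]
    vocab = sorted(set(tokens))
    weights = [0.1, 0.5, 0.2, 0.2]
    print(mixture_prob(comps, weights, ["a", "b"], "c", len(vocab)))  # ≈ 0.1167 (7/60)
```

Under this framing, a bigram, a skip-bigram, and a trigger differ only in the window passed to `WindowComponent`. A practical system would presumably replace the raw counts with smoothed estimates and tune the interpolation weights on held-out data.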

References

[1] Jack Mostow et al. Towards Using EEG to Improve ASR Accuracy. 2012, HLT-NAACL.

[2] Robert D. Conrad. A maximum likelihood tracker. 1981.

[3] M. Kutas et al. Event-related brain potentials to grammatical errors and semantic anomalies. 1983, Memory & Cognition.

[4] Ciprian Chelba et al. Exploiting Syntactic Structure for Natural Language Modeling. 2000, ArXiv.

[5] Keiji Kanazawa et al. A model for reasoning about persistence and causation. 1989.

[6] L. Shah et al. Functional magnetic resonance imaging. 2010, Seminars in Roentgenology.

[7] Mei-Yuh Hwang et al. The SPHINX-II speech recognition system: an overview. 1993, Comput. Speech Lang.

[8] Rene De La Briandais. File searching using variable length keys. 1959, IRE-AIEE-ACM Computer Conference.

[9] Leslie Lamport et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. 2016, IEEE Transactions on Computers.

[10] Gary L. Dannenbring et al. Strategic factors in a lexical-decision task: Evidence for automatic and attention-driven processes. 1983, Memory & Cognition.

[11] Jorge Nocedal et al. On the limited memory BFGS method for large scale optimization. 1989, Math. Program.

[12] Lukáš Burget et al. The 2005 AMI System for the Transcription of Speech in Meetings. 2005, MLMI.

[13] A. D. Friederici et al. Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. 1995, Memory & Cognition.

[14] Edward Gibson et al. A computational theory of human linguistic processing: memory limitations and processing breakdown. 1991.

[15] Tomaz Erjavec et al. hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene. 2011, TSD.

[16] Gustavo Alonso et al. Temporal Structure. 2009, Encyclopedia of Database Systems.

[17] Ronald Rosenfeld et al. Trigger-based language models: a maximum entropy approach. 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Yee Whye Teh et al. A fast and simple algorithm for training neural probabilistic language models. 2012, ICML.

[19] H. Kucera et al. Computational analysis of present-day American English. 1967.

[20] Marvin Minsky et al. Polyscheme: a cognitive architecture for integrating multiple representation and inference schemes. 2002.

[21] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. 2007, ACL.

[22] John F. Connolly et al. Chapter 9: Event-Related Potentials in the Study of Language. 2008.

[23] Madelena Lucinda McClure. Event-related brain potentials elicited by Japanese sentences. 1999.

[24] Pat Langley et al. Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture. 2022.

[25] Hermann Ney et al. Improved clustering techniques for class-based statistical language modelling. 1993, EUROSPEECH.

[26] P. Nunez et al. Electric fields of the brain. 1981.

[27] Reinhard Kneser et al. On the dynamic adaptation of stochastic language models. 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[28] Stanley F. Chen et al. Shrinking Exponential Language Models. 2009, NAACL.

[29] Thomas Hofmann et al. Topic-based language models using EM. 1999, EUROSPEECH.

[30] M. Kutas et al. Psycholinguistics Electrified II (1994–2005). 2006.

[31] Angela D. Friederici et al. Brain potentials indicate immediate use of prosodic cues in natural speech processing. 1999, Nature Neuroscience.

[32] Holger Schwenk et al. Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation. 2012, WLM@NAACL-HLT.

[33] Hermann Ney et al. Improved backing-off for M-gram language modeling. 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[34] Dan Klein et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. 2004, ACL.

[35] M. Garrett et al. Syntactically Based Sentence Processing Classes: Evidence from Event-Related Brain Potentials. 1991, Journal of Cognitive Neuroscience.

[36] Kamel Smaïli et al. Improving language models by using distant information. 2007, 2007 9th International Symposium on Signal Processing and Its Applications.

[37] J. Darroch et al. Generalized Iterative Scaling for Log-Linear Models. 1972.

[38] M. Kutas et al. Reading senseless sentences: brain potentials reflect semantic incongruity. 1980, Science.

[39] Stanley F. Chen et al. An Empirical Study of Smoothing Techniques for Language Modeling. 1996, ACL.

[40] Lukáš Burget et al. Recurrent neural network based language model. 2010, INTERSPEECH.

[41] Frederick Jelinek et al. Interpolated estimation of Markov source parameters from sparse data. 1980.

[42] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. 1965.

[43] Janet M. Baker et al. The Design for the Wall Street Journal-based CSR Corpus. 1992, HLT.

[44] Lee Osterhout et al. Constraints on Movement Phenomena in Sentence Processing: Evidence from Event-related Brain Potentials. 1996.

[45] Ted Dunning et al. Accurate Methods for the Statistics of Surprise and Coincidence. 1993, CL.

[46] Philipp Koehn et al. Findings of the 2011 Workshop on Statistical Machine Translation. 2011, WMT@EMNLP.

[47] A. Friederici. Towards a neural basis of auditory sentence processing. 2002, Trends in Cognitive Sciences.

[48] Risto Miikkulainen et al. Natural Language Processing With Modular PDP Networks and Distributed Lexicon. 1991, Cogn. Sci.

[49] E. Jaynes. Information Theory and Statistical Mechanics. 1957.

[50] T. Landauer et al. Indexing by Latent Semantic Analysis. 1990.

[51] Franz Josef Och et al. An Efficient Method for Determining Bilingual Word Classes. 1999, EACL.

[52] A. Grinnell et al. Introduction to Nervous Systems. 1978.

[53] A. Friederici et al. Word category and verb–argument structure information in the dynamics of parsing. 2004, Cognition.

[54] Alexander Clark et al. Combining Distributional and Morphological Information for Part of Speech Induction. 2003, EACL.

[55] K. Heyer et al. On the nature of the proportion effect in semantic priming. 1985.

[56] J. Laidlaw et al. Anatomy of the Human Body. 1967, The Ulster Medical Journal.

[57] Jean-Luc Gauvain et al. Connectionist language modeling for large vocabulary continuous speech recognition. 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[58] 廣瀬雄一 et al. Neuroscience. 2019, Workplace Attachments.

[59] Renato De Mori et al. A Cache-Based Natural Language Model for Speech Recognition. 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[60] Robert L. Mercer et al. Class-Based n-gram Models of Natural Language. 1992, CL.

[61] Holger Schwenk et al. Continuous Space Language Models for Statistical Machine Translation. 2006, ACL.

[62] Matthias Schlesewsky et al. The P600 as an indicator of syntactic ambiguity. 2002, Cognition.

[63] M. Tanenhaus et al. Modeling the Influence of Thematic Fit (and Other Constraints) in On-line Sentence Comprehension. 1998.

[64] Wray L. Buntine. Operations for Learning with Graphical Models. 1994, J. Artif. Intell. Res.

[65] Steffen Staab et al. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. 2014, ACL.

[66] Ronald Rosenfeld et al. Adaptive Language Modeling Using the Maximum Entropy Principle. 1993, HLT.

[67] Martin Meyer et al. Working memory constraints on syntactic ambiguity resolution as revealed by electrical brain responses. 1998, Biological Psychology.

[68] Haizhou Li et al. Modeling of term-distance and term-occurrence information for improving n-gram language model performance. 2013, ACL.

[69] Jeffrey L. Elman et al. Finding Structure in Time. 1990, Cogn. Sci.

[70] Daniel Povey et al. The Kaldi Speech Recognition Toolkit. 2011.

[71] Boicho N. Kokinov et al. The DUAL Cognitive Architecture: A Hybrid Multi-Agent Approach. 1994, ECAI.

[72] E. Gibson. Linguistic complexity: locality of syntactic dependencies. 1998, Cognition.

[73] Joshua Goodman et al. Putting it all together: language model combination. 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[74] A. Damasio et al. Brain and language. 1993, Scientific American.

[75] Maryellen C. MacDonald et al. The lexical nature of syntactic ambiguity resolution. 1994.

[76] Maria Leonor Pacheco et al. of the Association for Computational Linguistics. 2001.

[77] K. Stanovich et al. On priming by a sentence context. 1983, Journal of Experimental Psychology: General.

[78] Holger Schwenk et al. Continuous space language models. 2007, Comput. Speech Lang.

[79] Thorsten Brants et al. One billion word benchmark for measuring progress in statistical language modeling. 2013, INTERSPEECH.

[80] C. Van Petten. A comparison of lexical and sentence-level context effects in event-related potentials. 1993.

[81] Frederick Jelinek et al. Exploiting Syntactic Structure for Language Modeling. 1998, ACL.

[82] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters. 2004, OSDI.

[83] H. Neville et al. Fractionating language: different neural subsystems with different sensitive periods. 1992, Cerebral Cortex.

[84] M. Kutas et al. Influences of semantic and syntactic context on open- and closed-class words. 1991, Memory & Cognition.

[85] D. Fry et al. Speech and Language. 1986.

[86] Lukáš Burget et al. Strategies for training large scale neural network language models. 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[87] Philipp Koehn et al. Statistical Machine Translation. 2010, EAMT.

[88] P. Lewis. Ethnologue: Languages of the World. 2009.

[89] J. H. Neely. Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. 1977.

[90] Stanley F. Chen et al. Bayesian Grammar Induction for Language Modeling. 1995, ACL.

[91] Dietrich Klakow et al. Testing the correlation of word error rate and perplexity. 2002, Speech Commun.

[92] Dietrich Klakow et al. Log-linear interpolation of language models. 1998, ICSLP.

[93] Kenneth Ward Church et al. Word Association Norms, Mutual Information, and Lexicography. 1989, ACL.

[94] Wei Xu et al. Can artificial neural networks learn language models? 2000, INTERSPEECH.

[95] D. Caplan et al. Electrophysiological distinctions in processing conceptual relationships within simple sentences. 2003, Brain Research: Cognitive Brain Research.

[96] A. Gamba et al. Further experiments with PAPA. 1961.

[97] Bernard Mérialdo et al. Natural Language Modeling for Phoneme-to-Text Transcription. 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[98] Javier Snaider et al. The LIDA Framework as a General Tool for AGI. 2011, AGI.

[99] M. Kutas et al. Potentials and Paradigms: Event-Related Brain Potentials and Neuropsychology. 2012.

[100] Yorick Wilks et al. A Closer Look at Skip-gram Modelling. 2006, LREC.

[101] Karen A. Loveland et al. Large Scale. 1991.

[102] Samy Bengio et al. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks. 1999, NIPS.

[103] Salim Roukos et al. Brain potentials related to stages of sentence verification. 1983, Psychophysiology.

[104] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper. 1977.

[105] J. G. Daugman et al. Information Theory and Coding. 1998.

[106] Bhuvana Ramabhadran et al. Scaling shrinkage-based language models. 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[107] András Kornai et al. Creating Open Language Resources for Hungarian. 2004, LREC.

[108] M. Slowiaczek et al. Constraints on semantic priming in reading: A fixation time analysis. 1986, Memory & Cognition.

[109] Brian Roark et al. Markov Parsing: Lattice Rescoring with a Statistical Parser. 2002, ACL.

[110] C. Lebiere et al. The Atomic Components of Thought. 1998.

[111] C. Burgess et al. Lexical and Sentence Context Effects in Word Recognition. 1989.

[112] Hermann Ney et al. Algorithms for bigram and trigram word clustering. 1995, Speech Commun.

[113] Joshua Goodman et al. A bit of progress in language modeling. 2001, Comput. Speech Lang.

[114] Xin-She Yang et al. Introduction to Algorithms. 2021, Nature-Inspired Optimization Algorithms.

[115] J. T. Marsh et al. Principal component analysis of ERP differences related to the meaning of an ambiguous word. 1979, Electroencephalography and Clinical Neurophysiology.

[116] A. Newell. Unified Theories of Cognition. 1990.

[117] Slava M. Katz et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer. 1987, IEEE Trans. Acoust. Speech Signal Process.

[118] Jianfeng Gao et al. Unsupervised Learning of Dependency Structure for Language Modeling. 2003, ACL.

[119] F. Gobet et al. The CHREST Architecture of Cognition: The Role of Perception in General Intelligence. 2010, AGI 2010.

[120] David E. Kieras et al. An Overview of the EPIC Architecture for Cognition and Performance With Application to Human-Computer Interaction. 1997, Hum. Comput. Interact.

[121] John D. Lafferty et al. Analysis, statistical transfer, and synthesis in machine translation. 1992, TMI.

[122] Lyn Frazier et al. Is the human sentence parsing mechanism an ATN? 1980, Cognition.