Paper information

A Neurophysiologically-Inspired Statistical Language Model

We describe a statistical language model whose components are inspired by electrophysiological activity in the brain. These components correspond to important language-relevant event-related potentials measured using electroencephalography. We relate neural signals involved in local and long-distance grammatical processing, as well as local and long-distance lexical processing, to statistical language models that are scalable, cross-linguistic, and incremental. We develop a novel language model component that unifies n-gram, skip, and trigger language models into a generalized model inspired by the long-distance lexical event-related potential (N400). We evaluate this model in textual and speech recognition experiments, showing consistent improvements over 4-gram modified Kneser-Ney language models (Chen and Goodman, 1998) for large-scale textual datasets in English, Arabic, Croatian, and Hungarian.
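The abstract does not spell out how the generalized component is defined. The following is a rough illustration only, an assumption made for exposition and not the paper's formulation. It reads each of the three model families as predicting a word from one history word at some distance d:

- A bigram uses d = 1.
- A skip-bigram uses one fixed d > 1.
- A trigger pair fires when the trigger word occurs anywhere among the previous W words.

The hypothetical helpers `distance_pair_counts` and `conditional_prob` below count such (history word, target word) pairs over a distance range. They return unsmoothed relative frequencies.

```python
from __future__ import annotations

from collections import Counter


def distance_pair_counts(tokens: list[str], min_d: int, max_d: int) -> Counter[tuple[str, str]]:
    """Count (history_word, target_word) pairs whose distance d is in min_d..max_d.

    Distance d = 1 is the immediately preceding word. Hypothetical helper for
    illustration only; it is not taken from the paper.
    """
    counts: Counter[tuple[str, str]] = Counter()
    for i, target in enumerate(tokens):
        for d in range(min_d, max_d + 1):
            j = i - d
            if j < 0:
                break  # larger d only moves further before the start of the text
            counts[(tokens[j], target)] += 1
    return counts


def conditional_prob(counts: Counter[tuple[str, str]], history_word: str, target: str) -> float:
    """Unsmoothed relative frequency of target given history_word (hypothetical helper)."""
    total = sum(c for (h, _), c in counts.items() if h == history_word)
    if total == 0:
        return 0.0  # unseen history word; a real model would back off or smooth
    return counts[(history_word, target)] / total


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    bigram = distance_pair_counts(tokens, 1, 1)   # n-gram view: d = 1
    skip2 = distance_pair_counts(tokens, 2, 2)    # skip view: fixed d = 2
    trigger = distance_pair_counts(tokens, 1, 5)  # trigger-style view: any d in 1..5
    print(conditional_prob(bigram, "the", "cat"))  # 0.5
    print(skip2[("the", "sat")])                   # 1
    print(trigger[("the", "mat")])                 # 2
```

In the trigger-style counts, ("the", "mat") is counted twice, once for each earlier "the". These are therefore co-occurrence counts rather than the binary "trigger present" indicator some trigger models use. A real system would also smooth these estimates and combine the components, for example by interpolation or in a maximum-entropy model. The sketch does neither.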

Jonathan Dehdari (Jon Dehdari)

[1] Jack Mostow, et al. Towards Using EEG to Improve ASR Accuracy, 2012, HLT-NAACL.

[2] Robert D. Conrad. A maximum likelihood tracker, 1981.

[3] M. Kutas, et al. Event-related brain potentials to grammatical errors and semantic anomalies, 1983, Memory & Cognition.

[4] Ciprian Chelba, et al. Exploiting Syntactic Structure for Natural Language Modeling, 2000, arXiv.

[5] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.

[6] L. Shah, et al. Functional magnetic resonance imaging, 2010, Seminars in Roentgenology.

[7] Mei-Yuh Hwang, et al. The SPHINX-II speech recognition system: an overview, 1993, Comput. Speech Lang.

[8] Rene De La Briandais. File searching using variable length keys, 1959, IRE-AIEE-ACM Computer Conference.

[9] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.

[10] Gary L. Dannenbring, et al. Strategic factors in a lexical-decision task: Evidence for automatic and attention-driven processes, 1983, Memory & Cognition.

[11] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.

[12] Lukáš Burget, et al. The 2005 AMI System for the Transcription of Speech in Meetings, 2005, MLMI.

[13] A. D. Friederici, et al. Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials, 1995, Memory & Cognition.

[14] Edward Gibson, et al. A computational theory of human linguistic processing: memory limitations and processing breakdown, 1991.

[15] Tomaz Erjavec, et al. hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene, 2011, TSD.

[16] Gustavo Alonso, et al. Temporal Structure, 2009, Encyclopedia of Database Systems.

[17] Ronald Rosenfeld, et al. Trigger-based language models: a maximum entropy approach, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Yee Whye Teh, et al. A fast and simple algorithm for training neural probabilistic language models, 2012, ICML.

[19] H. Kucera, et al. Computational analysis of present-day American English, 1967.

[20] Marvin Minsky, et al. Polyscheme: a cognitive architecture for integrating multiple representation and inference schemes, 2002.

[21] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[22] John F. Connolly, et al. Chapter 9: Event-Related Potentials in the Study of Language, 2008.

[23] Madelena Lucinda McClure. Event-related brain potentials elicited by Japanese sentences, 1999.

[24] Pat Langley, et al. Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture, 2022.

[25] Hermann Ney, et al. Improved clustering techniques for class-based statistical language modelling, 1993, EUROSPEECH.

[26] P. Nunez, et al. Electric fields of the brain, 1981.

[27] Reinhard Kneser, et al. On the dynamic adaptation of stochastic language models, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[28] Stanley F. Chen, et al. Shrinking Exponential Language Models, 2009, NAACL.

[29] Thomas Hofmann, et al. Topic-based language models using EM, 1999, EUROSPEECH.

[30] M. Kutas, et al. Psycholinguistics Electrified II (1994–2005), 2006.

[31] Angela D. Friederici, et al. Brain potentials indicate immediate use of prosodic cues in natural speech processing, 1999, Nature Neuroscience.

[32] Holger Schwenk, et al. Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation, 2012, WLM@NAACL-HLT.

[33] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[34] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[35] M. Garrett, et al. Syntactically Based Sentence Processing Classes: Evidence from Event-Related Brain Potentials, 1991, Journal of Cognitive Neuroscience.

[36] Kamel Smaïli, et al. Improving language models by using distant information, 2007, 2007 9th International Symposium on Signal Processing and Its Applications.

[37] J. Darroch, et al. Generalized Iterative Scaling for Log-Linear Models, 1972.

[38] M. Kutas, et al. Reading senseless sentences: brain potentials reflect semantic incongruity, 1980, Science.

[39] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[40] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.

[41] Frederick Jelinek, et al. Interpolated estimation of Markov source parameters from sparse data, 1980.

[42] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[43] Janet M. Baker, et al. The Design for the Wall Street Journal-based CSR Corpus, 1992, HLT.

[44] Lee Osterhout, et al. Constraints on Movement Phenomena in Sentence Processing: Evidence from Event-related Brain Potentials, 1996.

[45] Ted Dunning, et al. Accurate Methods for the Statistics of Surprise and Coincidence, 1993, CL.

[46] Philipp Koehn, et al. Findings of the 2011 Workshop on Statistical Machine Translation, 2011, WMT@EMNLP.

[47] A. Friederici. Towards a neural basis of auditory sentence processing, 2002, Trends in Cognitive Sciences.

[48] Risto Miikkulainen, et al. Natural Language Processing With Modular PDP Networks and Distributed Lexicon, 1991, Cogn. Sci.

[49] E. Jaynes. Information Theory and Statistical Mechanics, 1957.

[50] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[51] Franz Josef Och, et al. An Efficient Method for Determining Bilingual Word Classes, 1999, EACL.

[52] A. Grinnell, et al. Introduction to Nervous Systems, 1978.

[53] A. Friederici, et al. Word category and verb–argument structure information in the dynamics of parsing, 2004, Cognition.

[54] Alexander Clark, et al. Combining Distributional and Morphological Information for Part of Speech Induction, 2003, EACL.

[55] K. Heyer, et al. On the nature of the proportion effect in semantic priming, 1985.

[56] J. Laidlaw, et al. Anatomy of the Human Body, 1967, The Ulster Medical Journal.

[57] Jean-Luc Gauvain, et al. Connectionist language modeling for large vocabulary continuous speech recognition, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[58] 廣瀬雄一, et al. Neuroscience, 2019, Workplace Attachments.

[59] Renato De Mori, et al. A Cache-Based Natural Language Model for Speech Recognition, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[60] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[61] Holger Schwenk, et al. Continuous Space Language Models for Statistical Machine Translation, 2006, ACL.

[62] Matthias Schlesewsky, et al. The P600 as an indicator of syntactic ambiguity, 2002, Cognition.

[63] M. Tanenhaus, et al. Modeling the Influence of Thematic Fit (and Other Constraints) in On-line Sentence Comprehension, 1998.

[64] Wray L. Buntine. Operations for Learning with Graphical Models, 1994, J. Artif. Intell. Res.

[65] Steffen Staab, et al. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing, 2014, ACL.

[66] Ronald Rosenfeld, et al. Adaptive Language Modeling Using the Maximum Entropy Principle, 1993, HLT.

[67] Martin Meyer, et al. Working memory constraints on syntactic ambiguity resolution as revealed by electrical brain responses, 1998, Biological Psychology.

[68] Haizhou Li, et al. Modeling of term-distance and term-occurrence information for improving n-gram language model performance, 2013, ACL.

[69] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.

[70] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.

[71] Boicho N. Kokinov, et al. The DUAL Cognitive Architecture: A Hybrid Multi-Agent Approach, 1994, ECAI.

[72] E. Gibson. Linguistic complexity: locality of syntactic dependencies, 1998, Cognition.

[73] Joshua Goodman, et al. Putting it all together: language model combination, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[74] A. Damasio, et al. Brain and language, 1993, Scientific American.

[75] Maryellen C. MacDonald, et al. The lexical nature of syntactic ambiguity resolution, 1994.

[76] Maria Leonor Pacheco, et al. of the Association for Computational Linguistics:, 2001.

[77] K. Stanovich, et al. On priming by a sentence context, 1983, Journal of Experimental Psychology: General.

[78] Holger Schwenk, et al. Continuous space language models, 2007, Comput. Speech Lang.

[79] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.

[80] C. Van Petten. A comparison of lexical and sentence-level context effects in event-related potentials, 1993.

[81] Frederick Jelinek, et al. Exploiting Syntactic Structure for Language Modeling, 1998, ACL.

[82] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[83] H. Neville, et al. Fractionating language: different neural subsystems with different sensitive periods, 1992, Cerebral Cortex.

[84] M. Kutas, et al. Influences of semantic and syntactic context on open- and closed-class words, 1991, Memory & Cognition.

[85] D. Fry, et al. Speech and Language, 1986.

[86] Lukáš Burget, et al. Strategies for training large scale neural network language models, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[87] Philipp Koehn, et al. Statistical Machine Translation, 2010, EAMT.

[88] P. Lewis. Ethnologue: languages of the world, 2009.

[89] J. H. Neely. Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention, 1977.

[90] Stanley F. Chen, et al. Bayesian Grammar Induction for Language Modeling, 1995, ACL.

[91] Dietrich Klakow, et al. Testing the correlation of word error rate and perplexity, 2002, Speech Commun.

[92] Dietrich Klakow, et al. Log-linear interpolation of language models, 1998, ICSLP.

[93] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[94] Wei Xu, et al. Can artificial neural networks learn language models?, 2000, INTERSPEECH.

[95] D. Caplan, et al. Electrophysiological distinctions in processing conceptual relationships within simple sentences, 2003, Brain Research. Cognitive Brain Research.

[96] A. Gamba, et al. Further experiments with PAPA, 1961.

[97] Bernard Mérialdo, et al. Natural Language Modeling for Phoneme-to-Text Transcription, 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[98] Javier Snaider, et al. The LIDA Framework as a General Tool for AGI, 2011, AGI.

[99] M. Kutas, et al. Potentials and Paradigms: Event-Related Brain Potentials and Neuropsychology, 2012.

[100] Yorick Wilks, et al. A Closer Look at Skip-gram Modelling, 2006, LREC.

[101] Karen A. Loveland, et al. Large Scale, 1991.

[102] Samy Bengio, et al. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks, 1999, NIPS.

[103] Salim Roukos, et al. Brain potentials related to stages of sentence verification, 1983, Psychophysiology.

[104] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[105] J. G. Daugman, et al. Information Theory and Coding, 1998.

[106] Bhuvana Ramabhadran, et al. Scaling shrinkage-based language models, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[107] András Kornai, et al. Creating Open Language Resources for Hungarian, 2004, LREC.

[108] M. Slowiaczek, et al. Constraints on semantic priming in reading: A fixation time analysis, 1986, Memory & Cognition.

[109] Brian Roark, et al. Markov Parsing: Lattice Rescoring with a Statistical Parser, 2002, ACL.

[110] C. Lebiere, et al. The Atomic Components of Thought, 1998.

[111] C. Burgess, et al. Lexical and Sentence Context Effects in Word Recognition, 1989.

[112] Hermann Ney, et al. Algorithms for bigram and trigram word clustering, 1995, Speech Commun.

[113] Joshua Goodman, et al. A bit of progress in language modeling, 2001, Comput. Speech Lang.

[114] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[115] J. T. Marsh, et al. Principal component analysis of ERP differences related to the meaning of an ambiguous word, 1979, Electroencephalography and Clinical Neurophysiology.

[116] A. Newell. Unified Theories of Cognition, 1990.

[117] Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer, 1987, IEEE Trans. Acoust. Speech Signal Process.

[118] Jianfeng Gao, et al. Unsupervised Learning of Dependency Structure for Language Modeling, 2003, ACL.

[119] F. Gobet, et al. The CHREST Architecture of Cognition: The Role of Perception in General Intelligence, 2010, AGI 2010.

[120] David E. Kieras, et al. An Overview of the EPIC Architecture for Cognition and Performance With Application to Human-Computer Interaction, 1997, Hum. Comput. Interact.

[121] John D. Lafferty, et al. Analysis, statistical transfer, and synthesis in machine translation, 1992, TMI.

[122] Lyn Frazier, et al. Is the human sentence parsing mechanism an ATN?, 1980, Cognition.