SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features

SentSpace is a modular framework for streamlined evaluation of text. SentSpace characterizes textual input using diverse lexical, syntactic, and semantic features derived from corpora and psycholinguistic experiments. Core sentence features fall into three primary feature spaces: 1) Lexical, 2) Contextual, and 3) Embeddings. To aid in the analysis of computed features, SentSpace provides a web interface for interactive visualization and comparison with text from large corpora. The modular design of SentSpace allows researchers to easily integrate their own feature computation into the pipeline while benefiting from a common framework for evaluation and visualization. In this manuscript we describe the design of SentSpace and its core feature spaces, and demonstrate an example use case by comparing human-written and machine-generated (GPT2-XL) sentences. We find that while GPT2-XL-generated text appears fluent at the surface level, psycholinguistic norms and measures of syntactic processing reveal key differences between text produced by humans and machines. Thus, SentSpace provides a broad set of cognitively motivated linguistic features for evaluation of text within natural language processing, cognitive science, and the social sciences.
