A Natural Language Process-Based Framework for Automatic Association Word Extraction

Word association, which reveals the mental representations and connections of humans, has been widely studied in psychology. However, the scale of available associative cue-response words is severely restricted by the traditional methodology of manual collection. Meanwhile, with the tremendous success of Natural Language Processing (NLP), extremely large amounts of plain text can be easily acquired. This suggests the potential to find association words automatically from a text corpus instead of collecting them manually. As an initial attempt, this paper takes a small step toward a deep learning based framework for automatic association word extraction. The framework mainly consists of two stages: association word detection and machine association network construction. In particular, an attention mechanism based Reading Comprehension (RC) algorithm is explored to find valuable association words automatically. To validate the value of the extracted association words, the correlation coefficient between the semantic similarities of machine and human association words is introduced as a measure of association consistency. The experiments are conducted on two text datasets, from which about $20k$ association words in total are derived, more than the largest existing human association word dataset. The experiments further verify that the machine association words are generally consistent with human association words with respect to semantic similarity, which highlights the promise of machine association words for future research in both psychology and NLP.
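
The consistency measure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used or how the semantic similarities are computed, so the code below assumes Pearson correlation and takes precomputed similarity scores as input. The function names `pearson` and `association_consistency` and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def association_consistency(machine_sim, human_sim):
    """Correlate similarity scores over the word pairs present in both dicts.

    Assumption: each dict maps a (word_a, word_b) tuple to a similarity score
    computed elsewhere, one from machine associations, one from human norms.
    """
    shared = sorted(set(machine_sim) & set(human_sim))
    machine_scores = [machine_sim[pair] for pair in shared]
    human_scores = [human_sim[pair] for pair in shared]
    return pearson(machine_scores, human_scores)


# Hypothetical toy scores, for illustration only (not data from the paper).
machine_sim = {
    ("cat", "dog"): 0.8,
    ("cat", "car"): 0.1,
    ("sun", "moon"): 0.7,
    ("sun", "shoe"): 0.2,
}
human_sim = {
    ("cat", "dog"): 0.9,
    ("cat", "car"): 0.2,
    ("sun", "moon"): 0.6,
    ("sun", "shoe"): 0.1,
}
score = association_consistency(machine_sim, human_sim)
```

A score close to 1 would indicate that word pairs judged similar under the machine associations are also judged similar under the human associations. A rank-based coefficient such as Spearman's could be substituted if only the ordering of similarities matters.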
