Challenging distributional models with a conceptual network of philosophical terms

Computational linguistic research on language change through distributional semantic (DS) models has inspired researchers from fields such as philosophy and literary studies, who use these methods to explore and compare relatively small datasets traditionally analyzed through close reading. Research on methods for small data is still in its early stages, and it is not yet clear which methods achieve the best results. We investigate the possibilities and limitations of using distributional semantic models for analyzing philosophical data by means of a realistic use case. We provide a ground truth for evaluation created by philosophy experts and a blueprint for using DS models in a sound methodological setup. We compare three methods for creating specialized models from small datasets. Although the models do not yet perform well enough to directly support philosophers, we find that models designed for small data yield promising directions for future work.
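To make the general setup concrete, the sketch below shows one common way to build a distributional semantic model from a small corpus and inspect the nearest neighbours of a target term, which can then be compared against an expert-built ground truth. This is a minimal illustration only: it assumes gensim's Word2Vec implementation and a toy corpus, and it does not reproduce the three specialized small-data methods compared in the paper.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for a small collection of philosophical texts
# (the actual corpora and target terms used in the paper are not shown here).
corpus = [
    ["the", "concept", "of", "truth", "is", "central", "to", "logic"],
    ["meaning", "and", "reference", "are", "distinct", "notions"],
    ["a", "proposition", "expresses", "a", "judgement", "about", "the", "world"],
]

# Skip-gram model with settings often chosen for small corpora:
# small vectors, a wide context window, no frequency cut-off, many epochs.
model = Word2Vec(
    sentences=corpus,
    vector_size=50,
    window=10,
    min_count=1,
    sg=1,
    epochs=100,
    seed=42,
)

# Inspect the nearest neighbours of a target philosophical term; these
# neighbourhoods are what an expert-created conceptual network can be
# used to evaluate.
print(model.wv.most_similar("truth", topn=5))
```

With corpora this small, results are highly sensitive to hyperparameters and random initialization, which is precisely why evaluation against an expert ground truth matters.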
