Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study

Cross-disciplinary communication is often impeded by terminological ambiguity. Cross-disciplinary teams would therefore benefit greatly from a language technology-based tool that allows for the (at least semi-)automated resolution of ambiguous terms. Although no such tool is readily available, a theoretical outline of one does exist. The main obstacle to realizing this tool is the current lack of an effective method for automatically detecting the different meanings of ambiguous terms across disciplinary jargons. In this paper, we set up a pilot study to assess experimentally whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences from a cross-disciplinary corpus that each contain a specific ambiguous term, we verify whether this technique can classify each sentence in accordance with the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings, using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters, using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication.
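
To make the idea of context clustering concrete, the following is a minimal sketch in Python. It is not SenseClusters and does not reproduce the paper's settings: the bag-of-words context vectors, the small stopword list, the average-link agglomerative clustering, the function names, and the four example sentences are all assumptions chosen for illustration. It only shows the general principle of grouping sentences by the words that surround an ambiguous target term.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "is", "from", "we", "to", "of", "in", "by", "during", "a", "an", "and"}

def context_vector(sentence, target):
    """Bag-of-words vector of the words around the target term."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if t != target and t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_contexts(sentences, target, k):
    """Average-link agglomerative clustering of sentence contexts into k groups."""
    vectors = [context_vector(s, target) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                sim = sum(sims) / len(sims)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the most similar pair of clusters
        del clusters[j]
    labels = [0] * len(sentences)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

# Invented example sentences for the target term 'matrix' (not from the paper's corpus).
sentences = [
    "The stiffness matrix is computed from the finite element equations.",
    "We invert the matrix to solve the linear equations of the finite element model.",
    "Cells deposit collagen in the extracellular matrix during bone formation.",
    "The extracellular matrix contains collagen secreted by bone cells.",
]
labels = cluster_contexts(sentences, "matrix", k=2)
print(labels)
```

With these invented sentences, the two engineering contexts and the two biological contexts end up in separate clusters, because sentences within each pair share context words and sentences across pairs share none. Real cross-disciplinary data is far noisier, which is why parameter selection of the kind explored in the first series of experiments matters.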
