Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness

In many applications, it is important to characterize the way in which two concepts are semantically related. Knowledge graphs such as ConceptNet provide a rich source of information for such characterizations by encoding relations between concepts as edges in a graph. When two concepts are not directly connected by an edge, their relationship can still be described in terms of the paths that connect them. Unfortunately, many of these paths are uninformative and noisy, which means that the success of applications that use such path features crucially relies on their ability to select high-quality paths. In existing applications, this path selection process is based on relatively simple heuristics. In this paper we instead propose to learn to predict path quality from crowdsourced human assessments. Since we are interested in a generic task-independent notion of quality, we simply ask human participants to rank paths according to their subjective assessment of the paths' naturalness, without attempting to define naturalness or steering the participants towards particular indicators of quality. We show that a neural network model trained on these assessments is able to predict human judgments on unseen paths with near optimal performance. Most notably, we find that the resulting path selection method is substantially better than the current heuristic approaches at identifying meaningful paths.

[1]  Alexandros Nanopoulos,et al.  Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data , 2010, J. Mach. Learn. Res..

[2]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[3]  Erik T. Mueller,et al.  Open Mind Common Sense: Knowledge Acquisition from the General Public , 2002, OTM.

[4]  W. Bruce Croft,et al.  Quary Expansion Using Local and Global Document Analysis , 1996, SIGIR Forum.

[5]  Markus Krötzsch,et al.  Wikidata , 2014, Commun. ACM.

[6]  Lora Aroyo,et al.  The Three Sides of CrowdTruth , 2014, Hum. Comput..

[7]  Douglas B. Lenat,et al.  CYC: a large-scale investment in knowledge infrastructure , 1995, CACM.

[8]  G. Āllport The Psycho-Biology of Language. , 1936 .

[9]  Hoifung Poon,et al.  Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text , 2016, ACL.

[10]  John Miller,et al.  Traversing Knowledge Graphs in Vector Space , 2015, EMNLP.

[11]  Ni Lao,et al.  Relational retrieval using a combination of path-constrained random walks , 2010, Machine Learning.

[12]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[13]  Hugo Liu,et al.  ConceptNet — A Practical Commonsense Reasoning Tool-Kit , 2004 .

[14]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[15]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[16]  Huanbo Luan,et al.  Modeling Relation Paths for Representation Learning of Knowledge Bases , 2015, EMNLP.

[17]  Manuel Blum,et al.  Verbosity: a game for collecting common-sense facts , 2006, CHI.

[18]  Steven Schockaert,et al.  Relation Induction in Word Embeddings Revisited , 2018, COLING.

[19]  Junpeng Chen,et al.  Combining ConceptNet and WordNet for Word Sense Disambiguation , 2011, IJCNLP.

[20]  G. Zipf,et al.  The Psycho-Biology of Language , 1936 .

[21]  Catherine Havasi,et al.  ConceptNet 5.5: An Open Multilingual Graph of General Knowledge , 2016, AAAI.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  ChengXiang Zhai,et al.  Tapping into knowledge base for concept feedback: leveraging conceptnet to improve search results for difficult queries , 2012, WSDM '12.

[24]  Gang Wang,et al.  RC-NET: A General Framework for Incorporating Knowledge into Word Representations , 2014, CIKM.

[25]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[26]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[27]  Andrew McCallum,et al.  Compositional Vector Space Models for Knowledge Base Completion , 2015, ACL.

[28]  Rajarshi Das,et al.  Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks , 2016, EACL.

[29]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[30]  Georgiana Dinu,et al.  Improving zero-shot learning by mitigating the hubness problem , 2014, ICLR.

[31]  Alessandro Lenci,et al.  SemEval-2018 Task 10: Capturing Discriminative Attributes , 2018, *SEMEVAL.

[32]  Tom M. Mitchell,et al.  Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases , 2014, EMNLP.

[33]  Ellen M. Voorhees,et al.  Overview of the TREC 2004 Robust Retrieval Track , 2004 .

[34]  Sonia Chernova,et al.  Solving and Explaining Analogy Questions Using Semantic Networks , 2015, AAAI.

[35]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[36]  Wenhan Xiong,et al.  DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning , 2017, EMNLP.

[37]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[38]  Ellen M. Voorhees,et al.  The TREC robust retrieval track , 2005, SIGF.