An Inference Approach to Basic Level of Categorization

Humans understand the world by classifying objects into categories at an appropriate level. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit many applications, such as knowledge panels, advertising, and recommendation. However, how to quantify basic-level concepts remains an open problem. Much recent work has focused on constructing knowledge bases or semantic networks from web-scale text corpora, which makes it possible for the first time to analyze computational approaches to deriving BLC. In this paper, we introduce a method for BLC based on typicality and pointwise mutual information (PMI). We compare it with existing measures such as normalized PMI (NPMI) and commute time to understand its essence, and conduct extensive experiments to show the effectiveness of our approach. We also give a real application example showing how BLC can help sponsored search.
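The quantities named above can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formula, so the code below is not the proposed method. It only computes the standard building blocks (typicality as the conditional probabilities P(e|c) and P(c|e), PMI, and NPMI) from instance-concept co-occurrence counts. The counts, the names `COUNTS`, `marginals`, `typicality`, `pmi`, and `npmi`, and the relative-frequency estimates are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy counts n(e, c): how often instance e is observed with concept c.
# These numbers are invented for illustration; they are not data from the paper.
COUNTS = {
    ("iphone", "smartphone"): 60,
    ("iphone", "device"): 30,
    ("iphone", "product"): 10,
    ("galaxy", "smartphone"): 40,
    ("galaxy", "device"): 10,
    ("laptop", "device"): 40,
    ("laptop", "product"): 10,
}


def marginals(counts):
    """Return (total, per-instance totals, per-concept totals)."""
    total = sum(counts.values())
    by_instance, by_concept = Counter(), Counter()
    for (e, c), n in counts.items():
        by_instance[e] += n
        by_concept[c] += n
    return total, by_instance, by_concept


def typicality(counts, e, c):
    """Return (P(e|c), P(c|e)) estimated by relative frequency."""
    _, by_instance, by_concept = marginals(counts)
    n = counts.get((e, c), 0)
    return n / by_concept[c], n / by_instance[e]


def pmi(counts, e, c):
    """PMI(e, c) = log( P(e, c) / (P(e) * P(c)) ), natural log."""
    total, by_instance, by_concept = marginals(counts)
    n = counts.get((e, c), 0)
    if n == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log(n * total / (by_instance[e] * by_concept[c]))


def npmi(counts, e, c):
    """NPMI(e, c) = PMI(e, c) / -log P(e, c), which lies in [-1, 1]."""
    total, _, _ = marginals(counts)
    n = counts.get((e, c), 0)
    if n == 0:
        return -1.0  # limiting value for pairs that never co-occur
    if n == total:
        return 1.0  # limiting value when the pair is the only observation
    return pmi(counts, e, c) / -math.log(n / total)


if __name__ == "__main__":
    for concept in ("smartphone", "device", "product"):
        p_e_given_c, p_c_given_e = typicality(COUNTS, "iphone", concept)
        print(concept, p_e_given_c, p_c_given_e,
              pmi(COUNTS, "iphone", concept), npmi(COUNTS, "iphone", concept))
```

On these toy counts, the mid-level concept "smartphone" has the highest PMI with "iphone", while the more general "device" and "product" do not score above zero. This is only meant to show why association measures of this kind are natural candidates for locating a basic-level concept.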
