Mined semantic analysis: A new concept space model for semantic representation of textual data

Mined Semantic Analysis (MSA) is a novel concept space model that employs unsupervised learning to generate semantic representations of text. MSA represents textual structures (terms, phrases, documents) as a Bag of Concepts (BoC), where concepts are derived from concept-rich encyclopedic corpora. Traditional concept space models exploit only the content of the target corpus to construct the concept space. MSA, alternatively, uncovers implicit relations between concepts by mining their associations (e.g., mining Wikipedia's “See also” link graph). We evaluate MSA's performance on benchmark datasets for measuring semantic relatedness of words and sentences. Empirical results show that MSA performs competitively with prior state-of-the-art methods. Additionally, we introduce the first analytical study to examine the statistical significance of results reported by different semantic relatedness methods. Our study shows that the differences in results among top-performing methods can be statistically insignificant. The study positions MSA as one of the state-of-the-art methods for measuring semantic relatedness, with the added benefits of the interpretability and simplicity of the generated semantic representations.
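To make the idea concrete, the following toy sketch shows how mining concept associations can reveal relatedness that a plain Bag-of-Concepts representation misses. The concept names and the “See also” adjacency below are invented for illustration; a real system would mine them from Wikipedia, and this is not the authors' implementation.

```python
# Toy Bag-of-Concepts (BoC) sketch in the spirit of MSA.
# SEE_ALSO is a hypothetical mined association graph, not real Wikipedia data.
from collections import Counter

SEE_ALSO = {
    "Machine learning": ["Artificial intelligence", "Data mining"],
    "Data mining": ["Machine learning", "Statistics"],
    "Artificial intelligence": ["Machine learning"],
}

def expand(concepts, weight=0.5):
    """Augment an initial BoC with concepts linked via mined associations."""
    bag = Counter({c: 1.0 for c in concepts})
    for c in concepts:
        for neighbor in SEE_ALSO.get(c, []):
            bag[neighbor] += weight  # discounted weight for mined concepts
    return bag

def cosine(a, b):
    """Cosine similarity between two weighted concept bags."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Without expansion the two bags share no concept (similarity 0.0);
# with mined associations they become related.
boc1 = expand(["Machine learning"])
boc2 = expand(["Data mining"])
print(round(cosine(boc1, boc2), 3))  # → 0.667
```

The discount weight (0.5 here) is an arbitrary illustrative choice; in practice the strength of a mined association would be estimated from the corpus.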
