Commonsense-based topic modeling

Topic modeling is a technique for discovering the abstract 'topics' that occur in a collection of documents; it is useful for tasks such as automatic text categorization and opinion mining. This paper presents a commonsense-knowledge-based algorithm for document topic modeling. In contrast to probabilistic models, the proposed approach involves no training of any kind and does not depend on word co-occurrence or particular word distributions, so the algorithm can be applied to texts of any length and composition. 'Semantic atoms' are used to generate feature vectors for the concepts found in a document. These feature vectors are then clustered using group-average agglomerative clustering, an approach reported to perform substantially better than existing algorithms.
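The clustering step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the concept feature vectors are already available and uses made-up toy values for them, takes cosine distance as the dissimilarity measure (an assumption, since the abstract does not name one), and reads "group average" as the average pairwise distance between the members of two clusters. The names `cosine_distance`, `average_linkage`, and `group_average_clustering` are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; treat zero vectors as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Average pairwise distance between members of two clusters."""
    total = sum(
        cosine_distance(vectors[i], vectors[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def group_average_clustering(vectors, num_clusters):
    """Merge the closest pair of clusters until num_clusters remain."""
    # Start with every vector in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], vectors
            ),
        )
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: hypothetical feature vectors for four concepts.
concepts = ["dog", "cat", "car", "truck"]
vectors = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.9, 1.0],
]

topics = group_average_clustering(vectors, num_clusters=2)
for members in topics:
    print([concepts[i] for i in members])
```

The sketch recomputes every linkage on each pass, which is simple to read but slow for large inputs. A library routine such as SciPy's `linkage` with `method='average'` would be the more practical choice.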
