Opinion subset selection via submodular maximization

Abstract Current research on subset selection for opinion analysis assumes that general text features are sufficient to retrieve the opinions expressed in documents. However, such a relaxed assumption rarely preserves the performance of the analysis in opinion mining, especially under strict limits on the subset size. In this paper, we propose a framework for opinion subset selection. The framework selects a small set of instances from the original data that conveys a subjective representation for opinion classification and regression. By contrast, the conventional submodular-based subset selection approach cannot capture the fine-grained opinion features expressed in the corpus. Specifically, we propose a monotone non-decreasing score function and a framework based on topic modeling and submodular maximization for filtering irrelevant information and selecting the subsets. We further introduce an opinion-sensitive algorithm that optimizes the proposed function for opinion subset construction. We perform extensive experiments and a comparative analysis of different subset selection methods. The experimental results show that the proposed opinion subset selection framework can compress the original text training set while preserving classification and regression performance on the test set.
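To make the idea of maximizing a monotone non-decreasing submodular score under a subset-size limit concrete, the following is a minimal sketch of the standard greedy procedure, for which Nemhauser, Wolsey and Fisher (1978) proved a (1 - 1/e) approximation guarantee under a cardinality constraint. It is not the opinion-sensitive algorithm or the score function proposed in the paper: `coverage_score`, `greedy_select`, and the toy `features` data are hypothetical names and values chosen for illustration, and the score simply counts the distinct terms covered by the selected instances.

```python
from typing import Callable, FrozenSet, List, Sequence, Set


def coverage_score(selected: Set[int], features: Sequence[FrozenSet[str]]) -> float:
    """Illustrative score: number of distinct terms covered by the selection.

    Set coverage is monotone non-decreasing and submodular. It stands in for
    the paper's score function, which is not specified in the abstract.
    """
    covered: Set[str] = set()
    for i in selected:
        covered |= features[i]
    return float(len(covered))


def greedy_select(
    n_items: int,
    budget: int,
    score: Callable[[Set[int]], float],
) -> List[int]:
    """Greedily pick up to `budget` items, each time adding the item with the
    largest marginal gain in `score`. Stops early if no item adds value."""
    selected: Set[int] = set()
    order: List[int] = []
    for _ in range(min(budget, n_items)):
        base = score(selected)
        best_item, best_gain = None, 0.0
        for i in range(n_items):
            if i in selected:
                continue
            gain = score(selected | {i}) - base
            if gain > best_gain:  # ties go to the lowest index
                best_item, best_gain = i, gain
        if best_item is None:
            break
        selected.add(best_item)
        order.append(best_item)
    return order


# Toy data (assumed): each instance is reduced to a set of opinion terms.
features = [
    frozenset({"great", "battery"}),
    frozenset({"great"}),
    frozenset({"poor", "screen"}),
    frozenset({"battery", "poor"}),
]
picked = greedy_select(len(features), 2, lambda s: coverage_score(s, features))
print(picked, coverage_score(set(picked), features))
```

With a budget of two instances, the sketch picks the two reviews that together cover all four terms. Swapping in a different monotone submodular score, for example one weighted by topic relevance, changes only the `score` argument.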
