Exploring Social Context for Topic Identification in Short and Noisy Texts

With the spread of social media, topic identification in short texts has attracted increasing attention in recent years. However, social media texts are inherently short and noisy, and their structures are sparse and dynamic, which makes it difficult to identify topic categories accurately. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploiting social context. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and we perform feature selection to handle short and noisy texts, which results in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.
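
The abstract does not state the optimization problem itself, so the following is only an illustrative sketch of how social context and feature selection could be combined in one supervised objective. It assumes a least-squares classifier, a graph-Laplacian smoothness term over a text-text relation matrix `A` (for example, texts by the same author for preference consistency, or texts by connected users for social contagion), and an l2,1-norm penalty for row-wise feature selection. The objective, the function names `graph_laplacian` and `fit_sti_sketch`, and the parameters `alpha` and `beta` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of a symmetric relation matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def fit_sti_sketch(X, Y, A, alpha=0.1, beta=0.1, n_iter=30, eps=1e-8):
    """Hypothetical objective (an assumption, not the paper's formulation):

        min_W ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}

    X: (n_texts, n_features), Y: (n_texts, n_topics) one-hot labels,
    A: (n_texts, n_texts) symmetric social relation between texts.
    Solved by iteratively reweighted least squares.
    """
    L = graph_laplacian(A)
    d = X.shape[1]
    G = X.T @ X + alpha * (X.T @ L @ X)   # fit term + social smoothness term
    XtY = X.T @ Y
    D = np.eye(d)                         # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(G + beta * D, XtY)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

# Toy data: four short texts, three features, two topics.
X = np.array([[1.0, 0.0, 0.1],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.1],
              [0.0, 1.0, 0.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
# Texts 0-1 and 2-3 are socially related (e.g. written by the same user).
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

W = fit_sti_sketch(X, Y, A)
pred = (X @ W).argmax(axis=1)  # predicted topic index per text
```

In this sketch, the Laplacian term pulls the predictions of socially related texts toward each other, while the l2,1 penalty shrinks entire rows of `W`, so that uninformative features contribute little to any topic.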
