Discovering the core semantics of event from social media

As social media platforms such as Twitter and Sina Weibo (a Chinese microblogging website, http://weibo.com/) open up, large volumes of short texts are flooding the Web. This ocean of short texts dilutes the limited core semantics of an event in cyberspace with redundancy, noise, and irrelevant content, which makes it difficult to discover the core semantics of the event. The major challenges are how to efficiently learn the semantic association distribution from small-scale association relations, and how to maximize the coverage of that distribution with the minimum number of redundancy-free short texts. To address these issues, we explore a Markov random field based method for discovering the core semantics of an event. The method uses semantic collaborative computation to learn the association relation distribution, and information gradient computation to discover k redundancy-free texts as the core semantics of the event. We evaluate the method against two state-of-the-art methods on the TAC dataset and a microblog dataset. The results show that our method outperforms the other methods in extracting core semantics accurately and efficiently. The proposed method can be applied to automatic short text generation, event discovery, and summarization for big data analysis.

Proposing a Markov random field based method for discovering the core semantics of an event.
Learning the association relation distribution of an event from small-scale association relations.
Maximizing the coverage of the association relation distribution with the minimum number of short texts.
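The coverage objective stated in the abstract (covering as much of the association distribution as possible with few non-redundant short texts) can be illustrated with a small sketch. This is not the Markov random field method of the paper: it swaps in a plain greedy weighted-coverage heuristic, treats word co-occurrence pairs as "association relations", and uses pair frequency as a stand-in for the learned distribution. All function names and the sample texts are hypothetical.

```python
from itertools import combinations


def association_pairs(text):
    """Return the set of unordered word pairs that co-occur in one short text."""
    words = sorted(set(text.lower().split()))
    return set(combinations(words, 2))


def association_weights(texts):
    """Weight each pair by the number of texts containing it.

    Assumption: this count is only a crude proxy for the association
    relation distribution that the paper learns with a Markov random field.
    """
    weights = {}
    for text in texts:
        for pair in association_pairs(text):
            weights[pair] = weights.get(pair, 0) + 1
    return weights


def select_core_texts(texts, k):
    """Greedily pick at most k texts that add the most uncovered pair weight.

    A text whose pairs are already covered has zero gain and is never
    picked, which keeps the selection free of redundant texts.
    """
    weights = association_weights(texts)
    pairs = [association_pairs(t) for t in texts]
    covered = set()
    chosen = []
    for _ in range(min(k, len(texts))):
        best_index, best_gain = None, 0
        for i, text_pairs in enumerate(pairs):
            if i in chosen:
                continue
            gain = sum(weights[p] for p in text_pairs - covered)
            if gain > best_gain:
                best_index, best_gain = i, gain
        if best_index is None:  # nothing new left to cover
            break
        chosen.append(best_index)
        covered |= pairs[best_index]
    return [texts[i] for i in chosen]
```

With four hypothetical posts where three repeat the same content, asking for three texts returns only two, because the repeated posts contribute no new coverage once the first one has been selected.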
