Combining social-based data mining techniques to extract collective trends from Twitter

Social networks have become an important environment for extracting collective trends. The interactions amongst users provide information about their preferences and relationships. This information can be used to measure the influence of ideas or opinions and how they spread within the network. Twitter, which was created to share comments and opinions, is currently one of the most relevant and popular social networks. The information its users provide is especially useful in fields and research areas such as marketing. The data takes the form of short text strings containing ideas expressed by real people. Given this representation, data mining techniques such as classification or clustering can be used for knowledge extraction, to distinguish the meaning of the opinions. Complex network techniques also help to discover influential actors and to study how information propagates inside the social network. This work focuses on how clustering and classification techniques can be combined to extract collective knowledge from Twitter. In an initial phase, clustering techniques are applied to extract the main topics from the user opinions. The dataset is then relabelled according to the clusters obtained, with the aim of improving the classification results. Finally, these results are compared against a dataset that has been manually labelled by human experts, in order to analyse the accuracy of the proposed method.
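
The three phases described above (cluster, relabel, classify and compare against manual labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the algorithms used, so the toy k-medoids clustering with Jaccard distance, the Naive Bayes classifier, the six example tweets and all function names below are assumptions chosen only to keep the example small and dependency-free.

```python
import math
from collections import Counter

# Hypothetical toy data: (tweet text, label assigned manually by an expert).
TWEETS = [
    ("great phone battery lasts long", "phone"),
    ("football match tonight great goal", "sport"),
    ("new phone camera battery amazing", "phone"),
    ("football team wins match again", "sport"),
    ("phone screen and battery review", "phone"),
    ("watching the football match live", "sport"),
]

def tokenize(text):
    """Lower-case whitespace tokenizer that drops very short words."""
    return [w for w in text.lower().split() if len(w) > 2]

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def k_medoids(docs, k, iterations=10):
    """Toy k-medoids over token sets, seeded with the first k documents."""
    medoids = list(range(k))
    assignment = []
    for _ in range(iterations):
        # Assign every document to its nearest medoid.
        assignment = [
            min(range(k), key=lambda c: jaccard_distance(d, docs[medoids[c]]))
            for d in docs
        ]
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assignment) if a == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda i: sum(jaccard_distance(docs[i], docs[j]) for j in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assignment

def train_naive_bayes(token_lists, labels):
    """Collect per-class document and word counts."""
    vocab = {w for toks in token_lists for w in toks}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for toks, c in zip(token_lists, labels):
        word_counts[c].update(toks)
    return vocab, class_counts, word_counts

def predict(model, tokens):
    """Most probable class under multinomial Naive Bayes with add-one smoothing."""
    vocab, class_counts, word_counts = model
    total_docs = sum(class_counts.values())

    def score(c):
        total_words = sum(word_counts[c].values())
        s = math.log(class_counts[c] / total_docs)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                s += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        return s

    return max(class_counts, key=score)

def cluster_agreement(clusters, manual_labels):
    """Fraction of items whose manual label is the majority label of their cluster."""
    matched = 0
    for c in set(clusters):
        labels = [m for a, m in zip(clusters, manual_labels) if a == c]
        matched += Counter(labels).most_common(1)[0][1]
    return matched / len(clusters)

token_lists = [tokenize(text) for text, _ in TWEETS]
manual = [label for _, label in TWEETS]

# Phase 1: cluster the tweets to find the main topics.
clusters = k_medoids([set(t) for t in token_lists], k=2)
# Phase 2: relabel the dataset with cluster ids and train a classifier on it.
model = train_naive_bayes(token_lists, clusters)
# Phase 3: compare the cluster-derived labels with the manual ones.
agreement = cluster_agreement(clusters, manual)
new_tweet_cluster = predict(model, tokenize("battery on this phone"))
print(clusters, agreement, new_tweet_cluster)
```

On real tweets the vocabulary is far noisier than in this toy set, so preprocessing, the choice of distance, and the number of clusters would all need to be tuned; the sketch only shows how the cluster ids replace the original labels before the classifier is trained.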
