TweeProfiles: Detection of Spatio-temporal Patterns on Twitter

Online social networks are valuable sources of information about their users and their respective behaviours and interests. Many data mining researchers have analysed this kind of data, aiming to find interesting patterns. This paper addresses the problem of identifying and displaying tweet profiles by analysing multiple types of data: spatial, temporal, social and content. The data mining process that extracts the patterns consists of manipulating the dissimilarity matrices for each type of data, which are then fed to a clustering algorithm to obtain the desired patterns. This paper studies appropriate distance functions for the different types of data, the normalization and combination methods available for different dimensions, and the existing clustering algorithms. The visualization platform is designed for dynamic and intuitive use, aimed at revealing the extracted profiles in an understandable and interactive manner. To accomplish this, various visualization patterns were studied and widgets were chosen to better represent the information. The use of the project is illustrated with data from the Portuguese twittosphere.
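The pipeline described above (one dissimilarity matrix per dimension, normalization, combination, clustering) can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the paper. The distance functions (haversine, absolute time difference, Jaccard), the max-based normalization, the equal weights, the threshold-based grouping and the four sample tweets are all assumptions chosen for brevity. The social dimension is omitted.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two word sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def distance_matrix(items, dist):
    """Full pairwise dissimilarity matrix for one dimension."""
    n = len(items)
    return [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]

def normalize(matrix):
    """Scale a matrix to [0, 1] by its maximum (its minimum is 0 on the diagonal)."""
    hi = max(max(row) for row in matrix)
    return [[v / hi if hi else 0.0 for v in row] for row in matrix]

def combine(matrices, weights):
    """Weighted average of several normalized matrices (assumed combination rule)."""
    n = len(matrices[0])
    total = sum(weights)
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n)] for i in range(n)]

def threshold_clusters(matrix, eps):
    """Group items linked by chains of distances <= eps (connected components).

    A stand-in for a real clustering algorithm, used here only to keep
    the sketch dependency-free.
    """
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical sample: two tweets near Porto, two near Lisbon a day later.
tweets = [
    {"pos": (41.15, -8.61), "time": 0, "words": {"porto", "futebol"}},
    {"pos": (41.16, -8.62), "time": 600, "words": {"porto", "futebol", "golo"}},
    {"pos": (38.72, -9.14), "time": 86400, "words": {"lisboa", "chuva"}},
    {"pos": (38.73, -9.15), "time": 87000, "words": {"lisboa", "chuva", "tempo"}},
]

spatial = normalize(distance_matrix([t["pos"] for t in tweets], haversine_km))
temporal = normalize(distance_matrix([t["time"] for t in tweets],
                                     lambda a, b: abs(a - b)))
content = normalize(distance_matrix([t["words"] for t in tweets],
                                    jaccard_distance))

combined = combine([spatial, temporal, content], weights=[1.0, 1.0, 1.0])
labels = threshold_clusters(combined, eps=0.5)
print(labels)  # [0, 0, 1, 1]
```

Changing the weights passed to `combine` shifts the emphasis between dimensions. For example, giving the content matrix a weight of zero would group tweets by place and time only.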
