Automated classification of social network messages into Smart Cities dimensions

Abstract: A Smart City can be defined as a high-tech city with public and private services capable of strategically solving (or mitigating) the problems normally generated by rapid urbanization. Different indicator models have been developed to track a city's evolution toward becoming a Smart City. One example is the ISO 37120 standard from the International Organization for Standardization (ISO), which proposes a set of dimensions and indicators (e.g. Transportation, Recreation, Solid Waste) for city services and quality of life in sustainable cities and communities. It has become common for organizations and governmental entities to maintain official social network profiles for the services they provide or are responsible for (water, waste, transportation, cultural events, etc.), and citizens use these profiles as a gateway to interact directly and to communicate their complaints and problems with those services. This paper proposes applying machine learning algorithms to the urban data generated by social networks in order to create classifiers that automatically categorize citizens' messages according to the different city service dimensions. To that end, two distinct text datasets in Portuguese were collected from two social networks: Twitter (1,950 tweets) and Colab.re (65,066 posts). The texts were mapped to the ISO 37120 categories, preprocessed, and mined using eight algorithms implemented in Scikit-Learn. Initial results indicate that the proposal is feasible, with models achieving average F1 scores of around 55% (F1-macro) and 78% (F1-micro) when using Linear Support Vector Classification, Logistic Regression, Decision Tree and Complement Naive Bayes. However, because the datasets were highly imbalanced, model performance varies significantly across ISO categories, with the best results occurring for Wastewater, Water & Sanitation, Energy and Transportation.
The classifiers generated here can be integrated into a number of different city services and systems, such as governmental decision support systems, customer complaint systems, community dashboards, police offices, transportation companies, cultural producers, environmental agencies, and recycling companies.
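
The gap between the reported F1-macro and F1-micro values is what one would expect from imbalanced classes. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function name `f1_scores` and the ten toy labels are illustrative assumptions and are not taken from the Twitter or Colab.re datasets. It computes both averages for a single-label classifier that handles the majority category well and misses the two rare ones.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per_class_f1, macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of per-class F1, so rare classes count as much as common ones.
    macro = sum(per_class.values()) / len(labels)
    # Micro: F1 over pooled counts, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro


# Hypothetical toy labels (assumption): 8 majority-class messages, 2 rare-class messages.
y_true = ["Transportation"] * 8 + ["Energy", "Recreation"]
# A classifier that always answers with the majority class.
y_pred = ["Transportation"] * 10

per_class, macro, micro = f1_scores(y_true, y_pred)
print(per_class)
print(f"F1-macro = {macro:.3f}, F1-micro = {micro:.3f}")
```

In this toy case the micro average stays high because eight of ten messages are classified correctly, while the macro average is pulled down by the two categories with an F1 of zero. This is the same pattern, in exaggerated form, as the per-category variation described in the abstract.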
