Exploiting Categorization of Online News for Profiling City Areas

Profiling city areas in terms of citizens’ behaviour and commercial and social activities is an important problem for smart cities, particularly when profiles must be updated in real time from streaming data. Several methods have been proposed in the literature, exploiting different data sources. In this paper, we propose an approach for profiling city areas based on articles from local online newspapers, exploiting both the article text and metadata such as geo-localization and tags. In particular, we use the tags associated with each article to identify macro-categories through cluster analysis of tag embeddings. We then employ an SVM-based text categorization model to label each new article, represented as a Bag-of-Words, with one of these categories as it arrives. The categorization approach has been integrated into a framework recently proposed by the authors for profiling city areas from different web data sources: the online newspapers are monitored continuously, producing a news stream to be analysed. We report experiments on the city of Rome, using data from 2014 to 2018. We discuss the results obtained with different classifiers and show that the best one, an SVM, achieves an accuracy of up to 93% and an F1-score of up to 79%.
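As a rough illustration of the categorization step only, the following minimal sketch labels a new article with a macro-category using a Bag-of-Words representation and a linear SVM. It is not the authors' implementation: the three category names, the toy training sentences, the bias-free one-vs-rest formulation, and all hyperparameters are assumptions chosen to keep the example small and dependency-free. The macro-categories are taken as given here, whereas in the paper they come from clustering tag embeddings.

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Map a text to a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def train_binary_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Bias-free linear SVM (labels +1/-1), trained by subgradient
    descent on the L2-regularized hinge loss. Hyperparameters are
    illustrative assumptions, not values from the paper."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization step: shrink all weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: update only if the margin is violated.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

def train_one_vs_rest(docs, labels):
    """Train one binary SVM per category (one-vs-rest)."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    X = [bag_of_words(d, vocab) for d in docs]
    models = {}
    for cat in sorted(set(labels)):
        y = [1.0 if lab == cat else -1.0 for lab in labels]
        models[cat] = train_binary_svm(X, y)
    return vocab, models

def classify(text, vocab, models):
    """Return the category whose SVM gives the highest score."""
    x = bag_of_words(text, vocab)
    scores = {cat: sum(wj * xj for wj, xj in zip(w, x))
              for cat, w in models.items()}
    return max(scores, key=scores.get)

# Hypothetical toy corpus; category names are assumptions.
TRAIN = [
    ("traffic jam blocks ring road", "mobility"),
    ("bus strike delays commuters", "mobility"),
    ("museum opens art exhibition", "culture"),
    ("theatre festival hosts concert", "culture"),
    ("police arrest theft suspect", "crime"),
    ("robbery reported near station", "crime"),
]
vocab, models = train_one_vs_rest([d for d, _ in TRAIN],
                                  [c for _, c in TRAIN])

# Label an incoming article from the (simulated) news stream.
print(classify("bus traffic delays", vocab, models))
```

Words that never occurred in training are simply ignored by `bag_of_words`, so an unseen article is scored only on the vocabulary the model already knows. A production system would more likely rely on an established SVM library and TF-IDF or similar weighting rather than raw counts.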
