Game-based crowdsourcing to support collaborative customization of the definition of sustainability

Abstract: Successful adoption and management of sustainable urban systems hinges on the community embracing these systems. Capturing citizens’ ideas, views, and assessments of the built environment is essential to this goal. In collaborative city planning, these are qualified and valued forms of partial knowledge that should be used collectively to shape the decision-making process of urban planning. Among other tools, social media and online social network analytics can provide a means to capture elements of such distributed knowledge. A structured definition of sustainability (normally dictated in a top-down fashion) may not respond well to the pluralist nature of such knowledge acquisition, yet dealing with unstructured community inputs, assessments, and contributions on social media can be confusing. We can detect relevant topics and ideas in community discussions, but they typically lack coherence. In this paper, we advocate the use of a semi-structured approach for capturing, analyzing, and interpreting citizens’ inputs. Public officials and professionals can develop the main elements (topical aspects) of sustainability, which can act as the skeleton of a taxonomy. It is, however, the community inputs and ideas (in our case, collected via social media and parsed) that can flesh out that skeleton and augment those topical aspects by adding the required semantic depth. More specifically, we collected tweets for four urban infrastructure mega-projects in North America. We then used a game-with-a-purpose to crowdsource the identification of topics for a training set of tweets. This set was used to train machine learning algorithms to classify the rest of the collected tweets. We studied the semantics of tweets (identifying their topics) as well as their sentiment (whether they oppose or support a project). Our classification tested different decision trees with different topic hierarchies.
We extracted eight linguistic features to study the content of each tweet. Finally, we examined the accuracy of three algorithms in classifying tweets according to the sequence in the tree and based on the extracted features: K-nearest neighbors, Naive Bayes classifiers, and Support Vector Machines (SVM). On our data set, SVM outperformed the other algorithms. Semantic analysis was insensitive to the depth and number of linguistic features considered. In contrast, sentiment analysis improved when part of speech (PoS) was tracked. Interestingly, our work shows that considering the topic (semantics) of a tweet helps enhance the accuracy of sentiment analysis: including the topical class as a feature in sentiment analysis results in higher accuracy. This could be used as a means to detect the evolution of community opinion, namely, that topic-based social networks are evolving within the communities tweeting about urban projects. It could also be used to identify the topics of top priority to the community or the ones with the widest spread of views. In our case, these were mainly the impacts of design and engineering features on social issues.
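
The observation that the topical class can help sentiment classification can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a small from-scratch multinomial Naive Bayes classifier (one of the three algorithm families named above, chosen only for brevity), and the toy tweets, the topic names, the `featurize` helper, the `NaiveBayes` class, and the `__topic__=` pseudo-token are all hypothetical, introduced purely for illustration.

```python
import math
from collections import Counter, defaultdict


def featurize(text, topic=None):
    """Lower-cased word tokens, plus an optional topic pseudo-token."""
    tokens = text.lower().split()
    if topic is not None:
        # Hypothetical encoding: the topical class becomes one extra feature.
        tokens.append("__topic__=" + topic)
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Unnormalized log posterior of each class for one token list."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / n_docs)
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(sorted(scores), key=scores.get)


# Invented toy data: (tweet text, topic, sentiment). Not from the paper.
TRAIN = [
    ("costs rising", "budget", "oppose"),
    ("delays rising", "budget", "oppose"),
    ("ridership rising", "transit", "support"),
    ("jobs rising", "transit", "support"),
]
labels = [sentiment for _, _, sentiment in TRAIN]

# Model 1 sees only words; model 2 also sees the topical class.
words_only = NaiveBayes().fit([featurize(t) for t, _, _ in TRAIN], labels)
with_topic = NaiveBayes().fit([featurize(t, k) for t, k, _ in TRAIN], labels)

ambiguous = "numbers rising"
print(words_only.log_scores(featurize(ambiguous)))
print(with_topic.predict(featurize(ambiguous, "budget")))
print(with_topic.predict(featurize(ambiguous, "transit")))
```

In this toy setting the word "rising" appears equally often in supportive and opposing tweets, so the words-only model cannot separate the two classes for the ambiguous tweet, while the model that also sees the topic token can. Real tweets would need proper tokenization and the richer linguistic features described above.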
