Topic modeling to discover the thematic structure and spatial-temporal patterns of building renovation and adaptive reuse in cities

Abstract: Building alteration and redevelopment play a central role in the revitalization of developed cities, where the scarcity of available land limits the construction of new buildings. The adaptive reuse of existing space reflects the underlying socioeconomic dynamics of the city and can be a leading indicator of economic growth and diversification. However, the collective understanding of building alteration patterns is constrained by significant barriers to data accessibility and analysis. We present a data mining and knowledge discovery process for extracting, analyzing, and integrating building permit data for more than 2,500,000 alteration projects from seven major U.S. cities. We use natural language processing and topic modeling to discover the thematic structure of construction activities from permit descriptions, and we merge the results with other urban data to explore the dynamics of urban change. The knowledge discovery process proceeds in three steps: (1) text mining to identify popular words, changes in their popularity, and their co-appearance likelihood; (2) topic modeling using latent Dirichlet allocation (LDA); and (3) integrating the topic modeling output with building information and ancillary data to discover the spatial, temporal, and thematic patterns of urban redevelopment and regeneration. The results demonstrate a generalizable approach that can be used to analyze unstructured text data extracted from permit records across varying database structures, permit typologies, and local contexts. Our machine learning methodology can help cities better monitor building alteration activity, analyze spatiotemporal patterns of redevelopment, and more fully understand the economic, social, and environmental implications of changes to the urban built environment.
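As a rough illustration of step (1), the sketch below counts how many permit descriptions contain each word and estimates a simple co-appearance likelihood, P(word b appears | word a appears), from document-level counts. It is not the authors' pipeline: the sample descriptions, the stopword list, and the function names `tokenize`, `word_document_counts`, and `cooccurrence_likelihood` are all hypothetical, and the conditional-probability definition of "co-appearance likelihood" is an assumption, since the abstract does not define it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical permit descriptions, invented for illustration only.
permits = [
    "replace kitchen cabinets and plumbing fixtures",
    "install new plumbing fixtures in bathroom",
    "interior renovation of kitchen and bathroom",
    "install rooftop solar panels",
]

# Assumed minimal stopword list; a real analysis would use a fuller one.
STOPWORDS = {"and", "of", "in", "new", "the"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def word_document_counts(docs):
    """Count the number of documents in which each word appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts


def cooccurrence_likelihood(docs):
    """Estimate P(b in doc | a in doc) for each ordered word pair (a, b)."""
    word_counts = word_document_counts(docs)
    pair_counts = Counter()
    for doc in docs:
        # Each unordered pair is counted once per document, in both directions.
        for a, b in combinations(sorted(set(tokenize(doc))), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {pair: n / word_counts[pair[0]] for pair, n in pair_counts.items()}


counts = word_document_counts(permits)
likelihood = cooccurrence_likelihood(permits)

print(counts["kitchen"])                    # documents mentioning "kitchen"
print(likelihood[("kitchen", "bathroom")])  # P(bathroom | kitchen)
```

Tracking popularity change, also part of step (1), would amount to computing the same counts separately per time window (for example per permit year) and comparing them; that requires permit dates, which this toy data omits.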
