Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change

Abstract This paper develops novel methods for using Yelp reviews as a window into the collective representations of a city and its neighbourhoods. Analysing social media data such as Yelp reviews is challenging because the data is highly sparse, so direct analysis may fail to uncover hidden trends. To address this, we propose a deep autoencoder approach that embeds the language of neighbourhood-based business reviews into a reduced-dimensional space, which facilitates comparing neighbourhoods by similarity and tracking their change over time. On the task of distinguishing real from fake neighbourhood descriptions derived from real reviews, our model raises average accuracy from 0.46 to 0.77. This improvement indicates that embedded language analysis can uncover comparative trends in neighbourhood change through the lens of venue reviews, providing a computational methodology for reading a city through its neighbourhoods. The resulting toolkit makes it possible to examine a city's current sociological trends in terms of its neighbourhoods' collective identities.
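To make the general idea concrete, the following is a minimal sketch of embedding sparse neighbourhood word-frequency vectors with an autoencoder and then comparing neighbourhoods by cosine similarity. It is not the authors' model: the single hidden layer, the toy vocabulary counts, the hyperparameters, and the names `train_autoencoder` and `cosine_similarity_matrix` are all assumptions made for illustration, and a deep autoencoder as described in the abstract would stack several such layers.

```python
import numpy as np

# Hypothetical toy data: rows are neighbourhoods, columns are word counts
# aggregated over reviews of venues in that neighbourhood.
counts = np.array([
    [9, 7, 0, 1, 0, 0],
    [8, 6, 1, 0, 0, 1],
    [0, 1, 7, 9, 1, 0],
    [0, 0, 1, 1, 8, 9],
], dtype=float)
X = counts / counts.sum(axis=1, keepdims=True)  # term frequencies per neighbourhood


def train_autoencoder(X, n_hidden=2, n_epochs=2000, lr=0.5, seed=0):
    """Fit a one-hidden-layer autoencoder by plain gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W_enc + b_enc)      # encoder: reduced-dimensional code
        X_hat = H @ W_dec + b_dec           # decoder: linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec

    def encode(Z):
        return np.tanh(Z @ W_enc + b_enc)

    return encode, losses


def cosine_similarity_matrix(E):
    """Pairwise cosine similarity between the rows of E."""
    norms = np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    unit = E / norms
    return unit @ unit.T


encode, losses = train_autoencoder(X)
E = encode(X)                     # one embedding per neighbourhood
S = cosine_similarity_matrix(E)   # S[i, j]: similarity of neighbourhoods i and j
```

Under the same assumptions, change over time could be examined by encoding word-frequency vectors built from reviews of the same neighbourhood in different periods and comparing the resulting embeddings.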
