Sequences of purchases in credit card data reveal lifestyles in urban populations

Zipf-like distributions characterize a wide set of phenomena in physics, biology, economics, and the social sciences. In human activities, Zipf's law describes, for example, the frequency with which words appear in a text or purchase types appear in shopping patterns. In the latter case, the uneven distribution of transaction types is tied to the temporal sequences in which individuals make their purchases. In this work, we define a framework that applies a text compression technique to sequences of credit card purchases to detect ubiquitous patterns of collective behavior. Clustering consumers by the similarity of their purchase sequences, we detect five consumer groups. Remarkably, a post hoc check shows that individuals in each group are also similar in age, total expenditure, gender, and the diversity of their social and mobility networks, as extracted from their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, this method uncovers sets of significant sequences that reveal insights into collective human behavior.

Digital traces of our lives have the potential to allow insights into collective behaviors. Here, the authors cluster consumers by their credit card purchase sequences and discover five distinct groups, within which individuals also share similar mobility and demographic attributes.
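To make the idea of mining purchase sequences concrete, the following is a minimal sketch in Python and not the authors' method. It assumes each consumer is represented as a time-ordered list of purchase categories. The category names, the toy data, and the helper names `rank_frequency` and `repeated_digrams` are all hypothetical. Counting repeated adjacent pairs is used here only as a simplified stand-in for the text compression technique, which the abstract does not specify.

```python
from collections import Counter


def rank_frequency(sequences):
    """Return (category, count) pairs sorted from most to least frequent.

    A Zipf-like distribution would show counts falling off roughly as a
    power law of the rank in this list.
    """
    counts = Counter(cat for seq in sequences for cat in seq)
    return counts.most_common()


def repeated_digrams(sequences, min_count=2):
    """Count adjacent purchase pairs and keep those seen at least min_count times.

    This is a simplified stand-in for a compression step: pairs that recur
    often are the ones a compressor would replace with a shorter symbol.
    """
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Hypothetical toy data: one time-ordered list of categories per consumer.
sequences = [
    ["grocery", "gas", "grocery", "restaurant", "grocery", "gas"],
    ["grocery", "gas", "taxi", "restaurant", "grocery"],
    ["restaurant", "taxi", "restaurant", "taxi", "grocery"],
]

ranked = rank_frequency(sequences)
digrams = repeated_digrams(sequences)
```

In this toy example, `ranked` lists the categories by how often they occur, and `digrams` holds the ordered pairs that recur across consumers. A real analysis would need a proper sequence-compression algorithm and a similarity measure between consumers before any clustering.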
