Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems.

Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: Populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Comparisons of component size distributions for two complex systems (or a system with itself at two different time points) generally employ information-theoretic instruments, such as Jensen-Shannon divergence. We argue that these methods lack transparency and adjustability, and should not be applied when component probabilities are non-sensible or are problematic to estimate. Here, we introduce 'allotaxonometry' along with 'rank-turbulence divergence', a tunable instrument for comparing any two (Zipfian) ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with an ordered list of components according to divergence contribution. We explore the performance of rank-turbulence divergence in a series of distinct settings, including language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.
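The abstract does not spell out the formula, so here is a minimal sketch of a rank-based, tunable divergence between two ranked lists. It assumes the per-item form in which an item with ranks r1 and r2 contributes (α+1)/α · |r1^(-α) - r2^(-α)|^(1/(α+1)), summed over all items and divided by a normalization; as α → 0 this term tends to |ln(r1/r2)|, which the code uses for α = 0. Three further choices are assumptions made here, not taken from the text: items missing from a list share one tied rank (the mean of the positions they would fill after the last observed item), the normalization is the value the same two lists would give if they shared no items, and the function names `rank_turbulence_divergence` and `rtd_contributions` are hypothetical. This is an illustration, not the authors' reference implementation.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def _ranks(ranked: Sequence[Hashable], universe: set) -> Dict[Hashable, float]:
    """Ranks 1..N for listed items. ASSUMPTION: every item absent from the
    list gets one tied rank, the mean of positions N+1 .. N+m it would fill."""
    ranks = {item: float(i + 1) for i, item in enumerate(ranked)}
    m = len(universe) - len(ranks)  # number of absent items
    tied = len(ranks) + (m + 1) / 2
    for item in universe:
        ranks.setdefault(item, tied)
    return ranks


def _raw_terms(list1, list2, alpha: float) -> Dict[Hashable, float]:
    """Unnormalized per-item terms over the union of both lists."""
    universe = set(list1) | set(list2)
    r1, r2 = _ranks(list1, universe), _ranks(list2, universe)
    terms = {}
    for t in universe:
        if alpha == 0:
            # alpha -> 0 limit of the general expression below
            terms[t] = abs(math.log(r1[t] / r2[t]))
        else:
            diff = abs(r1[t] ** -alpha - r2[t] ** -alpha)
            terms[t] = (alpha + 1) / alpha * diff ** (1 / (alpha + 1))
    return terms


def rtd_contributions(list1, list2, alpha: float) -> List[Tuple[Hashable, float]]:
    """Per-item contributions to the normalized divergence, largest first."""
    if not (0 <= alpha < math.inf):
        raise ValueError("alpha must be finite and >= 0")
    for lst in (list1, list2):
        if not lst or len(set(lst)) != len(lst):
            raise ValueError("lists must be non-empty with no repeated items")
    terms = _raw_terms(list1, list2, alpha)
    # ASSUMPTION: normalize by the total for the same lists made disjoint;
    # tagging each item with its list of origin guarantees no overlap.
    norm = sum(_raw_terms([(1, t) for t in list1],
                          [(2, t) for t in list2], alpha).values())
    return sorted(((t, v / norm) for t, v in terms.items()),
                  key=lambda pair: pair[1], reverse=True)


def rank_turbulence_divergence(list1, list2, alpha: float) -> float:
    """Normalized divergence: 0 for identical lists, 1 for disjoint ones."""
    return sum(v for _, v in rtd_contributions(list1, list2, alpha))


if __name__ == "__main__":
    a = ["the", "of", "and", "to", "a"]
    b = ["the", "and", "of", "a", "in"]
    for alpha in (0, 1 / 3, 1, 10):
        print(alpha, round(rank_turbulence_divergence(a, b, alpha), 3))
    print(rtd_contributions(a, b, 1 / 3)[:3])
```

The parameter α is the tuning knob: near 0 every rank change counts by its log ratio, so shifts deep in the tail matter as much as shifts at the top, while large α drives r^(-α) toward zero for all but the top-ranked items, so only changes near the head of the lists contribute. The sorted output of `rtd_contributions` is a rough stand-in for the ordered list of components by divergence contribution; the rank-rank histogram half of the allotaxonograph is not sketched here.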
