Discovering Periodic Patterns in Historical News

We address the problem of observing periodic changes in the behaviour of a large population by analysing the daily contents of newspapers published in the United States and the United Kingdom from 1836 to 1922. For each country we compute the daily time series of the relative frequency of the 25K most frequent words, giving 50K time series spanning 31,755 days. Behaviours found to be strongly periodic include seasonal activities such as hunting and harvesting. We find a strong connection with natural cycles, with a pronounced presence of fruits, vegetables, flowers and game. Periodicities dictated by religious or civil calendars are also detected, and they show a different waveform from those driven by weather. The analysis also reveals population states such as the presence of infectious disease, with clear annual peaks for fever, pneumonia and diarrhoea. Overall, 2% of the words are found to be strongly periodic, and the most frequently found period is 365 days. Comparisons between the UK and the US, and between modern and historical news, reveal how the fundamental cycles of life are shaped by the seasons, but also how this effect has been reduced in modern times.
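To make the idea of a "strongly periodic" word concrete, the following is a minimal sketch of one common way to find the dominant period of a daily relative-frequency series, using a periodogram computed with the fast Fourier transform. It is an illustration under stated assumptions, not a description of the procedure used in this work: the function name `dominant_period`, the use of the share of spectral power as a strength score, and the synthetic series are all hypothetical.

```python
import numpy as np


def dominant_period(series):
    """Return (period_in_days, share_of_power) for the strongest periodogram peak.

    `series` holds one value per day, e.g. the relative frequency of a word.
    The share of power is an illustrative strength score, not a significance test.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # remove the constant (zero-frequency) component
    power = np.abs(np.fft.rfft(x)) ** 2     # periodogram: squared magnitude per frequency bin
    power[0] = 0.0                          # ignore any residual zero-frequency power
    total = power.sum()
    if total == 0.0:                        # flat series: no period to report
        return float("nan"), 0.0
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # bin frequencies in cycles per day
    k = int(np.argmax(power))               # index of the strongest bin (k >= 1 here)
    return 1.0 / freqs[k], power[k] / total


# Synthetic example (assumed data): a word whose relative frequency follows
# an annual cycle plus Gaussian noise, over the same number of days as the study.
rng = np.random.default_rng(0)
n_days = 31755
t = np.arange(n_days)
seasonal = 1e-4 * (1.0 + 0.5 * np.sin(2.0 * np.pi * t / 365.0))
noise = rng.normal(0.0, 2e-5, n_days)

period, share = dominant_period(seasonal + noise)
print(f"dominant period: {period:.1f} days, share of power: {share:.2f}")
```

In practice a score like this would need to be paired with a significance test, for example a permutation test that recomputes the peak on shuffled copies of the series, before a word is declared strongly periodic.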
