Modeling Individual Cyclic Variation in Human Behavior

Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered in modeling real-world cycles: they can model multivariate data with both discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to accommodate variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data compared to the best-performing baseline. CyHMMs can also perform functions which baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets -- of human menstrual cycle symptoms and physical activity tracking data -- yields important insights including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model.
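As a rough illustration of the cyclic state structure such a model relies on, the sketch below builds a transition matrix in which each hidden state either persists or advances to the next state, wrapping around from the last state to the first. This is not the authors' implementation: the function names, the stay/advance parameterization, and the example probabilities are assumptions made only for illustration, and emissions and missing-data handling are omitted.

```python
import random


def cyclic_transition_matrix(stay_probs):
    """Row i: stay in state i with stay_probs[i], else advance to (i + 1) % K.

    Hypothetical helper; the stay/advance form is an assumed simplification.
    """
    k = len(stay_probs)
    matrix = [[0.0] * k for _ in range(k)]
    for i, p in enumerate(stay_probs):
        matrix[i][i] = p
        matrix[i][(i + 1) % k] += 1.0 - p
    return matrix


def expected_cycle_length(stay_probs):
    """Expected number of time steps to pass once through every state.

    The dwell time in each state is geometric with mean 1 / (1 - p_stay).
    """
    return sum(1.0 / (1.0 - p) for p in stay_probs)


def sample_states(matrix, n_steps, seed=0):
    """Sample a hidden-state path starting in state 0."""
    rng = random.Random(seed)
    state, path = 0, []
    for _ in range(n_steps):
        path.append(state)
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
    return path


# Example with four hypothetical states and made-up stay probabilities.
stay_probs = [0.5, 0.75, 0.9, 0.5]
matrix = cyclic_transition_matrix(stay_probs)
cycle_length = expected_cycle_length(stay_probs)
path = sample_states(matrix, n_steps=200)
```

Under these assumptions, a cycle length can be read directly off the transition parameters, and every sampled path visits the states in a fixed cyclic order.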
