Multivariate functional data modeling with time-varying clustering

We consider the situation where multivariate functional data have been collected over time at each of a set of sites. Our illustrative setting is bivariate: ozone and PM$_{10}$ levels are monitored as functions of time over the course of a year at a set of monitoring sites. The data come from 24 monitoring sites in Mexico City that record hourly ozone and PM$_{10}$ levels, and we use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. In our example, such clustering can be considered for ozone and PM$_{10}$ individually or jointly, and it may differ between the two pollutants. More importantly for us, we allow such clustering to vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the, say, $n$ sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process, which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.
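
The two ideas at the end of the abstract, Dirichlet process labeling of sites and a partition of the time scale, can be illustrated with a minimal sketch. It is not the model of the paper: it only draws cluster labels from the Dirichlet process prior through its Chinese restaurant (Polya urn) representation, independently within each time block, and it ignores the data, the Gaussian process, and the exogenous variables. The function names `crp_labels` and `time_blocks`, the concentration parameter `alpha = 1.0`, and the choice of 12 blocks are all assumptions made for illustration.

```python
import numpy as np


def crp_labels(n_sites, alpha, rng):
    """Draw cluster labels for n_sites from a Chinese restaurant process prior."""
    labels = np.zeros(n_sites, dtype=int)
    counts = [1]  # the first site opens the first cluster (label 0)
    for i in range(1, n_sites):
        # Existing clusters are weighted by their size, a new cluster by alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels[i] = k
    return labels


def time_blocks(n_hours, n_blocks):
    """Split hour indices 0..n_hours-1 into contiguous, nearly equal blocks."""
    return np.array_split(np.arange(n_hours), n_blocks)


rng = np.random.default_rng(0)
n_sites = 24          # monitoring sites, as in the abstract
n_hours = 365 * 24    # hourly records over one year (2017 is not a leap year)
blocks = time_blocks(n_hours, n_blocks=12)  # assumed: 12 blocks of about a month

# One label vector per time block, so cluster membership may change over time.
labels_by_block = np.stack(
    [crp_labels(n_sites, alpha=1.0, rng=rng) for _ in blocks]
)
```

Each row of `labels_by_block` holds the cluster labels of the 24 sites within one time block. In a fitted model the labels would be sampled from their posterior given the observed functions, and dependence across blocks could be introduced; here the blocks are independent prior draws.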
