Discovering Spatiotemporal Patterns of Urban Life from Mobile Data: an Exploration through Hierarchical Independent Component Analysis.
We analyze a spatiotemporal dataset describing the use over time of the mobile-phone network in the urban area of Milan, Italy. This dataset contains 13.8 million records referenced in space over a regular lattice of 10573 pixels at a spatial resolution of nearly 250 m covering an area of 757 km^2, and referenced in time over a regular grid of 1308 intervals at a temporal resolution of 15 minutes covering a period of two weeks. The database has been provided by Telecom Italia Mobile.
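The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes one record per pixel and per time interval, and assumes the lattice tiles the 757 km^2 area exactly; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
n_pixels = 10573        # cells of the spatial lattice
n_intervals = 1308      # 15-minute time intervals
area_km2 = 757.0        # covered area

# Assumption: one record per (pixel, interval) pair.
n_records = n_pixels * n_intervals

# Duration covered by the time grid.
hours = n_intervals * 15 / 60
days = hours / 24

# Assumption: square pixels tiling the area exactly.
implied_side_m = (area_km2 * 1e6 / n_pixels) ** 0.5

print(n_records)                 # 13829484, i.e. about 13.8 million
print(days)                      # 13.625, i.e. about two weeks
print(round(implied_side_m))     # 268
```

Under these assumptions the record count and the two-week period match the stated values, while the implied pixel side (about 268 m) is somewhat larger than the nominal 250 m resolution.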
The aim of the analysis is to identify spatiotemporal patterns that characterize specific locations and/or specific periods and that may be associated with different activities taking place within the city. This research is part of the Green Move Project, a research project carried out at Politecnico di Milano and funded by Regione Lombardia, which investigates the potential of a third-generation car-sharing system within the city of Milan.
We analyze these data by means of a newly developed methodology: Hierarchical Independent Component Analysis (HICA). HICA is based on a recursive hierarchical application of Independent Component Analysis (ICA) to pairs of variables. The final output of HICA is a multi-resolution, wavelet-inspired, data-driven basis that is useful for representing the data and for investigating their sources of variability. Unlike ICA, and like wavelets, the basis provided by HICA is naturally ordered according to the size of the support of each basis element. Moreover, HICA provides not just one basis but an entire family of bases characterized by an increasing level of resolution. The possibility of choosing among different levels of resolution is critical from a practical point of view, because it allows the investigator to impose a different degree of sparsity on the basis. Like ICA, the basis provided by HICA is not orthogonal and is driven by the search for independent components. Thus, unlike methods inspired by principal component analysis, no purely mathematical (and possibly unrealistic) constraint, such as orthogonality, is imposed on the final basis.
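The recursive pairwise construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how pairs are chosen or how the two-variable ICA is solved, so the sketch assumes a treelet-style rule (merge the most correlated pair of active variables), a simple kurtosis contrast maximised by a grid search over rotations, unit-norm basis elements, and the retention of the higher-variance component as the active one. The names `ica_pair` and `hica_sketch` are hypothetical.

```python
import numpy as np


def ica_pair(x2, n_angles=180):
    """Two-variable ICA on centred data x2 (n x 2): whiten, then pick the
    rotation maximising the summed absolute excess kurtosis.
    Returns the 2 x 2 unmixing matrix W, with sources = x2 @ W.T."""
    cov = np.cov(x2, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    # Whitening matrix Lambda^(-1/2) E^T (eigenvalues floored for stability).
    whiten = (evecs / np.sqrt(np.maximum(evals, 1e-12))).T
    z = x2 @ whiten.T
    best_w, best_score = whiten, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, s], [-s, c]])
        y = z @ rot.T
        score = np.abs(np.mean(y ** 4, axis=0) - 3.0).sum()
        if score > best_score:
            best_w, best_score = rot @ whiten, score
    return best_w


def hica_sketch(data, n_levels):
    """Return (basis, coef) with centred data == coef @ basis.T.
    Columns of `basis` are the (generally non-orthogonal) basis elements."""
    coef = data - data.mean(axis=0)
    p = coef.shape[1]
    if not 0 <= n_levels <= p - 1:
        raise ValueError("n_levels must lie between 0 and p - 1")
    basis = np.eye(p)
    active = list(range(p))
    for _ in range(n_levels):
        # Assumed merge rule: most correlated pair of active variables.
        corr = np.abs(np.corrcoef(coef[:, active], rowvar=False))
        np.fill_diagonal(corr, -1.0)
        a, b = np.unravel_index(np.argmax(corr), corr.shape)
        i, j = active[a], active[b]
        w = ica_pair(coef[:, [i, j]])
        # Mixing matrix maps the two sources back to the merged pair.
        new_basis = basis[:, [i, j]] @ np.linalg.inv(w)
        norms = np.linalg.norm(new_basis, axis=0)
        coef[:, [i, j]] = (coef[:, [i, j]] @ w.T) * norms
        basis[:, [i, j]] = new_basis / norms
        # Assumed rule: the lower-variance component leaves the active set.
        drop = i if coef[:, i].var() < coef[:, j].var() else j
        active.remove(drop)
    return basis, coef


rng = np.random.default_rng(0)
toy = rng.laplace(size=(400, 6)) @ rng.normal(size=(6, 6))
basis, coef = hica_sketch(toy, n_levels=5)
```

Each value of `n_levels` yields a different basis, which mirrors the idea of a family of bases indexed by the level of resolution. Elements built at deeper levels combine more of the original variables and therefore have larger support.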
We prove two theorems in support of this new methodology. The first is a consistency result for the algorithm: if only K independent components are present, acting on different sets of variables, and the noise variability is below a certain threshold, HICA detects those K components. The second result characterizes, in the same framework, the behavior of the fraction of explained total variance as the level of resolution and the dimension of the optimal basis vary. This result provides the investigator with practical guidelines for choosing the most suitable level of resolution and the most appropriate dimension of the final basis.
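As an illustration of the quantity involved in the second result, the sketch below computes a fraction of explained total variance for a non-orthogonal basis. The abstract does not give the definition used in the theorem, so the sketch assumes one plausible choice: the data are projected by least squares onto the span of the selected basis elements, and the variance of the projection is divided by the total variance. The function name and the toy basis are hypothetical.

```python
import numpy as np


def explained_variance_fraction(data, basis_subset):
    """Share of total variance captured by the span of the columns of
    basis_subset (p x K), using a least-squares projection."""
    centred = data - data.mean(axis=0)
    # Solve basis_subset @ coef_t ~= centred.T for the K x n coefficients.
    coef_t, *_ = np.linalg.lstsq(basis_subset, centred.T, rcond=None)
    fitted = (basis_subset @ coef_t).T
    return float((fitted ** 2).sum() / (centred ** 2).sum())


rng = np.random.default_rng(1)
toy = rng.normal(size=(300, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# A non-orthogonal toy basis: the first K columns span the first K coordinates.
toy_basis = np.array([[1.0, 1.0, 0.0, 0.0],
                      [0.0, 1.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0, 1.0],
                      [0.0, 0.0, 0.0, 1.0]])

fractions = [explained_variance_fraction(toy, toy_basis[:, :k])
             for k in range(1, 5)]
```

Plotting `fractions` against K, for each level of resolution, is one way an investigator might compare candidate bases. With a full basis the fraction reaches one, and it cannot decrease as elements are added.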
The data are analyzed from two perspectives: (i) sites index variables and time instants index instances (an approach common in the blind source separation literature), and (ii) time instants index variables and sites index instances (an approach common in the geostatistical literature). Applying HICA under the two approaches allows us to impose spatial sparsity or temporal sparsity, respectively, on the final representation. Both approaches unveil interesting patterns interpretable in terms of working, residential, shopping, leisure, and commuting activities.
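The two perspectives amount to two orientations of the same data matrix. The sketch below uses a small synthetic array in place of the Milan data; the toy sizes and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_sites = 96, 20   # toy sizes; the Milan data have 1308 and 10573
traffic = rng.poisson(5.0, size=(n_times, n_sites)).astype(float)

# (i) Sites index variables, time instants index instances.
# Basis elements are vectors over sites, i.e. spatial maps.
by_site = traffic                       # shape (n_times, n_sites)
cov_sites = np.cov(by_site, rowvar=False)

# (ii) Time instants index variables, sites index instances.
# Basis elements are vectors over time, i.e. temporal profiles.
by_time = traffic.T                     # shape (n_sites, n_times)
cov_times = np.cov(by_time, rowvar=False)

print(cov_sites.shape)   # (20, 20)
print(cov_times.shape)   # (96, 96)
```

In orientation (i) a sparse basis element is supported on few sites, which corresponds to spatial sparsity. In orientation (ii) it is supported on few time instants, which corresponds to temporal sparsity.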