Detecting geo-spatial weather clusters using dynamic heuristic subspaces

Few datasets are as rich, complex, dynamic, near-chaotic and close to real-world physical phenomena as weather data. To run weather predictions nationwide, it is pragmatic to identify groups of geographic locations that possess strikingly similar weather patterns. This task entails grouping a set of geo-spatial points into clusters based on several dynamic atmospheric factors such as temperature, wind speed, precipitation and humidity. In this paper, we present a dynamic heuristic subspace-clustering algorithm that detects geo-spatial weather clusters across all zip codes in the US with greater accuracy than traditional clustering algorithms. Our method also incorporates a set of heuristics, defined by human editors, that detect one distinctive weather feature per cluster, which can be delivered to consumers as actionable weather information (e.g., "don't leave work without an umbrella"). We use the proposed algorithm to drastically scale a popular weather app called Poncho, which employs a mix of editorialized and automated mechanisms to personalize the user's weather forecast experience.
