VTeller: Telling the Values Somewhere, Sometime in a Dynamic Network of Urban Systems

Dynamic networks are common in today's urban systems. Unfortunately, the data acquired from them are rarely complete observations of the whole system. It is therefore important to reliably infer the unobserved attribute values anywhere in the graph at a given time, either in the past or in the future. Previous work does not sufficiently capture the correlations inherent in graph topology and in time. We propose a machine learning approach based on a novel probabilistic graphical model. We devise a series of algorithms to efficiently group the vertices, learn the model parameters, and infer the unobserved values for query processing. Furthermore, we propose a method to update the model incrementally and automatically. Finally, we perform an extensive experimental study using two real-world dynamic graph datasets to evaluate our approach.
