Low-rank singular value thresholding for recovering missing air quality data

With growing awareness of the harmful effects of urban air pollution, air quality monitoring stations have been deployed in many metropolitan areas to provide air quality data to the public. However, sampling device failures and data processing errors commonly leave gaps in the measurements, so data integrity becomes a critical challenge when such data are used for public services. In this paper, we investigate the mathematical properties of air quality measurements and attempt to recover the missing data. First, we empirically study the low-rank property of these measurements. Second, we formulate the reconstruction of missing air quality data as a low-rank matrix completion (LRMC) optimization problem. The problem is transformed using duality theory, and singular value thresholding (SVT) is employed to obtain sub-optimal solutions. Third, to evaluate the performance of our methodology, we conduct a series of case studies covering different types of missing data patterns. The simulation results demonstrate that the proposed SVT methodology can effectively recover missing air quality data and outperforms the existing interpolation method. Finally, we investigate the parameter sensitivity of SVT. Our study can serve as a guideline for missing data recovery in the real world.
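As a rough illustration of the SVT step described above, the following minimal sketch follows the general iteration of Cai, Candès, and Shen: soft-threshold the singular values of a dual variable, then update that variable with the residual on the observed entries. It is not the authors' implementation. The function names `shrink` and `svt_complete`, the boolean `mask` convention (True marks an observed entry), and the default threshold, step size, and stopping rule are all assumptions made for this example.

```python
import numpy as np


def shrink(Y, tau):
    """Soft-threshold the singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)  # singular values below tau become zero
    return (U * s) @ Vt


def svt_complete(M, mask, tau=None, delta=1.0, max_iter=500, tol=1e-4):
    """Estimate the missing entries of M with an SVT-style iteration.

    M    : 2-D array, e.g. stations x time steps; missing entries may be NaN.
    mask : boolean array of the same shape, True where M is observed.
    tau, delta, max_iter, tol : illustrative defaults (assumptions),
    not values taken from the paper.
    """
    M_obs = np.where(mask, M, 0.0)  # zero-fill so NaNs never propagate
    n1, n2 = M_obs.shape
    if tau is None:
        tau = 5.0 * np.sqrt(n1 * n2)  # heuristic threshold (assumption)

    Y = np.zeros_like(M_obs)  # dual variable
    X = np.zeros_like(M_obs)  # current low-rank estimate
    norm_obs = np.linalg.norm(M_obs)

    for _ in range(max_iter):
        X = shrink(Y, tau)
        # Residual restricted to the observed entries.
        residual = np.where(mask, M_obs - X, 0.0)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y = Y + delta * residual  # dual ascent step

    return X
```

In use, one would stack the measurements into a matrix, build `mask` from the non-missing positions (for example `~np.isnan(M)`), and read the recovered values from the returned matrix at the positions where `mask` is False. A step size of `delta=1.0` is a conservative choice here; larger steps may converge faster but are less safe on small matrices.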
