Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information

Observational data are often accompanied by natural structural indices, such as time stamps or geographic locations, which are meaningful to prediction tasks but are typically discarded. We leverage semantically meaningful indexing data while ensuring robustness to potentially uninformative or misleading indices. We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction. Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks, with no need to retrain models. Our theoretical analysis details simple conditions under which post-estimation smoothing will improve accuracy over that of the original predictor. Our experiments on large-scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice. Together, these results illuminate a novel way to consider and incorporate the natural structure of index variables in machine learning.
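To make the idea concrete, the sketch below illustrates the general flavor of post-estimation smoothing: predictions from any base model are replaced by a kernel-weighted average of predictions at nearby index values (here a Gaussian kernel over a one-dimensional index). The kernel choice, the bandwidth parameter, and the function name smooth_predictions are illustrative assumptions for this sketch, not the paper's exact operator.

import numpy as np

def smooth_predictions(y_hat, index, bandwidth=1.0):
    """Post-estimation smoothing sketch: replace each prediction with a
    kernel-weighted average of predictions at nearby index values.

    y_hat     : (n,) array of predictions from any base model
    index     : (n,) array of structural index values (e.g., timestamps)
    bandwidth : kernel width controlling how strongly neighboring
                predictions are pooled (illustrative default)
    """
    y_hat = np.asarray(y_hat, dtype=float)
    index = np.asarray(index, dtype=float)

    # Pairwise Gaussian kernel weights over the index variable.
    diffs = index[:, None] - index[None, :]
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)

    # Smoothed predictions are a convex combination of the original
    # ones, so the base model never needs to be retrained.
    return weights @ y_hat

# Example: smooth noisy predictions indexed by time.
t = np.linspace(0.0, 10.0, 200)
noisy_preds = np.sin(t) + 0.3 * np.random.randn(200)
smoothed = smooth_predictions(noisy_preds, t, bandwidth=0.5)

Because the operator acts only on the model's outputs and the index values, it can be bolted onto an existing pipeline as a cheap post-processing step; if the index is uninformative, a large bandwidth or identity-like weighting recovers the original predictions.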
