Semi-supervised Classification using Attention-based Regularization on Coarse-resolution Data

Many real-world phenomena are observed at multiple resolutions. Predictive models for such phenomena typically treat each resolution separately. This approach can be limiting in applications where predictions are desired at fine resolutions but fine-resolution training data are scarce. In this paper, we propose classification algorithms that leverage supervision from coarser resolutions to help train models at finer resolutions. The different resolutions are modeled as different views of the data in a multi-view framework that exploits the complementarity of features across views to improve models on both. Unlike traditional multi-view learning problems, the key challenge here is that there is no one-to-one correspondence between instances across views, so the correspondence of instances across resolutions must be modeled explicitly. We propose to learn this correspondence from the features of instances at different resolutions using an attention mechanism. Experiments on the real-world application of mapping urban areas from satellite observations, and on sentiment classification of text data, demonstrate the effectiveness of the proposed methods.
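To make the idea of attention-based correspondence concrete, the following is a minimal NumPy sketch of attention pooling over fine-resolution instances: each coarse-resolution "bag" label supervises a weighted aggregate of its fine-resolution instance features, with the weights learned by an attention scoring function. The function and parameter names (`attention_pool`, `V`, `w`) and the tanh-gated scoring form are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(instances, V, w):
    """Aggregate fine-resolution instance features into a single
    coarse-resolution (bag-level) representation via attention.

    instances: (n, d) array of fine-resolution instance features
    V:         (d, d) projection applied before scoring (assumed form)
    w:         (d,)   attention scoring vector

    Returns the (d,) bag feature and the (n,) attention weights.
    """
    scores = np.tanh(instances @ V) @ w   # unnormalized relevance scores
    alpha = softmax(scores)               # weights sum to 1 over instances
    return alpha @ instances, alpha       # attention-weighted bag feature

# Toy example: 5 fine-resolution instances with 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
V = rng.normal(size=(8, 8))
w = rng.normal(size=8)
bag_feature, alpha = attention_pool(X, V, w)
```

In training, the bag feature would feed a coarse-resolution classifier, while `alpha` indicates which fine-resolution instances the coarse label is attributed to, yielding the instance-level correspondence described above.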
