Feedback Coding for Active Learning

The iterative selection of examples for labeling in active machine learning is conceptually similar to feedback channel coding in information theory: in both tasks, the goal is to select a minimal sequence of actions that conveys information in the presence of noise. While this high-level overlap has been noted before, open questions remain about how best to formulate active learning as a communication system so that existing analysis and algorithms from feedback coding can be brought to bear. In this work, we formally identify and leverage the structural commonalities between the two problems, including the characterization of the encoder and noisy channel components, to design a new algorithm. Specifically, we develop an optimal-transport-based feedback coding scheme called Approximate Posterior Matching (APM) for the task of active example selection and explore its application to Bayesian logistic regression, a popular model in active learning. We evaluate APM on a variety of datasets and demonstrate learning performance comparable to existing active learning methods at a reduced computational cost. These results demonstrate the potential of directly deploying concepts from feedback channel coding to design efficient active learning strategies.
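
To make the posterior matching idea concrete, the sketch below shows a minimal bisection-style analogue for Bayesian logistic regression: fit a MAP approximation of the weight posterior, then query the unlabeled example whose predictive probability is closest to 1/2, i.e. the point that most evenly bisects posterior mass. This is an illustrative toy under stated assumptions, not the paper's APM scheme (which is built on optimal transport); all function names, parameters, and the synthetic data here are hypothetical.

    # Illustrative sketch only: posterior-matching-style query selection for
    # Bayesian logistic regression. NOT the paper's APM algorithm; it shows
    # the simpler bisection analogue, where the next query is the candidate
    # that most evenly splits posterior mass (predictive probability ~ 1/2).
    import numpy as np

    def fit_map_logistic(X, y, prior_var=1.0, iters=200, lr=0.1):
        """MAP estimate of logistic regression weights under a Gaussian
        prior, via gradient ascent on the log posterior (hypothetical helper)."""
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            grad = X.T @ (y - p) - w / prior_var  # likelihood + prior gradients
            w += lr * grad
        return w

    def select_query(w_map, candidates):
        """Pick the candidate whose predictive probability is closest to 1/2,
        i.e. the example that best bisects the (approximate) posterior mass."""
        p = 1.0 / (1.0 + np.exp(-candidates @ w_map))
        return int(np.argmin(np.abs(p - 0.5)))

    rng = np.random.default_rng(0)
    w_true = np.array([1.5, -2.0])
    pool = rng.normal(size=(200, 2))  # unlabeled candidate pool (synthetic)
    labels = (pool @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

    idx = list(rng.choice(200, size=5, replace=False))  # small labeled seed set
    for _ in range(20):  # active learning loop: query, then receive feedback
        w = fit_map_logistic(pool[idx], labels[idx])
        rest = [i for i in range(200) if i not in idx]
        idx.append(rest[select_query(w, pool[rest])])  # reveal the queried label

    print("learned:", np.round(fit_map_logistic(pool[idx], labels[idx]), 2),
          "true:", w_true)

In the communication-system reading of this loop, the learner plays the role of a feedback encoder: each query is chosen as a function of the current posterior, and the returned label acts as a noisy channel output that updates it.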
