Infinite-Label Learning with Semantic Output Codes

We develop a new statistical machine learning paradigm, named infinite-label learning, to annotate a data point with more than one relevant labels from a candidate set, which pools both the finite labels observed at training and a potentially infinite number of previously unseen labels. The infinite-label learning fundamentally expands the scope of conventional multi-label learning, and better models the practical requirements in various real-world applications, such as image tagging, ads-query association, and article categorization. However, how can we learn a labeling function that is capable of assigning to a data point the labels omitted from the training set? To answer the question, we seek some clues from the recent work on zero-shot learning, where the key is to represent a class/label by a vector of semantic codes, as opposed to treating them as atomic labels. We validate the infinite-label learning by a PAC bound in theory and some empirical studies on both synthetic and real data.

[1]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[3]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[4]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[5]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[6]  James T. Kwok,et al.  Efficient Multi-label Classification with Many Labels , 2013, ICML.

[7]  Yongxin Yang,et al.  A Unified Perspective on Multi-Domain and Multi-Task Learning , 2014, ICLR.

[8]  Alexander H. Waibel,et al.  Modular Construction of Time-Delay Neural Networks for Speech Recognition , 1989, Neural Computation.

[9]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[10]  Alberto Del Bimbo,et al.  Socializing the Semantic Gap , 2015, ACM Comput. Surv..

[11]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[13]  Tom Heskes,et al.  Task Clustering and Gating for Bayesian Multitask Learning , 2003, J. Mach. Learn. Res..

[14]  Krishnakumar Balasubramanian,et al.  The Landmark Selection Method for Multiple Output Prediction , 2012, ICML.

[15]  S. Ullman,et al.  Generalization to Novel Images in Upright and Inverted Faces , 1993, Perception.

[16]  David M. Pennock,et al.  Categories and Subject Descriptors , 2001 .

[17]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[18]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[19]  Bernard De Baets,et al.  Efficient Pairwise Learning Using Kernel Ridge Regression: an Exact Two-Step Method , 2016, ArXiv.

[20]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[21]  Luo Si,et al.  Matrix co-factorization for recommendation with rich side information and implicit feedback , 2011, HetRec '11.

[22]  Prateek Jain,et al.  Sparse Local Embeddings for Extreme Multi-label Classification , 2015, NIPS.

[23]  Mubarak Shah,et al.  Fast Zero-Shot Image Tagging , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[25]  Ryan P. Adams,et al.  Incorporating Side Information in Probabilistic Matrix Factorization with Gaussian Processes , 2010, UAI.

[26]  Guillermo Sapiro,et al.  Kernelized Probabilistic Matrix Factorization: Exploiting Graphs and Side Information , 2012, SDM.

[27]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[28]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[29]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Johannes Fürnkranz,et al.  Predicting Unseen Labels Using Label Hierarchies in Large-Scale Multi-label Learning , 2015, ECML/PKDD.

[31]  John Langford,et al.  Multi-Label Prediction via Compressed Sensing , 2009, NIPS.

[32]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[33]  Jeff G. Schneider,et al.  Multi-Label Output Codes using Canonical Correlation Analysis , 2011, AISTATS.

[34]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.