Paper Information - Infinite-Label Learning with Semantic Output Codes

We develop a new statistical machine learning paradigm, named infinite-label learning, to annotate a data point with more than one relevant label from a candidate set that pools both the finite set of labels observed at training and a potentially infinite number of previously unseen labels. Infinite-label learning fundamentally expands the scope of conventional multi-label learning and better models the practical requirements of various real-world applications, such as image tagging, ads-query association, and article categorization. However, how can we learn a labeling function that is capable of assigning to a data point labels that never appeared in the training set? To answer this question, we take cues from recent work on zero-shot learning, where the key is to represent a class or label by a vector of semantic codes rather than treating it as an atomic symbol. We validate infinite-label learning with a PAC bound in theory and with empirical studies on both synthetic and real data.
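To make the idea of scoring a data point against a label's semantic code concrete, here is a minimal sketch on synthetic data. It is not the paper's algorithm: the abstract does not specify the model, so the bilinear scoring form, the ridge-style closed-form fit, and every name below (`W_true`, `fit_bilinear`, the dimensions) are assumptions made purely for illustration. The point it tries to show is that a labeling function that takes a label vector as input can, in principle, be evaluated on labels it never saw during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d input features, k-dimensional semantic codes per label.
d, k = 10, 5
n_train, n_test = 400, 200
n_seen, n_unseen = 100, 50

# Assumed ground truth: relevance is the sign of a bilinear form x^T W λ.
W_true = rng.normal(size=(d, k))

def relevance(X, L):
    """Return a (points x labels) matrix of +1/-1 relevance indicators."""
    return np.sign(X @ W_true @ L.T)

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
L_seen = rng.normal(size=(n_seen, k))      # codes of labels observed at training
L_unseen = rng.normal(size=(n_unseen, k))  # codes of labels never observed

Y_train = relevance(X_train, L_seen)

def fit_bilinear(X, Y, L, reg=1e-2):
    """Ridge-style closed-form fit of W so that X @ W @ L.T approximates Y.

    As reg -> 0 this reduces to the least-squares solution.
    """
    A = X.T @ X + reg * np.eye(X.shape[1])
    B = L.T @ L + reg * np.eye(L.shape[1])
    return np.linalg.solve(A, X.T @ Y @ L) @ np.linalg.inv(B)

W_hat = fit_bilinear(X_train, Y_train, L_seen)

# Evaluate on new data points paired with labels that were absent from training.
pred_unseen = np.sign(X_test @ W_hat @ L_unseen.T)
unseen_acc = np.mean(pred_unseen == relevance(X_test, L_unseen))
print(f"accuracy on unseen labels: {unseen_acc:.3f}")
```

Because the fitted function depends on a label only through its code vector, adding a new candidate label requires no retraining in this sketch, only a new row in `L_unseen`. Whether that transfers to real data depends on how well the semantic codes capture label meaning, which is what the paper's empirical studies examine.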
