Incorporating Diversity and Informativeness in Multiple-Instance Active Learning

Multiple-instance active learning (MIAL) is a paradigm for collecting sufficient training bags for a multiple-instance learning (MIL) problem by iteratively selecting and querying the most valuable unlabeled bags. Existing work on MIAL evaluates an unlabeled bag by its informativeness with respect to the current classifier, but neglects the internal distribution of the bag's instances, which can reflect the bag's diversity. In this paper, two diversity criteria, namely clustering-based diversity and fuzzy-rough-set-based diversity, are proposed for MIAL using a support vector machine (SVM) based MIL classifier. In the first criterion, a kernel $k$-means clustering algorithm explores the hidden structure of the instances in the feature space of the SVM, and the diversity degree of an unlabeled bag is measured by the number of distinct clusters covered by its instances. In the second criterion, the lower approximations in fuzzy rough sets are used to define a new concept, the dissimilarity degree, which depicts the uniqueness of an instance and is used to measure the diversity degree of a bag. By combining the proposed diversity criteria with existing informativeness measures, new MIAL algorithms are developed that select bags with both high informativeness and high diversity. Experimental comparisons demonstrate the feasibility and effectiveness of the proposed methods.
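The clustering-based diversity criterion can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper:

- The SVM uses a Gaussian (RBF) kernel.
- The instances of all candidate bags are pooled before clustering.
- Kernel $k$-means is seeded by farthest-point selection.

A bag's diversity is the number of distinct clusters its instances land in, as the abstract describes. The final step trades informativeness off against normalized diversity with a linear weight `lam`. That rule is a hypothetical stand-in, since the abstract does not say how the two criteria are combined. All function names (`rbf_kernel`, `kernel_kmeans`, `cluster_diversity`, `select_bag`) are illustrative.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; assumed here to be the SVM's kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points, k, gamma=0.5, max_iter=50):
    """Lloyd-style kernel k-means in the RBF feature space.

    Returns one cluster label per point. Seeds are picked by
    farthest-point selection (an assumption) so results are deterministic.
    """
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]

    def dist2(i, j):
        # ||phi(x_i) - phi(x_j)||^2 written through the kernel matrix
        return K[i][i] + K[j][j] - 2 * K[i][j]

    seeds = [0]
    while len(seeds) < min(k, n):
        seeds.append(max(range(n), key=lambda i: min(dist2(i, s) for s in seeds)))
    labels = [min(range(len(seeds)), key=lambda c: dist2(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(len(seeds))]
        # Term (1/|C|^2) * sum_{j,l in C} K_jl of the distance to each cluster mean
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            # ||phi(x_i) - mean_C||^2 = K_ii - (2/|C|) sum_{j in C} K_ij + within_C
            dists = [K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                     if m else math.inf
                     for c, m in enumerate(members)]
            new_labels.append(min(range(len(dists)), key=dists.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def cluster_diversity(bags, k, gamma=0.5):
    """Diversity of each bag = number of distinct clusters covered by its instances."""
    pooled = [inst for bag in bags for inst in bag]
    labels = kernel_kmeans(pooled, k, gamma)
    scores, start = [], 0
    for bag in bags:
        scores.append(len(set(labels[start:start + len(bag)])))
        start += len(bag)
    return scores


def select_bag(informativeness, diversity, lam=0.5):
    """Hypothetical linear trade-off; informativeness is assumed to lie in [0, 1]."""
    max_div = max(diversity)
    scores = [(1 - lam) * inf + lam * div / max_div
              for inf, div in zip(informativeness, diversity)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    bags = [
        [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],  # instances spread over three groups
        [(0.1, 0.0), (0.0, 0.1)],                # instances in a single group
        [(10.0, 0.1), (0.1, 10.0)],              # instances in two groups
    ]
    div = cluster_diversity(bags, k=3)
    print(div)                                         # [3, 1, 2]
    print(select_bag([0.2, 0.9, 0.5], div, lam=0.8))   # 0
```

With `lam=0.0` the rule reduces to pure informativeness and would pick bag 1 instead. Larger values favour bags whose instances cover more of the clustered feature space.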
