A First Study on the Use of Noise Filtering to Clean the Bags in Multi-Instance Classification

Real-world data is far from perfect. Noise is a common issue that arises from the limitations of data acquisition mechanisms and of human knowledge. In classification, label noise hinders the performance of any classifier and biases the resulting model. While label noise has recently attracted the attention of researchers in standard classification, its study in multi-instance classification has only just begun. In this work, we propose using a filtering algorithm for multi-instance classification that reduces the impact of negative instances within the bags. To do so, we decompose the bags into a standard classification problem that a specialized noise filter can handle efficiently. The bags are then rebuilt without the eliminated instances. Our experiments show that this approach diminishes the impact of noise and, for several classifiers, even yields better results at the 0% noise level. The approach opens a promising avenue for dealing with noise in the bags of multi-instance datasets and for further improving the classification rate of the resulting models.
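The decompose, filter, rebuild pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the noise filter, so Wilson's Edited Nearest Neighbour (ENN) rule is swapped in as a stand-in. Every instance is assumed to inherit its bag's label when the bags are flattened. A bag that loses all of its instances is kept unchanged, which is a hypothetical policy. The function names (`decompose`, `enn_filter`, `rebuild`, `filter_bags`) and the toy data are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

# A bag is a list of feature vectors plus one bag-level label.
Bag = Tuple[List[List[float]], int]


def decompose(bags: List[Bag]):
    """Flatten bags into single instances. Each instance inherits its bag's
    label (assumed propagation rule) and remembers its bag index."""
    instances = []
    for bag_id, (vectors, label) in enumerate(bags):
        for vec in vectors:
            instances.append((bag_id, vec, label))
    return instances


def enn_filter(instances, k: int = 3):
    """Edited Nearest Neighbour (stand-in for the unnamed noise filter):
    keep an instance only if the majority label among its k nearest
    neighbours (Euclidean distance, excluding itself) matches its own."""
    kept = []
    for i, (bag_id, vec, label) in enumerate(instances):
        neighbours = sorted(
            (math.dist(vec, other_vec), other_label)
            for j, (_, other_vec, other_label) in enumerate(instances)
            if j != i
        )
        if not neighbours:  # nothing to compare against: keep the instance
            kept.append((bag_id, vec, label))
            continue
        votes = Counter(lbl for _, lbl in neighbours[:k])
        majority, _ = votes.most_common(1)[0]
        if majority == label:
            kept.append((bag_id, vec, label))
    return kept


def rebuild(bags: List[Bag], kept) -> List[Bag]:
    """Reassemble bags from the surviving instances. If every instance of a
    bag was removed, the original bag is kept (hypothetical policy)."""
    survivors = {i: [] for i in range(len(bags))}
    for bag_id, vec, _ in kept:
        survivors[bag_id].append(vec)
    return [
        (survivors[i] if survivors[i] else vectors, label)
        for i, (vectors, label) in enumerate(bags)
    ]


def filter_bags(bags: List[Bag], k: int = 3) -> List[Bag]:
    """Decompose, filter at instance level, then rebuild the bags."""
    return rebuild(bags, enn_filter(decompose(bags), k))


# Toy data (invented): two class-0 bags near the origin, two class-1 bags
# near (5, 5); bag 2 contains one instance that looks like class 0.
bags = [
    ([[0.0, 0.0], [0.1, 0.0]], 0),
    ([[0.0, 0.1], [0.1, 0.1]], 0),
    ([[5.0, 5.0], [5.1, 5.0], [0.05, 0.05]], 1),
    ([[5.0, 5.1], [5.1, 5.1]], 1),
]
filtered = filter_bags(bags, k=3)
print([len(vectors) for vectors, _ in filtered])  # → [2, 2, 2, 2]
```

Here only the out-of-place instance `[0.05, 0.05]` in bag 2 is removed, because its three nearest neighbours all carry label 0. Filtering at instance level and then regrouping lets a standard single-instance filter be reused unchanged; any other instance-level filter could replace `enn_filter` without touching `decompose` or `rebuild`.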
