A Comparison of Multiple Instance and Group Based Learning

In this paper we compare the performance of several multiple-instance learning (MIL) and group-based (GB) classification algorithms on a synthetic dataset and a real-world Pap smear dataset. We utilise the synthetic dataset to demonstrate that performance improves as both bag size and the percentage of positives increase, and that MIL algorithms outperform GB algorithms when the percentage of positives is below 50%. However, as the positive bags become increasingly homogeneous, as is apparent on the real-world dataset, the two approaches become comparable. This result highlights that the performance of a MIL or GB algorithm is maximised when the algorithm's MIL assumption matches the reality of the dataset. Accordingly, on the Pap smear dataset, algorithms with a more generalised MIL assumption demonstrate the strongest performance.
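The synthetic experiment described above can be illustrated with a minimal sketch. It is not the authors' generator or any of the algorithms they compare. It assumes that "percentage of positives" means the fraction of positive instances inside a positive bag, that instances are one-dimensional Gaussian draws, and that a max-over-instances score stands in for the standard MIL assumption while a mean-over-instances score stands in for a group-based view. The names `make_bag`, `mil_score`, `gb_score` and `auc` are hypothetical.

```python
import random


def make_bag(bag_size, pct_positive, positive, rng):
    """Return (instances, label) for one synthetic bag.

    Assumption for illustration only: negative instances are drawn from
    N(0, 1) and positive instances from N(2, 1). A positive bag contains
    round(bag_size * pct_positive) positive instances, and at least one.
    """
    n_pos = max(1, round(bag_size * pct_positive)) if positive else 0
    instances = [rng.gauss(2.0, 1.0) for _ in range(n_pos)]
    instances += [rng.gauss(0.0, 1.0) for _ in range(bag_size - n_pos)]
    rng.shuffle(instances)
    return instances, int(positive)


def mil_score(bag):
    # Standard MIL assumption: one positive instance makes the bag positive,
    # so the bag is scored by its most positive-looking instance.
    return max(bag)


def gb_score(bag):
    # Group-based stand-in: every instance contributes equally to the bag score.
    return sum(bag) / len(bag)


def auc(scores, labels):
    # Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    for pct in (0.1, 0.5, 0.9):
        # 200 positive and 200 negative bags of 20 instances each.
        bags = [make_bag(20, pct, i % 2 == 0, rng) for i in range(400)]
        labels = [label for _, label in bags]
        mil = auc([mil_score(b) for b, _ in bags], labels)
        gb = auc([gb_score(b) for b, _ in bags], labels)
        print(f"pct_positive={pct:.1f}  MIL AUC={mil:.3f}  GB AUC={gb:.3f}")
```

Varying `bag_size` and `pct_positive` in the loop gives a rough feel for how the match between a scorer's assumption and the bag composition affects ranking quality; the numbers it prints say nothing about the results reported in the paper.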
