Multiple-Instance Learning by Boosting Infinitely Many Shapelet-based Classifiers

We propose a new formulation of Multiple-Instance Learning (MIL). In typical MIL settings, a unit of data is given as a set of instances called a bag, and the goal is to find a good classifier of bags based on their similarity to a single or finitely many "shapelets" (or patterns), where the similarity of a bag to a shapelet is the maximum similarity over the instances in the bag. Classifiers based on a single shapelet are not sufficiently strong for certain applications. Additionally, previous work with multiple shapelets has heuristically chosen some of the instances as shapelets, with no theoretical guarantee of generalization ability. Our formulation provides a richer class of final classifiers based on infinitely many shapelets. We provide an efficient algorithm for the new formulation, along with a generalization bound. Our empirical study demonstrates that our approach is effective not only for MIL tasks but also for Shapelet Learning for time-series classification.
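
The bag-to-shapelet similarity described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the Gaussian similarity, the fixed finite list of shapelets, the weights, the bias, and all function names are assumptions chosen for illustration. It only shows how a bag is scored by taking the maximum similarity over its instances and how several shapelet-based features might be combined linearly.

```python
import math

def gaussian_similarity(u, x, sigma=1.0):
    # Assumed similarity measure (hypothetical choice): Gaussian kernel
    # between a shapelet u and an instance x of equal length.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def bag_similarity(shapelet, bag, sigma=1.0):
    # Similarity of a bag to a shapelet: the maximum similarity
    # over the instances in the bag.
    return max(gaussian_similarity(shapelet, x, sigma) for x in bag)

def classify_bag(bag, shapelets, weights, bias=0.0, sigma=1.0):
    # Illustrative finite combination: weighted sum of bag-to-shapelet
    # similarities, thresholded at zero. Returns +1 or -1.
    score = bias + sum(
        w * bag_similarity(u, bag, sigma) for u, w in zip(shapelets, weights)
    )
    return 1 if score >= 0 else -1

# Usage: one shapelet at the origin; a bag is labelled positive
# when at least one of its instances lies close to that shapelet.
shapelets = [(0.0, 0.0)]
weights = [1.0]
near_bag = [(0.0, 0.0), (5.0, 5.0)]
far_bag = [(5.0, 5.0), (6.0, 6.0)]
print(classify_bag(near_bag, shapelets, weights, bias=-0.5))
print(classify_bag(far_bag, shapelets, weights, bias=-0.5))
```

Because of the maximum, a single matching instance is enough to give the whole bag a high similarity, regardless of how dissimilar the remaining instances are.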
