Incremental exemplar learning schemes for classification on embedded devices

Abstract

Although memory-based classifiers offer robust classification performance, their widespread use on embedded devices is hindered by these devices' limited memory resources. Moreover, embedded devices often operate in environments where the data evolves over time, which entails frequent updates to the in-memory training data. A viable option for dealing with the memory constraint is to use Exemplar Learning (EL) schemes, which learn a small, highly informative subset of the training data (called the exemplar set) that fits in memory. However, traditional EL schemes have several drawbacks that make them inapplicable to embedded devices: (1) they have high memory overheads and cannot handle incremental updates to the exemplar set, (2) they cannot be customized to produce exemplar sets of an arbitrary user-defined size that fits in memory, and (3) they learn exemplar sets based on local neighborhood structures, which do not offer robust classification performance. In this paper, we propose two novel EL schemes, $\mathsf{EBEL}$ (Entropy-Based Exemplar Learning) and $\mathsf{ABEL}$ (AUC-Based Exemplar Learning), that overcome these shortcomings of traditional EL algorithms. We show that our schemes efficiently incorporate new training datasets while maintaining high-quality exemplar sets of any user-defined size. We present a comprehensive experimental analysis showing excellent classification-accuracy versus memory-usage trade-offs using our proposed methods.
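The abstract does not reproduce the $\mathsf{EBEL}$ or $\mathsf{ABEL}$ criteria themselves, but the budgeted, incremental exemplar-learning loop it describes can be sketched generically. The Python sketch below maintains an exemplar set of a user-defined size and greedily reselects it whenever a new batch arrives; plain 1-NN accuracy serves as a hypothetical stand-in for the paper's entropy- and AUC-based criteria, and all function names and parameters are illustrative, not from the paper.

```python
# Minimal sketch of a budgeted, incremental exemplar-learning loop.
# The scoring criterion (1-NN accuracy on the pooled data) is a
# stand-in for the paper's EBEL/ABEL criteria, which are not given
# in the abstract. Written for clarity, not efficiency.
import numpy as np

def nn_accuracy(ex_X, ex_y, X, y):
    """Fraction of (X, y) classified correctly by 1-NN over the exemplars."""
    if len(ex_X) == 0:
        return 0.0
    # Pairwise squared Euclidean distances, shape (n_points, n_exemplars).
    d = ((X[:, None, :] - np.asarray(ex_X)[None, :, :]) ** 2).sum(-1)
    pred = np.asarray(ex_y)[d.argmin(axis=1)]
    return float((pred == y).mean())

def update_exemplars(ex_X, ex_y, X_new, y_new, budget):
    """Merge a new batch into the exemplar set, then greedily reselect
    at most `budget` exemplars so the set always fits in memory."""
    pool_X = np.vstack([ex_X, X_new]) if len(ex_X) else X_new
    pool_y = np.concatenate([ex_y, y_new]) if len(ex_y) else y_new
    chosen, remaining = [], list(range(len(pool_X)))
    while len(chosen) < min(budget, len(pool_X)):
        # Add the candidate whose inclusion most improves the criterion.
        best_i, best_score = None, -1.0
        for i in remaining:
            trial = chosen + [i]
            score = nn_accuracy(pool_X[trial], pool_y[trial], pool_X, pool_y)
            if score > best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        remaining.remove(best_i)
    return pool_X[chosen], pool_y[chosen]

# Usage: simulate batches arriving over time under a fixed memory budget.
rng = np.random.default_rng(0)
ex_X, ex_y = np.empty((0, 2)), np.empty(0, dtype=int)
for _ in range(3):
    Xb = rng.normal(size=(40, 2)) + rng.integers(0, 2, 40)[:, None] * 3.0
    yb = (Xb[:, 0] > 1.5).astype(int)
    ex_X, ex_y = update_exemplars(ex_X, ex_y, Xb, yb, budget=10)
```

In an actual on-device setting, the expensive greedy rescoring above would be replaced by the incremental entropy- or AUC-based updates the paper proposes, which is what makes the exemplar set cheap to maintain as new data streams in.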
