Converting non-parametric distance-based classification to anytime algorithms

For many real-world problems, classification must be performed with widely varying amounts of computational resources. For example, when asked to classify an instance taken from a bursty stream, we may have anywhere from several milliseconds to several minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how to convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification or, if given the luxury of additional time, continue computing to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains. We further show the utility of our work with two deployed applications: classifying and counting fish, and classifying insects.
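
The general idea of an interruptible nearest neighbor classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `anytime_nn_classify`, the `default_label` fallback, the Euclidean distance, and the injectable `clock` parameter are all assumptions made for illustration. The abstract does not say how training instances are ordered, so the sketch simply scans them in the order given and returns the best label found when the time budget expires.

```python
import time
from typing import Callable, Hashable, Iterable, Sequence, Tuple

Instance = Sequence[float]


def euclidean(a: Instance, b: Instance) -> float:
    """Plain Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def anytime_nn_classify(
    query: Instance,
    training: Iterable[Tuple[Instance, Hashable]],
    budget_seconds: float,
    default_label: Hashable,
    distance: Callable[[Instance, Instance], float] = euclidean,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Hashable, int]:
    """Return (best label so far, number of training instances examined).

    The scan stops as soon as the time budget is spent, so an answer is
    always available; more time lets more instances be compared.
    """
    deadline = clock() + budget_seconds
    best_label = default_label  # answer available before any comparison
    best_dist = float("inf")
    examined = 0
    for features, label in training:
        if clock() >= deadline:
            break  # interrupted: return the best label found so far
        d = distance(query, features)
        if d < best_dist:
            best_dist, best_label = d, label
        examined += 1
    return best_label, examined


if __name__ == "__main__":
    data = [((0.0, 0.0), "a"), ((5.0, 5.0), "b"), ((1.0, 1.0), "c")]
    print(anytime_nn_classify((0.9, 0.9), data, 1.0, "unknown"))
```

In this sketch the quality of an early answer depends entirely on the order in which training instances are visited, which is why the scan order is called out as an assumption above.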
