Classifying with confidence from incomplete information

We consider the problem of classifying a test sample given incomplete information. This problem arises naturally when data about a test sample is collected over time, or when costs must be incurred to compute the classification features. For example, in a distributed sensor network only a fraction of the sensors may have reported measurements at a given time, and additional time, power, and bandwidth are needed to collect the complete data for classification. A practical goal is to assign a class label as soon as enough data is available to make a good decision. We formalize this goal through the notion of reliability: the probability that a label assigned given the incomplete data would be the same as the label assigned given the complete data. We propose a method that classifies incomplete data only if a specified reliability threshold is met. Our approach models the complete data as a random variable whose distribution depends on the current incomplete data and the (complete) training data. The method differs from standard imputation strategies in that our focus is on determining the reliability of the classification decision, rather than just the class label. In experiments on time-series data sets, where the goal is to classify each time series as early as possible while still guaranteeing that the reliability threshold is met, we show that the method provides useful reliability estimates of the correctness of the imputed class labels.
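As a rough illustration of the reliability criterion described above, the sketch below models the missing features conditionally on the observed ones, samples completions, and withholds the label until enough sampled completions agree. This is a minimal Monte Carlo sketch, not the paper's estimator: the joint Gaussian model, the helper names, and the threshold `tau` are assumptions made purely for illustration.

```python
import numpy as np

def reliability_classify(x_obs, obs_idx, mu, Sigma, classify,
                         tau=0.9, n_samples=1000, seed=None):
    """Sketch: classify incomplete data only if reliability >= tau.

    x_obs    : observed feature values
    obs_idx  : indices of the observed features
    mu, Sigma: mean/covariance of an (assumed) Gaussian complete-data model
    classify : black-box function mapping a complete feature vector to a label
    Returns (label, reliability); label is None if reliability < tau.
    """
    rng = np.random.default_rng(seed)
    d = len(mu)
    mis_idx = np.setdiff1d(np.arange(d), obs_idx)

    # Condition the Gaussian on the observed entries.
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    K = np.linalg.solve(S_oo, S_mo.T).T          # S_mo @ inv(S_oo)
    mu_cond = mu[mis_idx] + K @ (x_obs - mu[obs_idx])
    S_cond = S_mm - K @ S_mo.T

    # Sample complete-data vectors consistent with the observed features
    # and classify each one.
    labels = []
    for x_mis in rng.multivariate_normal(mu_cond, S_cond, size=n_samples):
        x = np.empty(d)
        x[obs_idx] = x_obs
        x[mis_idx] = x_mis
        labels.append(classify(x))
    labels = np.asarray(labels)

    # Reliability estimate: fraction of sampled completions that agree
    # with the modal label.
    vals, counts = np.unique(labels, return_counts=True)
    label = vals[np.argmax(counts)]
    reliability = counts.max() / n_samples
    return (label if reliability >= tau else None), reliability
```

Any classifier can be plugged in as `classify`; the essential idea carried over from the abstract is the decision rule itself: withhold the label until the estimated probability that it would match the complete-data label clears the threshold.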
