k-Nearest Neighbor Classification Algorithm for Multiple Choice Sequential Sampling

Yung-Kyun Noh (nohyung@snu.ac.kr)
Frank Chongwoo Park (fcp@snu.ac.kr)
School of Mechanical and Aerospace Engineering, Seoul National University, Seoul 151-744, Korea

Daniel D. Lee (ddlee@seas.upenn.edu)
Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA

Abstract

Decision making from sequential sampling, especially when more than two alternative choices are possible, requires appropriate stopping criteria to maximize accuracy under time constraints. Optimal stopping conditions have previously been investigated as models of human decision making processes. In this work, we show how the k-nearest neighbor classification algorithm from machine learning can be used as a mathematical framework for deriving a variety of novel sequential sampling models. We interpret these nearest neighbor models in the context of diffusion decision making (DDM) methods, and we compare them to exemplar-based models and to accumulator models such as Race and LCA. Computational experiments show that the new models achieve significantly higher accuracy under equivalent time constraints.

Keywords: sequential sampling; decision making; diffusion decision making model; k-nearest neighbor classification; evidence; sequential probability ratio test

Introduction

Whenever a faster decision is required to save time and resources, the decision making process must weigh whether to commit to a decision given the information at hand or to postpone the decision and collect more information for a higher level of confidence. Many computational models in the psychology literature, both early and recent, have been introduced to explain this speed-accuracy tradeoff and to understand the human decision making process. However, beyond the understanding of individual models, there has been no systematic way of relating these models within a single mathematically unified framework. Moreover, multiple-choice problems have not been treated extensively by any of these methods.

Optimality in decision making with sequential sampling is defined in terms of the speed-accuracy tradeoff: the objective is to reach the fastest decision at a given average accuracy, or the maximum accuracy for a given average decision time. Sequential sampling methods such as Race (Smith & Vickers, 1988; Vickers, 1970), diffusion decision making (DDM) (Ratcliff, 1978; Ratcliff & Rouder, 2000; Shadlen, Hanks, Churchland, Kiani, & Yang, 2006; Ratcliff & McKoon, 2008), and the leaky competing accumulator (LCA) (Usher & McClelland, 2001; Bogacz, Usher, Zhang, & McClelland, 2007) are all concerned with explaining this optimal speed-accuracy tradeoff. In these methods, one or more variables are introduced to accumulate sampled information, and a criterion determines whether to continue collecting information or to commit to a decision based on the information gathered so far. Here, we propose a common mathematical framework that combines these methods and provides a systematic explanation of their differences. Our unifying framework is k-nearest neighbor (k-NN) classification from machine learning.
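To make the shared accumulate-to-bound structure concrete before developing the framework, below is a minimal two-alternative DDM simulation: a single variable integrates noisy evidence at a constant drift rate until it crosses an upper or lower bound. All code sketches in this paper use Python; the drift, noise, and bound values here are arbitrary choices for exposition, not parameters fitted to data.

    import numpy as np

    def ddm_trial(drift=0.1, noise=1.0, bound=2.0, dt=0.01, rng=None):
        """Simulate one two-alternative diffusion decision making trial.

        A single accumulator x integrates noisy evidence until it crosses
        +bound (choice 1) or -bound (choice 0). Returns (choice, decision
        time). Parameter values are illustrative only.
        """
        rng = np.random.default_rng() if rng is None else rng
        x, t = 0.0, 0.0
        while abs(x) < bound:
            # Euler step of the diffusion: drift plus Gaussian noise.
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        return (1 if x > 0 else 0), t

    # With positive drift, choice 1 is correct, so the mean choice is
    # the accuracy; the mean time is the average decision time.
    rng = np.random.default_rng(0)
    trials = [ddm_trial(rng=rng) for _ in range(1000)]
    choices, times = zip(*trials)
    print("accuracy:", np.mean(choices), "mean decision time:", np.mean(times))

Raising the bound in this sketch trades longer decision times for higher accuracy, which is precisely the speed-accuracy tradeoff that the models above negotiate.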
The sequential sampling situation with multiple choices is explained as multiway k-NN classification, based on the theoretical analysis of k-NN in the asymptotic regime. Through this connection, we can interpret the different types of sequential sampling methods as different strategies for choosing k adaptively in k-NN classification. By further analyzing the strategy of choosing k in k-NN classification using the Sequential Probability Ratio Test (SPRT) (Wald & Wolfowitz, 1948) and Bayesian inference, we obtain five different accumulating variables and stopping criteria for the optimal tradeoff. Interestingly, all five of these optimal methods can be interpreted as different kinds of DDM strategies; a minimal sketch of the adaptive-k idea appears at the end of this section.

Our work applies directly to a recently reported neuroscientific decision making mechanism. The proposed mechanism considers an output neuron that sends out a decision result. By collecting Poisson spike trains from different input neurons, the output neuron decides which neuron emits Poisson spikes at the highest rate (Shadlen & Newsome, 1998; Ma, Beck, Latham, & Pouget, 2006; Beck et al., 2008; Zhang & Bogacz, 2010). The output neuron can achieve optimality by using our proposed strategies; a second sketch below illustrates this setting.

The proposed method can be compared with traditional exemplar models, which explain memory retrieval using similarity-weighted voting over stored exemplars. Our work differs from this line of research by using majority voting among an adaptively chosen number k of NNs. We discuss the advantages and disadvantages of our method when it is applied to the memory retrieval problem.

The rest of the paper is organized as follows. We introduce the sequential sampling problem in Section 2, especially from the point of view of multiple choices. In Section 3, we introduce problems to which sequential sampling methods can be applied, and we show how k-NN classification can be naturally connected to them.
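The adaptive-k sketch promised above follows. Neighbors of a query are examined one at a time in order of increasing distance, the gap between the two largest class counts serves as the accumulating variable, and sampling stops once the gap reaches a fixed margin, mirroring a DDM bound. This particular gap criterion and its parameters are illustrative simplifications, not the five criteria derived later in the paper.

    import numpy as np

    def adaptive_knn_classify(X_train, y_train, x, margin=5, k_max=200):
        """Classify query x by examining nearest neighbors one at a time.

        The accumulating variable is the gap between the two largest class
        counts among the k nearest neighbors seen so far; sampling stops as
        soon as this gap reaches `margin` (a DDM-like bound) or k_max is
        reached. `margin` and `k_max` are illustrative parameters.
        """
        n_classes = int(np.max(y_train)) + 1
        # Sort training points by distance to the query once, up front.
        order = np.argsort(np.linalg.norm(X_train - x, axis=1))
        counts = np.zeros(n_classes, dtype=int)
        for k, idx in enumerate(order[:k_max], start=1):
            counts[y_train[idx]] += 1      # sample one more neighbor
            top2 = np.sort(counts)[-2:]    # two largest class counts
            if top2[1] - top2[0] >= margin:
                break                      # evidence gap hit the bound
        return int(np.argmax(counts)), k   # decision, neighbors used

    # Example on synthetic two-class Gaussian data.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(1.5, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    label, k_used = adaptive_knn_classify(X, y, x=np.array([0.7, 0.7]))
    print("decision:", label, "after", k_used, "neighbors")

Because only the count gap matters, easy queries deep inside one class stop after a handful of neighbors, while ambiguous queries near the decision boundary automatically sample more.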
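The output-neuron setting admits a similar sketch as a multihypothesis sequential test (cf. Dragalin, Tartakovsky, & Veeravalli, 1999): under hypothesis H_i, neuron i fires at a high rate r_hi and all others at a low rate r_lo, and the test stops once the log-likelihood of the leading hypothesis beats the runner-up by a fixed threshold. The two-rate hypothesis structure, bin width, and threshold are our illustrative assumptions, not the mechanism proposed in the cited work.

    import numpy as np

    def decide_highest_rate(rates_true, r_hi=20.0, r_lo=10.0,
                            dt=0.01, threshold=3.0, rng=None):
        """Sequentially decide which Poisson spike train is fastest.

        Hypothesis H_i: neuron i fires at r_hi spikes/s, all others at
        r_lo. Spikes are accumulated in bins of width dt, and the test
        stops when the best hypothesis leads the runner-up by `threshold`
        in log-likelihood. All parameter values are illustrative.
        """
        rng = np.random.default_rng() if rng is None else rng
        rates_true = np.asarray(rates_true, dtype=float)
        counts = np.zeros(len(rates_true))
        t = 0.0
        while True:
            counts += rng.poisson(rates_true * dt)  # spikes in this bin
            t += dt
            # Every hypothesis has the same total rate, so shared terms
            # cancel and the log-likelihood comparison reduces to a
            # weighted spike count: counts[i]*log(r_hi) plus the rest
            # weighted by log(r_lo) -- a DDM-like race between counts.
            ll = counts * np.log(r_hi) + (counts.sum() - counts) * np.log(r_lo)
            top2 = np.sort(ll)[-2:]
            if top2[1] - top2[0] >= threshold:
                return int(np.argmax(ll)), t

    # Example: neuron 2 truly fires fastest among four candidates.
    choice, t_dec = decide_highest_rate([10.0, 10.0, 20.0, 10.0],
                                        rng=np.random.default_rng(1))
    print("chose neuron", choice, "after", round(t_dec, 2), "s")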

References

Bailey, T., & Jain, A. K. (1978). A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems, Man, and Cybernetics.
Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., Shadlen, M. N., Latham, P. E., & Pouget, A. (2008). Probabilistic population codes for Bayesian decision making. Neuron.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review.
Bogacz, R., Usher, M., Zhang, J., & McClelland, J. L. (2007). Extending a biologically inspired model of choice: Multi-alternatives, nonlinearity and value-based multidimensional choice. Philosophical Transactions of the Royal Society B: Biological Sciences.
Cover, T. M. (1968). Estimation by the nearest neighbor rule. IEEE Transactions on Information Theory.
Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory.
Doya, K., Ishii, S., Pouget, A., & Rao, R. P. N. (Eds.). (2006). Bayesian brain: Probabilistic approaches to neural coding. MIT Press.
Dragalin, V. P., Tartakovsky, A. G., & Veeravalli, V. V. (1999). Multihypothesis sequential probability ratio tests - Part I: Asymptotic optimality. IEEE Transactions on Information Theory.
Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Generalization and similarity in exemplar models of categorization: Insights from machine learning. Psychonomic Bulletin & Review.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review.
Ougiaroglou, S., Nanopoulos, A., Papadopoulos, A. N., Manolopoulos, Y., & Welzer, T. (2007). Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors. In Advances in Databases and Information Systems (ADBIS).
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation.
Ratcliff, R., & Rouder, J. N. (2000). A diffusion model account of masking in two-choice letter identification. Journal of Experimental Psychology: Human Perception and Performance.
Shadlen, M. N., Hanks, T. D., Churchland, A. K., Kiani, R., & Yang, T. (2006). The speed and accuracy of a simple perceptual decision: A mathematical primer. In K. Doya, S. Ishii, A. Pouget, & R. P. N. Rao (Eds.), Bayesian brain: Probabilistic approaches to neural coding. MIT Press.
Shadlen, M. N., & Newsome, W. T. (1998). The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding. Journal of Neuroscience.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science.
Shi, L., Griffiths, T. L., Feldman, N. H., & Sanborn, A. N. (2010). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review.
Vickers, D. (1970). Evidence for an accumulator model of psychophysical discrimination. Ergonomics.
Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics.
Wasserman, L. (2004). All of statistics: A concise course in statistical inference. Springer.