K-means clustering algorithm for multimedia applications with flexible HW/SW co-design

In this paper, we report a hardware/software (HW/SW) co-designed K-means clustering algorithm with high flexibility and high performance for machine learning, pattern recognition, and multimedia applications. The contributions of this work are twofold. The first is a hardware architecture for nearest neighbor searching, which addresses the main computational cost of the K-means clustering algorithm. The second is high flexibility across applications, provided by both the software and the hardware. Limited flexibility with respect to the number of training data samples, the dimensionality of each sample vector, the number of clusters, and the target application is one of the major shortcomings of dedicated hardware implementations of the K-means algorithm. In addition, the HW/SW K-means algorithm can be extended to embedded systems and mobile devices. We benchmark our multi-purpose K-means system on handwritten digit recognition, face recognition, and image segmentation, demonstrating high performance, high flexibility, fast clustering, short recognition time, good recognition rates, and versatile functionality.
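To show why nearest neighbor searching dominates the cost of K-means, here is a minimal software-only sketch of the standard (Lloyd-style) iteration. In each assignment step, every one of the N samples is compared against all K centroids in D dimensions, so the step costs O(N·K·D). That brute-force search is the part the abstract describes moving into hardware. This is not the authors' HW/SW implementation. The squared Euclidean distance, the random-data-point (Forgy-style) initialization, and the function names `nearest_centroid` and `kmeans` are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def nearest_centroid(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Brute-force nearest neighbor search (hypothetical helper).

    Returns the index of the centroid closest to x under squared Euclidean
    distance (an assumed metric). This is the O(K*D) inner loop that a
    hardware nearest-neighbor engine would accelerate.
    """
    best_idx, best_dist = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_dist:
            best_idx, best_dist = k, d
    return best_idx


def kmeans(data: Sequence[Sequence[float]], k: int,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd-style K-means sketch; N, D, and K are all runtime parameters."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Forgy-style initialization: pick k distinct data points as centroids.
    centroids = [list(p) for p in rng.sample(list(data), k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: N nearest-neighbor searches (the dominant cost).
        new_labels = [nearest_centroid(x, centroids) for x in data]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned samples.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for x, lab in zip(data, labels):
            counts[lab] += 1
            for j in range(dim):
                sums[lab][j] += x[j]
        for c in range(k):
            if counts[c]:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [s / counts[c] for s in sums[c]]
    return centroids, labels if labels is not None else []


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]]
    cents, labs = kmeans(pts, k=2)
    print(labs)
```

Because N, D, and K are plain arguments, the same routine handles different training-set sizes, vector dimensionalities, and cluster counts. A fixed-function hardware design often has to hard-wire these values, and the abstract identifies that rigidity as the flexibility problem its HW/SW split is meant to address.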
