Nearest Prototype Classification of Special School Families Based on Hierarchical Compact Sets Clustering

The family orientation process in Cuban schools for children with Affective-Behavioral Maladies (SABM) involves the clustering and classification of mixed-type data with non-symmetric similarity functions. To improve this process, this paper introduces novel characteristics in both clustering and prototype selection. The proposed approach uses a hierarchical clustering method based on compact sets, which makes it suitable for non-symmetric similarity functions as well as for mixed and incomplete data. The proposal obtains very good results both on the SABM data and on repository databases. In addition, the proposed clustering method is able to detect the true partitions of the data, and it performed significantly better than other clustering methods according to external validity indices. In prototype selection, the proposal obtains a highly reduced prototype set while maintaining the accuracy of the original classifier.
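The abstract combines three ideas: compact sets computed from a possibly non-symmetric similarity over mixed, incomplete data; a hierarchy built from those compact sets; and nearest prototype classification. The following is a minimal Python sketch of these ideas under stated assumptions. It is not the paper's exact method. Compact sets are computed here using one common formulation: the connected components of a maximum-similarity graph in which each object links to its most similar other object(s), with edge direction ignored. The similarity function `mixed_similarity`, the cluster-to-cluster linkage (maximum member-to-member similarity), and all function names are hypothetical. Prototype selection itself is not shown.

```python
from typing import Callable, Dict, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def mixed_similarity(x: Sequence, y: Sequence, ranges: Sequence[Optional[float]]) -> float:
    """Hypothetical similarity in [0, 1] for mixed, incomplete records.

    ranges[k] is the value range of numeric feature k, or None for a
    categorical feature. Features missing (None) in either record are skipped.
    """
    total, count = 0.0, 0
    for a, b, r in zip(x, y, ranges):
        if a is None or b is None:
            continue  # incomplete data: ignore this feature
        if r is None:
            total += 1.0 if a == b else 0.0  # categorical: exact match
        else:
            total += 1.0 - abs(a - b) / r  # numeric: range-normalised
        count += 1
    return total / count if count else 0.0


def max_similarity_graph(sim: Matrix) -> Dict[int, List[int]]:
    """Link each object i to its most similar other object(s).

    Only row i of sim is read for object i, so sim may be non-symmetric.
    """
    n = len(sim)
    graph: Dict[int, List[int]] = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            graph[i] = []
            continue
        best = max(sim[i][j] for j in others)
        graph[i] = [j for j in others if sim[i][j] == best]
    return graph


def compact_sets(sim: Matrix) -> List[List[int]]:
    """Connected components of the max-similarity graph, ignoring direction."""
    n = len(sim)
    adj: Dict[int, set] = {i: set() for i in range(n)}
    for i, nbrs in max_similarity_graph(sim).items():
        for j in nbrs:
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components


def hierarchical_compact_sets(sim: Matrix, n_clusters: int) -> List[List[int]]:
    """Recompute compact sets over clusters until at most n_clusters remain.

    Hypothetical linkage: similarity from cluster A to cluster B is the
    maximum sim[a][b] over a in A, b in B (it stays non-symmetric).
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > max(n_clusters, 1):
        csim = [[max(sim[a][b] for a in A for b in B) for B in clusters]
                for A in clusters]
        clusters = [sorted(x for g in group for x in clusters[g])
                    for group in compact_sets(csim)]
    return clusters


def nearest_prototype_label(x: Sequence, prototypes: Sequence[Sequence],
                            labels: Sequence[str],
                            sim: Callable[[Sequence, Sequence], float]) -> str:
    """Return the label of the prototype most similar to x (first on ties)."""
    best = max(range(len(prototypes)), key=lambda i: sim(x, prototypes[i]))
    return labels[best]
```

Each compact set at a given level merges at least two clusters from the previous level, so the cluster count at least halves per level. As a result, the loop always terminates, but it may stop below the requested `n_clusters` rather than exactly at it.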
