Application of Rough Sets in k Nearest Neighbours Algorithm for Classification of Incomplete Samples

The k-nn algorithm is often used for classification, but the distance measures it relies on are usually designed for real-valued, fully known data. In real applications the input values are imperfect: imprecise, uncertain, and even missing. In most applications, missing values are handled by marginalization or imputation. Unfortunately, both methods have serious drawbacks. The choice of a specific imputation method strongly affects the classifier's answer, while marginalization may cause a large part of the available data to be ignored. Therefore, this paper proposes a new algorithm. It is designed to work with interval-type input data and, when values are missing from a sample, it analyses the whole domain of possible values for the corresponding attributes. The proposed system generalizes the k-nn algorithm and gives a rough-specific answer, which states whether the test sample may or must belong to a certain set of classes. An important feature of the proposed system is that it reduces the set of possible classes and specifies the set of certain classes as the missing values are replaced by sets of possible values.
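The idea of a "may or must belong" answer can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper: as a simplifying assumption, it approximates the whole domain of each missing attribute by a finite grid, runs plain k-nn on every candidate completion of the test sample, and reports the classes that occur for at least one completion (possible) and the class that occurs for all of them (certain). The names `knn_label`, `rough_knn`, `domains`, and `steps` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import product
import math


def knn_label(train, query, k):
    """Plain k-nn: majority label among the k closest training points."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Break voting ties by label order so the sketch stays deterministic.
    return min(votes, key=lambda c: (-votes[c], c))


def rough_knn(train, query, domains, k=3, steps=11):
    """Return (certain, possible) class sets for a query with missing values.

    train   -- list of (point, label) pairs with fully known points
    query   -- list of attribute values, None marks a missing value
    domains -- list of (low, high) bounds, one per attribute (assumed known)
    steps   -- grid resolution used to approximate each missing domain
    """
    axes = []
    for value, (low, high) in zip(query, domains):
        if value is None:
            # Assumption: a uniform grid stands in for the whole interval.
            axes.append([low + (high - low) * i / (steps - 1)
                         for i in range(steps)])
        else:
            axes.append([value])

    # Classify every candidate completion of the incomplete sample.
    answers = {knn_label(train, completed, k) for completed in product(*axes)}

    possible = answers                                   # "may belong"
    certain = answers if len(answers) == 1 else set()    # "must belong"
    return certain, possible


# Toy data (made up for illustration): the classes differ only in attribute 0.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((0.0, 0.5), "a"),
         ((10.0, 0.0), "b"), ((10.0, 1.0), "b"), ((10.0, 0.5), "b")]
domains = [(0.0, 10.0), (0.0, 1.0)]

# Missing the uninformative attribute versus missing the informative one.
print(rough_knn(train, [0.5, None], domains))
print(rough_knn(train, [None, 0.5], domains))
```

In this toy setting, a missing value in the uninformative attribute still yields a certain class, whereas a missing value in the discriminating attribute leaves both classes possible and none certain. A grid only approximates the interval analysis described above, so a class that occurs solely between grid points would be overlooked.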
