A new nearest neighbor classification method based on fuzzy set theory and aggregation operators

A new fuzzy nearest neighbor classification method, called Fuzzy Analogy-Based Classification (FABC), is proposed.
Domain features are described by fuzzy sets.
Uncertainty and imprecision in the classification process are managed by means of aggregation operators.
The new classifier gives promising results when compared with advanced fuzzy nearest neighbor classifiers.

Fuzzy Nearest Neighbor Classification (FuzzyNNC) has been used successfully as a tool for supervised classification problems. It has significantly increased classification accuracy by taking into account the uncertainty associated with the class labels of the training patterns. Nevertheless, existing FuzzyNNC methods do not efficiently handle the imprecision in feature measurement, nor the uncertainty induced by the choice of the distance measure and of the number of neighbors in the decision rule. In this paper, we propose a new method, called Fuzzy Analogy-Based Classification (FABC), to address these limitations. We use fuzzy linguistic modeling and approximate reasoning tools to endow FABC with intelligent capabilities such as imprecision tolerance, optimization, adaptability and trade-off. Our approach is composed of two main steps. First, we describe the domain features using fuzzy linguistic variables. Second, we define the classification process using two aggregation operators. The first operator optimizes the similarity evaluation by determining which features should be considered. The second integrates a trade-off strategy into the decision rule by using a global voting approach with a compensation property. These mechanisms are expected to increase classification accuracy and to make the FuzzyNNC approach more useful for classification problems in which imprecision and uncertainty are unavoidable.
The proposed FABC is validated on well-known datasets that represent various classification difficulties, and it is compared with many extensions of the FuzzyNNC approach. The results show that FABC can be adapted to different classification problems and that it improves classification accuracy. FABC achieves the best rank among the compared methods, at a high significance level. Moreover, we conclude that our optimized similarity and global voting rule are more robust in handling the uncertainty in the classification process than those used by the compared methods.
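
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the membership functions, the similarity measure, or the aggregation operators. The sketch assumes triangular membership functions over features normalized to [0, 1], a max-min similarity between feature values, and an ordered weighted averaging (OWA) operator whose weights come from the quantifier Q(r) = r^alpha, used once to aggregate feature similarities and once to aggregate the votes of all training patterns of a class. All names (triangular, TERMS, feature_similarity, owa, pattern_similarity, classify) and parameter values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet a and c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Assumed term set of one linguistic variable for a feature normalized to [0, 1].
TERMS: Dict[str, Membership] = {
    "low": triangular(-0.5, 0.0, 0.5),
    "medium": triangular(0.0, 0.5, 1.0),
    "high": triangular(0.5, 1.0, 1.5),
}


def feature_similarity(x: float, y: float,
                       terms: Dict[str, Membership] = TERMS) -> float:
    """Assumed max-min similarity: degree to which x and y share a linguistic value."""
    return max(min(mu(x), mu(y)) for mu in terms.values())


def owa(values: Sequence[float], alpha: float) -> float:
    """OWA aggregation with weights w_j = Q(j/n) - Q((j-1)/n), where Q(r) = r**alpha.

    alpha = 1 gives the arithmetic mean; larger alpha moves the result toward
    the minimum, smaller alpha toward the maximum.
    """
    n = len(values)
    ordered = sorted(values, reverse=True)
    weights = [(j / n) ** alpha - ((j - 1) / n) ** alpha for j in range(1, n + 1)]
    return sum(w * v for w, v in zip(weights, ordered))


def pattern_similarity(p: Sequence[float], q: Sequence[float],
                       alpha: float = 2.0) -> float:
    """First aggregation: combine the per-feature similarities of two patterns."""
    return owa([feature_similarity(x, y) for x, y in zip(p, q)], alpha)


def classify(query: Sequence[float],
             training: Sequence[Tuple[Sequence[float], str]],
             alpha_features: float = 2.0,
             alpha_vote: float = 0.5) -> str:
    """Second aggregation (assumed global vote): every training pattern votes for
    its class with its similarity to the query; votes are aggregated per class."""
    by_class: Dict[str, List[float]] = {}
    for pattern, label in training:
        sim = pattern_similarity(query, pattern, alpha_features)
        by_class.setdefault(label, []).append(sim)
    return max(by_class, key=lambda label: owa(by_class[label], alpha_vote))


if __name__ == "__main__":
    train = [([0.10, 0.20], "A"), ([0.15, 0.10], "A"),
             ([0.90, 0.80], "B"), ([0.85, 0.95], "B")]
    print(classify([0.20, 0.10], train))
```

In this sketch no neighborhood size k is fixed, since all training patterns contribute to the vote, and the exponent alpha controls the compensation between high and low similarities. Both choices are illustrative readings of the abstract rather than confirmed details of FABC.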
