Boosted K-nearest neighbor classifiers based on fuzzy granules

Abstract

K-nearest neighbor (KNN) is a classic classifier that is simple and effective. AdaBoost combines several weak classifiers into a strong classifier to improve classification performance. Both have been widely used in machine learning. In this paper, building on fuzzy information granulation, KNN and AdaBoost, we propose two classification algorithms: a fuzzy granule K-nearest neighbor (FGKNN) and a boosted fuzzy granule K-nearest neighbor (BFGKNN). By introducing granular computing, we formulate problem solving as a structured, hierarchical process. Because the emphasis is on structured information processing, classification performance, including accuracy and robustness, can be improved. First, a fuzzy set is introduced, and fuzzy granulation on each atomic attribute is performed on the samples of the classification system to form fuzzy granules. Then, a fuzzy granule vector is constructed from the fuzzy granules of multiple attributes. We design operators and define a distance measure for fuzzy granule vectors in the fuzzy granule space, and we prove the monotonicity principle for the distance between fuzzy granule vectors. Furthermore, we define the K-nearest neighbor fuzzy granule vectors and present the FGKNN and BFGKNN algorithms. Finally, we compare KNN, Back Propagation Neural Network (BPNN), Support Vector Machine (SVM), Logistic Regression (LR), FGKNN and BFGKNN on UCI data sets. Theoretical analysis and experimental results show that, given appropriate parameters, FGKNN and BFGKNN outperform the other compared methods.
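
The abstract does not give the granulation rule, the distance, or the boosting scheme in detail, so what follows is only a minimal Python/NumPy sketch under stated assumptions, not the authors' method. We hypothetically assume that each attribute is min-max scaled to [0, 1], that the fuzzy granule of a sample on one atomic attribute is its vector of similarities 1 - |a(x) - a(r)| to every training sample r, that the distance between fuzzy granule vectors is the square root of the per-attribute mean squared granule difference, and that the boosted variant follows the multi-class SAMME form of AdaBoost with weighted bootstrap resampling. All helper names (`min_max_fit`, `fuzzy_granulate`, `granule_distance`, `fgknn_predict`, `bfgknn_fit_predict`) are illustrative.

```python
import numpy as np


def min_max_fit(X):
    """Per-attribute minimum and range of the training data (used for [0, 1] scaling)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)


def fuzzy_granulate(X, R, lo, span):
    """Hypothetical atomic-attribute fuzzy granulation.

    Entry [i, a, j] is the fuzzy similarity 1 - |a(x_i) - a(r_j)| between
    sample x_i and reference sample r_j on attribute a, after min-max scaling.
    Result shape: (n_samples, n_attributes, n_reference).
    """
    Xs = np.clip((X - lo) / span, 0.0, 1.0)
    Rs = np.clip((R - lo) / span, 0.0, 1.0)
    return 1.0 - np.abs(Xs[:, :, None] - Rs.T[None, :, :])


def granule_distance(Q, T):
    """Assumed distance between fuzzy granule vectors: square root of the sum,
    over attributes, of the mean squared difference between granules.
    Q: (nq, m, p), T: (nt, m, p) -> (nq, nt) distance matrix."""
    diff = Q[:, None, :, :] - T[None, :, :, :]
    return np.sqrt((diff ** 2).mean(axis=3).sum(axis=2))


def fgknn_predict(G_train, y_train, G_query, k=3):
    """FGKNN sketch: majority vote among the k nearest training granule vectors."""
    D = granule_distance(G_query, G_train)
    nearest = np.argsort(D, axis=1)[:, :k]
    n_classes = int(y_train.max()) + 1
    return np.array([np.bincount(y_train[row], minlength=n_classes).argmax()
                     for row in nearest])


def bfgknn_fit_predict(G_train, y_train, G_query, k=3, rounds=10, seed=0):
    """BFGKNN sketch: multi-class AdaBoost (SAMME) over FGKNN members.

    KNN cannot use sample weights directly, so each member is trained on a
    bootstrap resample drawn according to the current weights (assumption).
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    n_classes = int(y_train.max()) + 1
    w = np.full(n, 1.0 / n)
    votes = np.zeros((len(G_query), n_classes))
    for _ in range(rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)
        pred = fgknn_predict(G_train[idx], y_train[idx], G_train, k)
        miss = pred != y_train
        err = max(w[miss].sum(), 1e-10)  # avoid division by zero when err == 0
        if err >= 1.0 - 1.0 / n_classes:
            continue  # member is no better than chance: give it no vote
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1)
        w = w * np.exp(alpha * miss)  # up-weight misclassified samples
        w /= w.sum()
        member = fgknn_predict(G_train[idx], y_train[idx], G_query, k)
        votes[np.arange(len(G_query)), member] += alpha
    return votes.argmax(axis=1)
```

In use, the training set serves as its own reference set: `lo, span = min_max_fit(X)`, then `G = fuzzy_granulate(X, X, lo, span)` for training granules and `Gq = fuzzy_granulate(Xq, X, lo, span)` for query granules, after which `fgknn_predict(G, y, Gq, k=3)` or `bfgknn_fit_predict(G, y, Gq)` returns class labels. Resampling by weight is one common way to boost a learner without native sample weights; a weighted neighbor vote would be an alternative design.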
