Misleading classification

In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with both an original class and a misleading class. The data owner's goal is to form the training set from a pool of candidate instances so that the data miner is misled into classifying the test instances as their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA) that ranks all candidate instances by their similarities to the test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier and predicting the test set. In addition, we show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
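The two ranking strategies above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`kra_rank`, `gra_select`, `nn_classify`), the dictionary-based instance representation, the Euclidean similarity measure, and the use of a 1-NN classifier as GRA's inner model are all hypothetical choices made for the sketch.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def kra_rank(candidates, tests, k=3):
    """KRA sketch: rank candidates by mean distance to their k nearest
    test instances (smaller distance = ranked earlier, i.e. more likely
    to influence how the test set is classified)."""
    def score(c):
        dists = sorted(euclidean(c["x"], t["x"]) for t in tests)
        top = dists[:k]
        return sum(top) / len(top)
    return sorted(candidates, key=score)

def nn_classify(train, x):
    """A stand-in 1-NN classifier used as GRA's inner model."""
    return min(train, key=lambda c: euclidean(c["x"], x))["label"]

def misleading_rate(train, tests, classify):
    """Fraction of test instances predicted as their misleading class."""
    hits = sum(1 for t in tests if classify(train, t["x"]) == t["mislead"])
    return hits / len(tests)

def gra_select(candidates, tests, classify, budget):
    """GRA sketch: greedily grow the training set, at each step adding
    the candidate that most improves the misleading rate on the test set."""
    chosen, remaining = [], list(candidates)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining,
                   key=lambda c: misleading_rate(chosen + [c], tests, classify))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In the known-algorithm case described above, `nn_classify` would be replaced by the data miner's actual classifier (e.g. naive Bayes), and retraining inside the greedy loop could then be avoided by updating the class-conditional counts incrementally as each candidate is added.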
