Privacy preserving sub-feature selection based on fuzzy probabilities

Feature selection addresses the problem of building accurate classification models in data mining. When data for feature selection are aggregated from a distributed environment, the data miner needs access to the relevant inputs of individual data records, so preserving the privacy of those records is often a critical issue in distributed data mining. This paper proposes a method that preserves the privacy of individual data during both feature and sub-feature selection, based on data mining techniques and fuzzy probabilities. Each party protects its data as instructed by the data miner, using fuzzy probabilities as alias values. The techniques are developed for the data miner's own database in a distributed network with a fuzzy system, and the evaluation of sub-feature values is included in the data mining task. Feature selection is carried out with an existing data mining measure, the gain ratio, using fuzzy optimization. The gain ratio, estimated from the relevant inputs, is evaluated within the expected upper and lower bounds of the fuzzy data set. The paper focuses mainly on a privacy-preserving sub-feature selection algorithm that uses fuzzy random variables among the different parties in a distributed environment. Sub-features are uniquely identified for better class prediction. The algorithm selects sub-features using fuzzy probabilities together with fuzzy frequency data from the data miner's database. Experimental results on a real-world data set show the performance of the approach.
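
As a point of reference for the gain ratio mentioned above, the following is a minimal sketch of the standard (crisp) gain ratio for one categorical feature. It is not the paper's fuzzy or privacy-preserving procedure: the function names `entropy` and `gain_ratio` and the toy data are illustrative assumptions, and the fuzzy upper and lower bounds described in the abstract are not modeled here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Standard gain ratio: information gain divided by split information.

    Crisp (non-fuzzy) version, shown only as an assumed baseline.
    """
    n = len(labels)
    # Group class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature itself.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the first feature determines the class, the second does not.
labels = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A feature with a higher gain ratio separates the classes better relative to how finely it splits the data, which is why such a measure can be used to rank features or sub-feature values.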
