Attribute dependency data analysis for massive datasets by fuzzy transforms

We present a numerical attribute dependency method for massive datasets based on the direct and inverse fuzzy transforms. In a previous work, we used these concepts for numerical attribute dependency in data analysis; therein, the multi-dimensional inverse fuzzy transform served to approximate a regression function. Here we extend that method to massive datasets, to which the previous method could not be applied because of its high memory requirements. Our method is tested on a large dataset formed from 402,678 census sections of the Italian regions, provided by the Italian National Statistical Institute (ISTAT) in 2011. Comparative tests against two well-known regression methods, support vector regression and the multilayer perceptron, show that the proposed algorithm performs comparably to both. Moreover, our method requires fewer parameters than either of these two algorithms.
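
As a rough illustration of the direct and inverse fuzzy transforms mentioned above, the following minimal one-dimensional sketch may help. It assumes a uniform triangular fuzzy partition and accumulates the weighted sums chunk by chunk so that the whole dataset never has to sit in memory at once. The chunked accumulation, the function names (`triangular_partition`, `membership`, `direct_ft_chunked`, `inverse_ft`) and the one-dimensional setting are assumptions made for this example; they are not the algorithm of the paper, which works with the multi-dimensional transform.

```python
import numpy as np

def triangular_partition(a, b, n):
    """Return n uniformly spaced nodes on [a, b] and the node spacing h."""
    nodes = np.linspace(a, b, n)
    h = (b - a) / (n - 1)
    return nodes, h

def membership(x, nodes, h):
    """Matrix A[j, k] = A_k(x_j) for triangular basic functions centred at the nodes."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.clip(1.0 - np.abs(x[:, None] - nodes[None, :]) / h, 0.0, None)

def direct_ft_chunked(chunks, nodes, h):
    """Direct F-transform components F_k = sum_j y_j A_k(x_j) / sum_j A_k(x_j).

    `chunks` is any iterable of (x, y) array pairs; numerator and denominator
    are accumulated per chunk (an assumption of this sketch, not the paper).
    """
    num = np.zeros(len(nodes))
    den = np.zeros(len(nodes))
    for x, y in chunks:
        A = membership(x, nodes, h)
        num += A.T @ np.asarray(y, dtype=float)
        den += A.sum(axis=0)
    if np.any(den == 0):
        # Some basic function covers no data point: the partition is too fine.
        raise ValueError("data not sufficiently dense with respect to the partition")
    return num / den

def inverse_ft(x, components, nodes, h):
    """Inverse F-transform: f_hat(x) = sum_k F_k A_k(x)."""
    return membership(x, nodes, h) @ components

if __name__ == "__main__":
    # Synthetic data, purely for demonstration.
    nodes, h = triangular_partition(0.0, 1.0, 11)
    x = np.linspace(0.0, 1.0, 1001)
    y = np.sin(2 * np.pi * x)
    chunks = [(x[i:i + 250], y[i:i + 250]) for i in range(0, len(x), 250)]
    F = direct_ft_chunked(chunks, nodes, h)
    print(np.max(np.abs(inverse_ft(x, F, nodes, h) - y)))
```

Because the components are ratios of sums, splitting the data into chunks gives the same result as a single pass over the full dataset; the density check guards against a partition that is too fine for the available data.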
