An Algorithm for Sample and Data Dimensionality Reduction Using Fast Simulated Annealing

This paper addresses the reduction of both dimensionality and sample length in exploratory data analysis tasks. The proposed technique relies on a distance-preserving linear transformation of a given dataset to a lower-dimensional feature space. The coefficients of the feature transformation matrix are found using Fast Simulated Annealing, an algorithm inspired by the physical annealing of solids. In addition, data elements that, as a result of this transformation, are significantly displaced relative to the rest of the dataset can be eliminated or weighted. The presented method was verified with positive results in clustering, classification, and outlier detection. It ensures adequate efficiency of those procedures in a compact feature space while simultaneously reducing the data sample length.
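To make the idea concrete, the following is a minimal sketch of how a distance-preserving linear projection might be searched for with a simulated-annealing loop in the Fast Simulated Annealing style. Everything specific here is an assumption for illustration and is not taken from the paper: the cost function (a normalized raw stress between original and projected pairwise distances), the cooling schedule `t0 / (1 + k)`, the Cauchy-distributed perturbation applied to every coefficient, and all function and parameter names.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance between every pair of points (full symmetric matrix)."""
    m = len(points)
    return [[math.dist(points[i], points[j]) for j in range(m)] for i in range(m)]


def project(points, matrix):
    """Apply the linear map y = x * A, where A has one row per input feature."""
    cols = len(matrix[0])
    return [[sum(x * matrix[r][c] for r, x in enumerate(p)) for c in range(cols)]
            for p in points]


def stress(original_d, points, matrix):
    """Normalized raw stress: 0 means all pairwise distances are preserved."""
    reduced_d = pairwise_distances(project(points, matrix))
    m = len(points)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    num = sum((original_d[i][j] - reduced_d[i][j]) ** 2 for i, j in pairs)
    den = sum(original_d[i][j] ** 2 for i, j in pairs)
    return num / den


def fsa_projection(points, target_dim, iterations=2000, t0=1.0, seed=0):
    """Search for a projection matrix with a Cauchy-step annealing loop (illustrative)."""
    rng = random.Random(seed)
    n = len(points[0])
    original_d = pairwise_distances(points)
    current = [[rng.uniform(-1, 1) for _ in range(target_dim)] for _ in range(n)]
    current_cost = stress(original_d, points, current)
    best, best_cost = [row[:] for row in current], current_cost
    for k in range(1, iterations + 1):
        temperature = t0 / (1 + k)  # assumed fast (inverse-linear) cooling schedule
        # Heavy-tailed Cauchy step: mostly small moves, occasionally a long jump.
        candidate = [[a + temperature * math.tan(math.pi * (rng.random() - 0.5))
                      for a in row] for row in current]
        cost = stress(original_d, points, candidate)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = [row[:] for row in candidate], cost
    return best, best_cost
```

A call such as `fsa_projection(points, target_dim=2)` returns the best matrix found together with its stress. In a sketch like this, the contribution of each sample to the stress could then serve as a basis for eliminating or down-weighting strongly displaced elements, although how the paper does this is not specified in the abstract.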
