A three-step unsupervised neural model for visualizing high complex dimensional spectroscopic data sets

The interdisciplinary research presented in this study is based on a novel approach to clustering tasks and the visualization of the internal structure of high-dimensional data sets. Following normalization, a pre-processing step performs dimensionality reduction on a high-dimensional data set, using an unsupervised neural architecture known as cooperative maximum likelihood Hebbian learning (CMLHL), which is characterized by its capability to preserve a degree of global ordering in the data. Subsequently, the self-organizing map (SOM) is applied as a topology-preserving architecture for two-dimensional visualization of the internal structure of such data sets. This research studies the joint performance of these two neural models and their capability to preserve some global ordering. Their effectiveness is demonstrated through a case study on a real-life, highly complex, high-dimensional spectroscopic data set characterized by its lack of reproducibility. The data under analysis are taken from an X-ray spectroscopic analysis of a rose window in a famous ancient Gothic cathedral in Spain. The main aim of this study is to classify each sample by its date and place of origin, so as to facilitate the restoration of these and other historical stained-glass windows. Thus, having ascertained each sample's chemical composition and degree of conservation, this technique contributes to identifying the different areas and periods in which the stained-glass panels were produced. The combined method proposed in this study is compared with a classical statistical model that uses principal component analysis (PCA) as a pre-processing step, and with other unsupervised models, namely maximum likelihood Hebbian learning (MLHL) and the SOM applied without a pre-processing step. For this last case, the convergence processes were compared to examine the efficacy of the combined CMLHL/SOM model.
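The three steps described above (normalization, CMLHL-based dimensionality reduction, SOM visualization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the synthetic data, the hyperparameters (`eta`, `tau`, `p`, grid size), and the particular forms of the lateral matrix `A` and bias `b` are assumptions chosen for illustration, and the CMLHL update rule shown follows one commonly published formulation (feed-forward activation, rectified lateral interaction, feedback residual, sign-power weight update).

```python
import numpy as np

def normalize(X):
    """Step 1: scale each feature to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std

def cmlhl_project(X, n_out=3, epochs=20, eta=0.01, tau=0.1, p=1.5, seed=0):
    """Step 2: sketch of a CMLHL-style projection (assumed formulation)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_out, X.shape[1]))
    idx = np.arange(n_out)
    # Assumed lateral-connection matrix and bias (illustrative choices).
    A = np.eye(n_out) - np.cos(2 * np.pi * (idx[:, None] - idx[None, :]) / n_out)
    b = np.ones(n_out)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            y = W @ x                                  # feed-forward activation
            y = np.maximum(0.0, y + tau * (b - A @ y)) # rectified lateral step
            e = x - W.T @ y                            # feedback residual
            # Sign-power update; p is the assumed residual-model exponent.
            W += eta * np.outer(y, np.sign(e) * np.abs(e) ** (p - 1))
    return X @ W.T  # linear projection onto the learned directions

def train_som(Z, grid=(6, 6), epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Step 3: minimal online SOM on a rectangular grid."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = Z[rng.integers(0, len(Z), rows * cols)].copy()
    n_steps, step = epochs * len(Z), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
            bmu = np.argmin(((codebook - Z[i]) ** 2).sum(axis=1))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian neighbourhood
            codebook += lr * h[:, None] * (Z[i] - codebook)
            step += 1
    return codebook, coords

def map_samples(Z, codebook, coords):
    """Return the grid position of the best-matching unit for each sample."""
    d2 = ((Z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return coords[np.argmin(d2, axis=1)].astype(int)

# Synthetic stand-in for the spectroscopic samples (hypothetical data).
X = np.random.default_rng(42).normal(size=(60, 10))
Z = cmlhl_project(normalize(X))
codebook, coords = train_som(Z)
positions = map_samples(Z, codebook, coords)
```

Each row of `positions` gives the two-dimensional map cell assigned to a sample. Replacing `cmlhl_project` with a PCA projection, or passing the normalized data straight to `train_som`, would give rough analogues of the baselines the study compares against.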
