Density-Induced Support Vector Data Description

The purpose of data description is to give a compact description of the target data that represents most of its characteristics. In support vector data description (SVDD), this compact description is a hyperspherical model determined by a small subset of the data called support vectors. Despite its usefulness, the conventional SVDD may fail to find an optimal description of the target data, particularly when the support vectors do not capture the overall characteristics of the data. To address this issue, we propose a new SVDD that introduces distance measurements based on a relative density degree for each data point, so that the description reflects the distribution of the given data set. Moreover, as a real application, we extend the proposed method to the protein localization prediction problem, which is a multiclass, multilabel problem. Experiments with various real data sets show promising results.
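Since the abstract only sketches the idea, a minimal Python illustration may help. The sketch below is one hedged reading of the approach, not the paper's exact formulation: the k-NN-based `relative_density` weighting, the `dsvdd_fit` helper, and the way density degrees are folded into the kernel (rescaling each mapped point by the square root of its density degree before solving a standard SVDD dual) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def relative_density(X, k=5):
    # Illustrative relative density degree: points in dense regions
    # (small mean distance to their k nearest neighbors) get weights
    # above 1, sparse-region points get weights below 1. This is an
    # assumption, not the paper's exact definition.
    D = cdist(X, X)
    knn = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    return knn.mean() / (knn + 1e-12)

def dsvdd_fit(X, C=0.5, gamma=1.0, k=5):
    # Density-weighted SVDD sketch: solve the standard SVDD dual
    #   max_a  sum_i a_i Kt_ii - sum_ij a_i a_j Kt_ij
    #   s.t.   sum_i a_i = 1,  0 <= a_i <= C   (feasible when C >= 1/n)
    # on a density-weighted kernel Kt_ij = sqrt(rho_i rho_j) K_ij,
    # i.e. plain SVDD after rescaling each mapped point by sqrt(rho_i).
    n = len(X)
    rho = relative_density(X, k)
    K = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))     # RBF kernel
    Kt = np.sqrt(np.outer(rho, rho)) * K

    def neg_dual(a):  # negate the dual objective for a minimizer
        return -(a @ np.diag(Kt) - a @ Kt @ a)

    cons = {'type': 'eq', 'fun': lambda a: a.sum() - 1.0}
    res = minimize(neg_dual, np.full(n, 1.0 / n), method='SLSQP',
                   bounds=[(0.0, C)] * n, constraints=cons)
    return res.x, rho

# Usage: alphas, rho = dsvdd_fit(np.random.randn(40, 2))
```

Under this reading, points whose multipliers end up strictly between 0 and C lie on the boundary and act as support vectors, while the density weighting pulls the center of the hypersphere toward dense regions, so the resulting description tracks the bulk of the target data rather than scattered outlying points.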
