Producing good low-dimensional representations of high-dimensional data is a common and important task in many data mining applications. Two methods that have been particularly useful in this regard are multidimensional scaling and nonlinear mapping. These methods attempt to visualize a set of objects, described by a dissimilarity or distance matrix, on a low-dimensional display plane in a way that preserves the proximities of the objects to whatever extent is possible. Unfortunately, most known algorithms are of quadratic order, and their use has been limited to relatively small data sets. We recently demonstrated that nonlinear maps derived from a small random sample of a large data set exhibit the same structure and characteristics as maps of the entire collection, and that this structure can be easily extracted by a neural network, making it possible to scale data sets that are orders of magnitude larger than those accessible with conventional methodologies. Here, we present a variant of this algorithm based on local learning. The method employs a fuzzy clustering methodology to partition the data space into a set of Voronoi polyhedra, and uses a separate neural network to perform the nonlinear mapping within each cell. We find that this local approach offers a number of advantages, and produces maps that are virtually indistinguishable from those derived with conventional algorithms. These advantages are discussed using examples from the fields of combinatorial chemistry and optical character recognition. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 373–386, 2001
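The pipeline outlined above (map a small random sample, partition the input space into cells, learn one local mapping per cell, then project the full data set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: classical (Torgerson) MDS stands in for the nonlinear mapping of the sample, and an affine least-squares fit stands in for each per-cell neural network. The fuzzy c-means step follows the standard Bezdek update rules, and all function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def classical_mds(X, dim=2):
    """Embed the rows of X in `dim` dimensions (stand-in for nonlinear mapping)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    B = -0.5 * J @ D2 @ J                                 # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                              # eigenvalues, ascending
    idx = np.argsort(w)[::-1][:dim]                       # keep the largest `dim`
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Return c cluster centers found by standard fuzzy c-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                 # memberships sum to 1 per point
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]    # membership-weighted means
    return centers

def nearest_cell(X, centers):
    """Assign each point to its nearest center, i.e. its Voronoi cell."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def fit_local_maps(X, Y, centers):
    """Fit one affine map per cell (assumption: replaces the per-cell network)."""
    cells = nearest_cell(X, centers)
    A = np.hstack([X, np.ones((len(X), 1))])              # append a bias column
    fallback = np.linalg.lstsq(A, Y, rcond=None)[0]       # global fit for sparse cells
    models = []
    for k in range(len(centers)):
        mask = cells == k
        if mask.sum() > X.shape[1] + 1:
            models.append(np.linalg.lstsq(A[mask], Y[mask], rcond=None)[0])
        else:
            models.append(fallback)
    return models

def project(X, centers, models):
    """Map every point with the local model of the cell it falls in."""
    cells = nearest_cell(X, centers)
    A = np.hstack([X, np.ones((len(X), 1))])
    out = np.empty((len(X), models[0].shape[1]))
    for k, W in enumerate(models):
        mask = cells == k
        out[mask] = A[mask] @ W
    return out

# Synthetic example: 5000 points in 8 dimensions, 300-point training sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 8))
sample = data[rng.choice(len(data), 300, replace=False)]
Y = classical_mds(sample)                   # quadratic cost, paid only on the sample
centers = fuzzy_c_means(sample, c=4)
models = fit_local_maps(sample, Y, centers)
embedding = project(data, centers, models)  # cost grows linearly with the data size
```

The quadratic-cost embedding is computed only on the sample, so projecting the remaining points grows linearly with the size of the collection, which is the scaling argument the abstract makes.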