Clustering with a Semantic Criterion Based on Dimensionality Analysis

Previous work that views data processing problems from a geometric point of view has shown that the intrinsic dimension of data can carry semantic meaning. In this paper, we start from this inherent topological property and propose using it as a semantic criterion for clustering. We provide the corresponding learning algorithms, together with their theoretical justification and analysis. Experiments show promising results on problems where conventional clustering algorithms generally fail.
