Clustering, dimensionality reduction, and side information

Recent advances in sensing and storage technology have created many high-volume, high-dimensional data sets in pattern recognition, machine learning, and data mining. Unsupervised learning can provide generic tools for analyzing and summarizing these data sets when there is no well-defined notion of classes. The purpose of this thesis is to study some of the open problems in two main areas of unsupervised learning, namely clustering and (unsupervised) dimensionality reduction. Instance-level constraints on objects, an example of side information, are also considered as a means of improving the clustering results.

Our first contribution is a modification to the isometric feature mapping (ISOMAP) algorithm for the case where the input data, instead of being all available simultaneously, arrive sequentially from a data stream. ISOMAP is representative of a class of nonlinear dimensionality reduction algorithms that are based on the notion of a manifold. Both the standard ISOMAP and the landmark version of ISOMAP are considered. Experimental results on synthetic data as well as real-world images demonstrate that the modified algorithm can efficiently maintain an accurate low-dimensional representation of the data.

Our second contribution addresses the problem of feature selection in model-based clustering when the number of clusters is unknown. We propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm for its estimation. By using the minimum message length (MML) model selection criterion, the saliency of irrelevant features is driven towards zero, which corresponds to performing feature selection. The use of MML also determines the number of clusters automatically by pruning away the weak clusters. The proposed algorithm is validated on both synthetic data and data sets from the UCI machine learning repository.

We have also developed a new algorithm for incorporating instance-level constraints in model-based clustering. Its main idea is that the cluster label of an object should be determined only by its feature vector and the cluster parameters. In particular, the constraints should not have any direct influence on the label. This consideration leads to a new objective function that accounts for both the fit to the data and the satisfaction of the constraints simultaneously. The line-search Newton algorithm is used to find the cluster parameter vector that optimizes this objective function. This approach is extended to simultaneously perform feature extraction and clustering under constraints. A comparison of the proposed algorithm with competing algorithms over eighteen data sets from different domains, including text categorization, low-level image segmentation, appearance-based vision, and benchmark data sets from the UCI machine learning repository, shows the superiority of the proposed approach.
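
As background for the first contribution, the standard batch ISOMAP procedure (the version that assumes all data are available at once) can be sketched as follows. This is a minimal illustration of the well-known three steps (neighborhood graph, shortest-path geodesic distances, classical multidimensional scaling), not the streaming modification proposed in the thesis. The function name `isomap`, its parameters, and the use of the Floyd-Warshall algorithm for shortest paths are assumptions made for this sketch, and the input points are assumed to be distinct.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Batch ISOMAP sketch (hypothetical helper, not the thesis's streaming variant)."""
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 2: k-nearest-neighbor graph; non-edges get infinite weight.
    G = np.full((n, n), np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nearest.ravel()] = D[rows, nearest.ravel()]
    G = np.minimum(G, G.T)        # make the graph undirected
    np.fill_diagonal(G, 0.0)

    # Step 3: geodesic distances as shortest paths (Floyd-Warshall, O(n^3)).
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 4: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (G ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

Every step above recomputes its result from scratch over the full data set, which is why a stream of sequentially arriving points calls for a modified algorithm such as the one the thesis studies.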


[137]  Ashwin Ram,et al.  Efficient Feature Selection in Conceptual Clustering , 1997, ICML.

[138]  D. Cox Note on Grouping , 1957 .

[139]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..

[140]  Bernhard Schölkopf,et al.  Learning to Find Pre-Images , 2003, NIPS.

[141]  D. DeCoste Visualizing Mercer Kernel feature spaces via kernelized locally-linear embeddings , 2001 .

[142]  Hichem Frigui,et al.  A Robust Competitive Clustering Algorithm With Applications in Computer Vision , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[143]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[144]  David L. Dowe,et al.  MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions , 2000, Stat. Comput..

[145]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[146]  David J. C. MacKay,et al.  BAYESIAN NON-LINEAR MODELING FOR THE PREDICTION COMPETITION , 1996 .

[147]  Yann LeCun,et al.  Handwritten zip code recognition with multilayer networks , 1990, [1990] Proceedings. 10th International Conference on Pattern Recognition.

[148]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[149]  Anil K. Jain,et al.  Ethnicity identification from face images , 2004, SPIE Defense + Commercial Sensing.

[150]  Ivor W. Tsang,et al.  The pre-image problem in kernel methods , 2003, IEEE Transactions on Neural Networks.

[151]  Aleix M. Martinez,et al.  The AR face database , 1998 .

[152]  Igor Kononenko,et al.  Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[153]  Joachim M. Buhmann,et al.  Learning with constrained and unlabelled data , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[154]  Volker Roth,et al.  Feature Selection in Clustering Problems , 2003, NIPS.

[155]  Olli Silven,et al.  Comparison of dimensionality reduction methods for wood surface inspection , 2003, International Conference on Quality Control by Artificial Vision.

[156]  Carla E. Brodley,et al.  Feature Selection for Unsupervised Learning , 2004, J. Mach. Learn. Res..

[157]  Reiner Horst,et al.  Introduction to Global Optimization (Nonconvex Optimization and Its Applications) , 2002 .

[158]  E. Palmer Graphical evolution: an introduction to the theory of random graphs , 1985 .

[159]  David G. Stork,et al.  Pattern Classification , 1973 .

[160]  Gunnar Rätsch,et al.  An introduction to kernel-based learning algorithms , 2001, IEEE Trans. Neural Networks.

[161]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[162]  Hongyuan Zha,et al.  Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment , 2002, ArXiv.

[163]  Inderjit S. Dhillon,et al.  A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification , 2003, J. Mach. Learn. Res..

[164]  Daniel P. Fasulo,et al.  An Analysis of Recent Work on Clustering Algorithms , 1999 .

[165]  G. W. Hatfield,et al.  DNA microarrays and gene expression , 2002 .

[166]  Zhengdong Lu,et al.  Semi-supervised Learning with Penalized Probabilistic Clustering , 2004, NIPS.

[167]  Claire Cardie,et al.  Clustering with Instance-Level Constraints , 2000, AAAI/IAAI.

[168]  J. Friedman,et al.  Projection Pursuit Regression , 1981 .

[169]  Garrison W. Cottrell,et al.  Non-Linear Dimensionality Reduction , 1992, NIPS.

[170]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[171]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[172]  Gerard V. Trunk,et al.  A Problem of Dimensionality: A Simple Example , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[173]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[174]  Thomas Hofmann,et al.  Non-redundant data clustering , 2006, Knowledge and Information Systems.

[175]  M. J. van der Laan,et al.  Statistical inference for simultaneous clustering of gene expression data. , 2002, Mathematical biosciences.

[176]  Marcel J. T. Reinders,et al.  Local Fisher embedding , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[177]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[178]  Peter J. Bickel,et al.  Maximum Likelihood Estimation of Intrinsic Dimension , 2004, NIPS.

[179]  Matthew Brand,et al.  Continuous nonlinear dimensionality reduction by kernel Eigenmaps , 2003, IJCAI.

[180]  Michael I. Jordan,et al.  A Direct Formulation for Sparse Pca Using Semidefinite Programming , 2004, NIPS 2004.

[181]  Hongyuan Zha,et al.  Isometric Embedding and Continuum ISOMAP , 2003, ICML.

[182]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[183]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[184]  Bruce A. Draper,et al.  Feature selection from huge feature sets , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[185]  Marina Meila,et al.  A Comparison of Spectral Clustering Algorithms , 2003 .

[186]  Kilian Q. Weinberger,et al.  Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization , 2005, AISTATS.

[187]  R. Tibshirani Principal curves revisited , 1992 .

[188]  W. Scott Spangler,et al.  Feature Weighting in k-Means Clustering , 2003, Machine Learning.

[189]  Huan Liu,et al.  Feature Selection for Clustering , 2000, Encyclopedia of Database Systems.

[190]  Jon M. Kleinberg,et al.  An Impossibility Theorem for Clustering , 2002, NIPS.

[191]  Geoffrey J. McLachlan,et al.  Mixture models : inference and applications to clustering , 1989 .

[192]  Kam D. Dahlquist,et al.  Regression Approaches for Microarray Data Analysis , 2002, J. Comput. Biol..

[193]  Alfred O. Hero,et al.  Manifold learning using Euclidean k-nearest neighbor graphs [image processing examples] , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[194]  Bernhard Schölkopf,et al.  A kernel view of the dimensionality reduction of manifolds , 2004, ICML.

[195]  Leonard R. Sussman,et al.  Nominal, Ordinal, Interval, and Ratio Typologies are Misleading , 1993 .

[196]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, CVPR.

[197]  Chong-Ho Choi,et al.  Input Feature Selection by Mutual Information Based on Parzen Window , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[198]  Anil K. Jain,et al.  Dimensionality reduction using genetic algorithms , 2000, IEEE Trans. Evol. Comput..

[199]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[200]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[201]  Anil K. Jain,et al.  Artificial neural network for nonlinear projection of multivariate data , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[202]  David J. Miller,et al.  Mixture Modeling with Pairwise, Instance-Level Class Constraints , 2005, Neural Computation.

[203]  Josef Kittler,et al.  Floating search methods in feature selection , 1994, Pattern Recognit. Lett..

[204]  Anil K. Jain,et al.  Clustering with Soft and Group Constraints , 2004, SSPR/SPR.

[205]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[206]  Kurt Hornik,et al.  Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[207]  Jihoon Yang,et al.  Feature Subset Selection Using a Genetic Algorithm , 1998, IEEE Intell. Syst..

[208]  Anil K. Jain,et al.  Feature definition in pattern recognition with small sample size , 1978, Pattern Recognit..

[209]  H. Mannila,et al.  Subspace Clustering of Binary Data - A Probabilistic Approach , 2004 .

[210]  Richard M. Leahy,et al.  An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[211]  Joachim M. Buhmann,et al.  Bagging for Path-Based Clustering , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[212]  R. C. Williamson,et al.  Regularized principal manifolds , 2001 .

[213]  Mark A. Hall,et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning , 1999, ICML.

[214]  Mikhail Belkin,et al.  Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.

[215]  Lawrence Carin,et al.  A Bayesian approach to joint feature selection and classifier design , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[216]  Nicolas Le Roux,et al.  Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[217]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[218]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[219]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[220]  John Langford,et al.  Cover trees for nearest neighbor , 2006, ICML.

[221]  Maja J. Mataric,et al.  A spatio-temporal extension to Isomap nonlinear dimension reduction , 2004, ICML.

[222]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[223]  Gerald Sommer,et al.  Intrinsic Dimensionality Estimation With Optimally Topology Preserving Maps , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[224]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[225]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[226]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics , 1991 .

[227]  Larry A. Rendell,et al.  The Feature Selection Problem: Traditional Methods and a New Algorithm , 1992, AAAI.

[228]  I. Hassan Embedded , 2005, The Cyber Security Handbook.

[229]  Anil K. Jain,et al.  Feature Selection: Evaluation, Application, and Small Sample Performance , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[230]  Josef Kittler,et al.  Feature selection based on the approximation of class densities by finite mixtures of special type , 1995, Pattern Recognit..

[231]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[232]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[233]  Christopher J. C. Burges,et al.  Geometric Methods for Feature Extraction and Dimensional Reduction , 2005 .

[234]  Tomer Hertz,et al.  Computing Gaussian Mixture Models with EM Using Equivalence Constraints , 2003, NIPS.

[235]  Pietro Perona,et al.  Beyond pairwise clustering , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[236]  J. Carroll,et al.  A Feature-Based Approach to Market Segmentation via Overlapping K-Centroids Clustering , 1997 .

[237]  Charles T. Zahn,et al.  and Describing GestaltClusters , 1971 .

[238]  Michael I. Jordan,et al.  Feature selection for high-dimensional genomic microarray data , 2001, ICML.

[239]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[240]  Joachim M. Buhmann,et al.  Path-Based Clustering for Grouping of Smooth Curves and Texture Segmentation , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[241]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[242]  Nikos A. Vlassis,et al.  Non-linear CCA and PCA by Alignment of Local Models , 2003, NIPS.

[243]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[244]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, ICML.

[245]  Anil K. Jain,et al.  Nonlinear Manifold Learning for Data Stream , 2004, SDM.

[246]  Anil K. Jain,et al.  Feature Selection in Mixture-Based Clustering , 2002, NIPS.

[247]  Matthew Brand,et al.  Charting a Manifold , 2002, NIPS.

[248]  Walter D. Fisher On Grouping for Maximum Homogeneity , 1958 .

[249]  Matti Pietikäinen,et al.  Supervised Locally Linear Embedding , 2003, ICANN.

[250]  Mário A. T. Figueiredo Adaptive Sparseness for Supervised Learning , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[251]  Joachim M. Buhmann,et al.  Unsupervised Texture Segmentation in a Deterministic Annealing Framework , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[252]  Geoffrey E. Hinton,et al.  Global Coordination of Local Linear Models , 2001, NIPS.

[253]  Carla E. Brodley,et al.  Feature Subset Selection and Order Identification for Unsupervised Learning , 2000, ICML.

[254]  Robert P. W. Duin,et al.  An Evaluation of Intrinsic Dimensionality Estimators , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[255]  E. Forgy Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[256]  Robert R. Sokal,et al.  A statistical method for evaluating systematic relationships , 1958 .

[257]  Filippo Menczer,et al.  Feature selection in unsupervised learning via evolutionary search , 2000, KDD '00.

[258]  T. Hastie,et al.  Principal Curves , 2007 .

[259]  John W. Sammon,et al.  A Nonlinear Mapping for Data Structure Analysis , 1969, IEEE Transactions on Computers.

[260]  Pascal Vincent,et al.  Manifold Parzen Windows , 2002, NIPS.

[261]  Anil K. Jain,et al.  An Intrinsic Dimensionality Estimator from Near-Neighbor Information , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[262]  John Shawe-Taylor,et al.  String Kernels, Fisher Kernels and Finite State Automata , 2002, NIPS.

[263]  Alan J. Miller Subset Selection in Regression , 1992 .

[264]  Carla E. Brodley,et al.  Unsupervised Feature Selection Applied to Content-Based Retrieval of Lung Images , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[265]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[266]  Gene H. Golub,et al.  Matrix computations , 1983 .

[267]  John W. Tukey,et al.  A Projection Pursuit Algorithm for Exploratory Data Analysis , 1974, IEEE Transactions on Computers.

[268]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[269]  Daphne Koller,et al.  Using machine learning to improve information access , 1998 .

[270]  Dominik Endres,et al.  A new metric for probability distributions , 2003, IEEE Transactions on Information Theory.

[271]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[272]  Zoubin Ghahramani,et al.  Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning , 2004, NIPS.

[273]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[274]  Dan Klein,et al.  From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering , 2002, ICML.

[275]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[276]  Luis Talavera,et al.  Dependency-based feature selection for clustering symbolic data , 2000, Intell. Data Anal..