Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks

This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case in the database, features that exhibit low correlation with the rest of the features can be considered irrelevant to the learning process. Thus, we suggest performing this process using only the relevant features. Every irrelevant feature is then added to the learned model to obtain an explanatory model for the original database, which is our primary goal. A simple and, therefore, computationally efficient measure for assessing the relevance of the features to the learning process is presented. Moreover, the form of this measure allows us to calculate a relevance threshold that automatically identifies the relevant features. Experimental results on synthetic and real-world databases show that our proposal distinguishes between relevant and irrelevant features and accelerates learning, while still yielding good explanatory models for the original database.
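
The abstract does not reproduce the paper's exact relevance measure or its threshold derivation, so the following is only a minimal sketch of the overall scheme, assuming a correlation-based relevance score (here, the mean absolute Pearson correlation of a feature with all other features) and a fallback threshold at the mean score. The function names and the thresholding rule are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def correlation_relevance(X):
    """Score each feature by its correlation with the remaining features.

    Hypothetical stand-in for the paper's relevance measure: the mean
    absolute Pearson correlation of each feature (column of X) with all
    other features. Low scores suggest irrelevance for clustering.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))   # |correlation| between features
    np.fill_diagonal(corr, 0.0)                   # ignore self-correlation
    return corr.sum(axis=1) / (X.shape[1] - 1)    # mean |corr| with the others

def split_features(X, threshold=None):
    """Partition feature indices into relevant/irrelevant by thresholding.

    Assumption: if no threshold is given, use the mean score. The paper
    instead derives its threshold from the form of its relevance measure.
    """
    scores = correlation_relevance(X)
    if threshold is None:
        threshold = scores.mean()
    relevant = np.where(scores >= threshold)[0]
    irrelevant = np.where(scores < threshold)[0]
    return relevant, irrelevant

# Sketch of the pipeline: learn the conditional Gaussian network on
# X[:, relevant] only, then reattach the irrelevant features to the
# learned model to obtain an explanatory model of the full database.
```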
