Unsupervised learning of Dirichlet process mixture models with missing data

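This study presents a novel approach to unsupervised learning for clustering with missing data. We first extend a finite mixture model to the infinite case by considering Dirichlet process mixtures, which can automatically determine the number of mixture components or clusters. Furthermore, we treat the missing features as latent variables and compute the posterior distributions using the variational Bayesian expectation maximization algorithm, which optimizes the evidence lower bound on the complete-data log marginal likelihood. We evaluate the method on several artificial data sets with missing values. The experimental results indicate that the proposed method outperforms several classic imputation methods. Finally, we apply the method to the analysis of color images of seabed hydrothermal sulfides.

Highlights: This paper proposes an unsupervised clustering method that can handle missing data. First, we introduce the Dirichlet process as a prior distribution in the finite mixture model, so that the number of clusters or mixture components is identified automatically. Second, to handle samples whose values are missing in different dimensions, we treat the missing components as latent variables and use the variational Bayesian expectation maximization algorithm to optimize a lower bound on the complete-data marginal likelihood, which yields the posterior distributions of the parameters. Comparative experiments against several typical imputation methods verify the effectiveness of the proposed method. Finally, we apply the method to deep-sea hydrothermal sulfide image analysis, where it performs automatic image classification.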
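The two ingredients named above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a truncated stick-breaking representation of the Dirichlet process weights and Gaussian mixture components, neither of which the abstract states. The function names `stick_breaking_weights` and `conditional_gaussian` are hypothetical, and missing entries are assumed to be marked with NaN. The second function computes the distribution of the missing features given the observed ones under a single component, which is the kind of quantity an expectation step needs when missing features are treated as latent variables.

```python
import numpy as np


def stick_breaking_weights(v):
    """Turn stick proportions v_k in (0, 1) into mixture weights.

    pi_k = v_k * prod_{j<k} (1 - v_j). The last proportion is set to 1
    so that the truncated weights sum to one (truncation is an assumption).
    """
    v = np.asarray(v, dtype=float).copy()
    v[-1] = 1.0  # the final component takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def conditional_gaussian(x, mean, cov):
    """Mean and covariance of the missing entries of x (marked NaN),
    given the observed entries, under one Gaussian component."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    missing = np.isnan(x)
    observed = ~missing
    cov_oo = cov[np.ix_(observed, observed)]
    cov_mo = cov[np.ix_(missing, observed)]
    # gain = cov_mo @ inv(cov_oo), computed with a solve (cov_oo is symmetric)
    gain = np.linalg.solve(cov_oo, cov_mo.T).T
    cond_mean = mean[missing] + gain @ (x[observed] - mean[observed])
    cond_cov = cov[np.ix_(missing, missing)] - gain @ cov_mo.T
    return cond_mean, cond_cov


weights = stick_breaking_weights([0.5, 0.5, 0.5])
cond_mean, cond_cov = conditional_gaussian(
    [1.0, np.nan], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]
)
```

In a full variational scheme these conditional moments would be weighted by each component's responsibility for the sample and combined with posterior expectations of the parameters; the sketch covers only the single-component, fixed-parameter case.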
