V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

Real-world data often arrive as multiple incomplete views, and incomplete multi-view clustering is an effective way to integrate them. Previous methods learn only the information that is consistent across views and ignore the information unique to each view, which limits their clustering performance and generalization. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V3H). Inspired by variation and heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix shared by all views, representing the unique and the consistent information, respectively. Then, by aligning different views based on their cluster indicator matrices, V3H integrates the unique information from different views to improve clustering performance. Finally, using an adjustable low-rank representation based on the heredity matrix, V3H recovers the underlying true data structure, reducing the influence of heavy incompleteness. More importantly, V3H is possibly the first work to introduce genetics into clustering algorithms so as to learn the consistent and the unique information simultaneously from incomplete multi-view data. Extensive experiments on fifteen benchmark datasets validate its superiority over other state-of-the-art methods.

Impact Statement: Incomplete multi-view clustering is a popular technique for clustering incomplete datasets from multiple sources. Because it does not require expensive labeling of these datasets, it is becoming increasingly important. However, previous algorithms perform well only on some specific datasets because they cannot fully learn the information in each view. By introducing variation and heredity from genetics, our proposed algorithm V3H fully learns the information of each view. Compared with state-of-the-art algorithms, V3H improves clustering performance by more than 20% in representative cases. With large improvements on multiple datasets, V3H has wide potential applications, including detecting coronavirus disease 2019 (COVID-19), processing financial data, and analyzing election data.
