Canonical Correlation Analysis of Datasets With a Common Source Graph

Canonical correlation analysis (CCA) is a powerful technique for discovering whether hidden sources are common to two (or more) datasets. Its well-documented uses include dimensionality reduction, clustering, classification, feature selection, and data fusion. Standard CCA, however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-)correlations. In this paper, this extra information about the common sources generating the data is encoded in a graph and invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, termed graph (g) CCA. gCCA accounts for the graph-induced knowledge of the common sources while minimizing the distance between the sought canonical variables. A dual formulation of gCCA is also developed, tailored for practical settings where the number of data vectors is smaller than their dimension. One such setting arises when kernels are incorporated to account for nonlinear data dependencies; the resulting graph-kernel CCA is also obtained in closed form. Finally, image classification tests on several real datasets corroborate the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.
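
To make the idea of a graph regularizer in CCA concrete, the following is a minimal NumPy sketch under stated assumptions; it is not necessarily the formulation derived in the paper. It assumes a Laplacian penalty gamma * tr(U^T X L X^T U) on the first view's canonical variables only, and solves the generalized eigenproblem (Cxy Cyy^{-1} Cyx - gamma * X L X^T) u = lambda * Cxx u. The function name `graph_regularized_cca`, the parameters `gamma`, `d`, and `eps`, and the small ridge term are all hypothetical choices made for illustration.

```python
import numpy as np


def graph_regularized_cca(X, Y, L, gamma=0.1, d=2, eps=1e-6):
    """Illustrative sketch only (assumed formulation, not the paper's algorithm).

    X: (Dx, N) and Y: (Dy, N) hold N paired samples as columns.
    L: (N, N) symmetric positive semidefinite Laplacian of a graph over the samples.
    Returns U (Dx, d), V (Dy, d), and the d largest generalized eigenvalues.
    """
    # Center each view across samples.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)

    # (Unnormalized) covariance matrices; eps is an assumed ridge for invertibility.
    Cxx = X @ X.T + eps * np.eye(X.shape[0])
    Cyy = Y @ Y.T + eps * np.eye(Y.shape[0])
    Cxy = X @ Y.T

    # CCA term minus the assumed graph penalty on the first view.
    A = Cxy @ np.linalg.solve(Cyy, Cxy.T) - gamma * (X @ L @ X.T)
    A = 0.5 * (A + A.T)  # remove round-off asymmetry

    # Whiten with the Cholesky factor Cxx = R R^T, so that the generalized
    # problem A u = lambda Cxx u becomes a standard symmetric eigenproblem.
    R = np.linalg.cholesky(Cxx)
    M = np.linalg.solve(R, np.linalg.solve(R, A).T)  # R^{-1} A R^{-T}
    M = 0.5 * (M + M.T)

    evals, W = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:d]      # indices of the d largest eigenvalues
    U = np.linalg.solve(R.T, W[:, top])    # satisfies U^T Cxx U = I

    # Second-view directions, scaled so that each column has v^T Cyy v = 1.
    V = np.linalg.solve(Cyy, Cxy.T @ U)
    V = V / np.sqrt(np.sum(V * (Cyy @ V), axis=0))
    return U, V, evals[top]
```

With `gamma=0` this reduces to standard (ridge-stabilized) CCA, and the returned eigenvalues are the squared canonical correlations; increasing `gamma` trades correlation for smoothness of the canonical variables over the graph.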
