Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data

Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views using multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data often exhibit nonlinear relations, which CCA cannot reveal. We investigate how useful nonlinear multi-view relations are for characterizing multi-view data in an explainable manner. To this end, we propose a method that characterizes globally nonlinear multi-view relationships as a mixture of linear relationships. As a clustering method, it identifies partitions of observations that exhibit the same relationships and simultaneously learns those relationships. Unlike almost all other clustering methods, it defines cluster membership by multi-view relationships rather than spatial relationships. Furthermore, we introduce a supervised classification method that builds on our clustering method by employing multi-view relationships as discriminative factors. The value of these methods lies in their ability to find useful structure in the data that single-view or current multi-view methods may struggle to find. We demonstrate the potential utility of the proposed approach with an application in clinical informatics: detecting and characterizing slow bleeding in patients whose central venous pressure (CVP) is monitored at the bedside. At present, CVP is considered an insensitive measure of a subject's intravascular volume status or of changes in that status. However, we reason that features of CVP during inspiration and expiration should be informative for the early identification of emerging changes in patient status. We show empirically how the proposed method can help discover and analyze multiple-to-multiple correlations, which may be nonlinear or vary across the population, by finding explainable structure of operational interest to practitioners.
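To make the idea of a "mixture of linear relationships" concrete, the following is a minimal sketch of hard-assignment clusterwise multi-output linear regression. It is not the authors' algorithm, whose details the abstract does not give; it only illustrates the general principle of alternating between fitting one linear map per cluster and reassigning each observation to the map that explains it best. The function name `clusterwise_regression`, its parameters, and the synthetic two-regime data are all assumptions made for illustration.

```python
import numpy as np


def clusterwise_regression(X, Y, k, n_iter=100, seed=0):
    """Hypothetical helper: alternate per-cluster least squares and reassignment.

    X: (n, d) inputs from one view, Y: (n, m) outputs from the other view.
    Returns hard cluster labels and one (d + 1, m) coefficient matrix per cluster.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape[0], X.shape[1] + 1
    Xb = np.hstack([X, np.ones((n, 1))])        # append an intercept column
    labels = rng.permutation(n) % k             # balanced random initial partition
    coefs = [np.zeros((p, Y.shape[1])) for _ in range(k)]
    for _ in range(n_iter):
        # Fit step: refit a cluster's linear map only if it has enough points;
        # otherwise keep its previous coefficients.
        for c in range(k):
            mask = labels == c
            if mask.sum() >= p:
                coefs[c] = np.linalg.lstsq(Xb[mask], Y[mask], rcond=None)[0]
        # Assignment step: each observation moves to the map with the
        # smallest squared residual summed over all outputs.
        sq_err = np.stack(
            [((Y - Xb @ W) ** 2).sum(axis=1) for W in coefs], axis=1
        )
        new_labels = sq_err.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs


# Synthetic data (an assumption): a globally nonlinear relation made of two
# linear regimes, with two outputs per observation.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
positive = X[:, 0] > 0
Y = np.where(
    positive[:, None],
    X @ np.array([[1.0, 2.0]]),
    X @ np.array([[-1.0, -2.0]]),
)
Y = Y + 0.05 * rng.standard_normal(Y.shape)

labels, coefs = clusterwise_regression(X, Y, k=2)

# Compare the mixture against a single global linear model.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
W_global = np.linalg.lstsq(Xb, Y, rcond=None)[0]
global_sse = float(((Y - Xb @ W_global) ** 2).sum())
preds = np.stack([Xb @ W for W in coefs], axis=0)      # shape (k, n, m)
assigned = preds[labels, np.arange(X.shape[0])]        # shape (n, m)
mixture_sse = float(((Y - assigned) ** 2).sum())
```

Here clusters are defined by which linear relationship an observation follows, not by where it lies in input space. A probabilistic variant would replace the hard assignment with soft responsibilities, for example through expectation-maximization; that choice is left open in this sketch.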
