Supervised Nonnegative Tucker Decomposition for Computational Phenotyping

With the growing availability of Electronic Health Records (EHR), many predictive tasks in medical practice can be approached by building predictive models. However, EHR data contain a wide variety of medical concepts (e.g., diagnoses, medications, lab tests) that are high-dimensional and strongly correlated. To avoid the curse of dimensionality, representation learning or dimensionality reduction is typically required before prediction. Traditional methods either fail to capture the correlations among different dimensions of the data or lack interpretability. We therefore propose a supervised nonnegative Tucker decomposition (SNTD) that is guided by the prediction task for representation learning. Specifically, SNTD constrains the factor matrices of nonnegative Tucker decomposition (NTD) by incorporating task label information through an additional regularization term. Compared with methods based on the CP decomposition, SNTD performs better because its more flexible core tensor can better capture the correlations among dimensions. We demonstrate the accuracy and interpretability of our approach on a real-world EHR dataset for a hospitalization prediction task. Our results show that SNTD not only achieves better predictive performance than the baseline methods but also yields interpretable representations.
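As a rough illustration of the formulation described above (the abstract does not give the exact loss or regularizer, so the specific form below is an assumption), a label-guided NTD objective for a third-order patient-by-diagnosis-by-medication EHR tensor \(\mathcal{X}\) could be written as

\[
\min_{\mathcal{G},\, A^{(1)}, A^{(2)}, A^{(3)} \ge 0}
\big\| \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \big\|_F^2
\;+\; \lambda \sum_{i} \ell\big(y_i,\; w^\top a^{(1)}_i\big),
\]

where \(\mathcal{G}\) is the core tensor, \(A^{(n)}\) are the nonnegative factor matrices, \(a^{(1)}_i\) is the representation of patient \(i\) taken from the patient-mode factor matrix, \(y_i\) is the task label (e.g., hospitalization), \(\ell\) is a classification loss with a hypothetical linear predictor \(w\), and \(\lambda\) balances reconstruction against the supervised regularization term. Because \(\mathcal{G}\) is a full multilinear core rather than the superdiagonal core implied by CP, components from different modes can interact, which is the added flexibility referred to above.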
