Learning Models For Corrupted Multi-Dimensional Data: Fundamental Limits And Algorithms

by ISHAN JINDAL
May 2019
Advisors: Dr. Harpreet Singh and Dr. Matthew Nokleby
Major: Electrical and Computer Engineering
Degree: Doctor of Philosophy

The development of machine learning models that can handle corrupted data, such as training data with unreliable labels or multi-dimensional signals corrupted by noise or data erasures, has become a central necessity in the era of learning from massive datasets. Large datasets are typically generated via human annotation, which introduces human errors that cannot be eliminated at scale. Existing techniques for dealing with noisy datasets do not exploit the multi-dimensional structure of the signals, which could be used to improve the overall classification and representation performance of the model. In this thesis, we develop a Kronecker-structure (K-S) subspace model that exploits the multi-dimensional structure of the signal. First, we study the classification performance of K-S subspace models in two asymptotic regimes: when the signal dimensions go to infinity and when the noise power tends to zero. We characterize the misclassification probability in terms of the diversity order, and we derive an exact expression for the diversity order. We further derive a tighter bound on the misclassification probability in terms of the pairwise geometry of the subspaces. The proposed scheme is optimal in most signal-dimension regimes, except in one regime where the signal dimension is less than twice the subspace dimension,
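
As a rough illustration of what a K-S subspace model looks like in practice, the sketch below assumes a matrix-valued signal of the form Y = A X Bᵀ + noise, so that vec(Y) lies near the span of the Kronecker product B ⊗ A, and it classifies a signal by the smallest projection residual. The abstract does not state the model in this form; the function names (`ks_factors`, `ks_residual`, `ks_classify`), the dimensions, and the nearest-subspace rule are assumptions made for this example, not the thesis's algorithm.

```python
import numpy as np

def ks_factors(A, B):
    """Orthonormal bases for the row/column factors of a K-S subspace (assumed model)."""
    Qa, _ = np.linalg.qr(A)  # reduced QR: columns span range(A)
    Qb, _ = np.linalg.qr(B)  # reduced QR: columns span range(B)
    return Qa, Qb

def ks_residual(Y, Qa, Qb):
    """Squared energy of Y outside the subspace {A X B^T}."""
    # Projecting vec(Y) onto span(B kron A) equals P_A Y P_B,
    # so the large Kronecker matrix never has to be formed.
    projected = Qa @ (Qa.T @ Y @ Qb) @ Qb.T
    return float(np.linalg.norm(Y - projected) ** 2)

def ks_classify(Y, models):
    """Index of the K-S subspace with the smallest residual (nearest-subspace rule)."""
    return int(np.argmin([ks_residual(Y, Qa, Qb) for Qa, Qb in models]))

# Hypothetical example: two classes, 8 x 6 signals, factor ranks 2 and 2.
rng = np.random.default_rng(0)
factors = [(rng.standard_normal((8, 2)), rng.standard_normal((6, 2))) for _ in range(2)]
models = [ks_factors(A, B) for A, B in factors]

A1, B1 = factors[1]
X = rng.standard_normal((2, 2))
Y = A1 @ X @ B1.T + 0.01 * rng.standard_normal((8, 6))  # class-1 signal plus small noise
label = ks_classify(Y, models)

# Sanity check of the identity vec(A X B^T) = (B kron A) vec(X), column-major vec.
identity_holds = np.allclose(
    np.kron(B1, A1) @ X.flatten(order="F"),
    (A1 @ X @ B1.T).flatten(order="F"),
)
```

The factored projection is the point of the structure: each class is described by an 8 x 2 and a 6 x 2 basis rather than a single 48 x 4 basis, which is the kind of saving a Kronecker-structured model is meant to provide.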
