Multi-view learning-based data proliferator for boosting classification using highly imbalanced classes

BACKGROUND: Multi-view data representation learning explores the relationship between views and provides rich complementary information that can improve computer-aided diagnosis. In particular, existing machine learning methods devised to automate neurological disorder diagnosis from brain data have provided new insights into how a disorder such as autism spectrum disorder (ASD) alters the brain construct. However, the performance of machine learning methods depends heavily on the number of training samples available from both classes. In a real-world clinical setting, such medical data is expensive and challenging to collect, and might (i) suffer from limitations such as imbalanced classes and (ii) have non-heterogeneous distribution when derived from multi-view brain representations.

NEW METHOD: To the best of our knowledge, the problem of imbalanced multi-view data classification remains unexplored in the field of network neuroscience. To fill this gap, we propose a Multi-View LEArning-based data Proliferator (MV-LEAP) that enables the classification of imbalanced multi-view representations. MV-LEAP comprises two key steps. First, a manifold learning-based proliferator, which generates synthetic data for each view, is developed to handle imbalanced data. Second, a multi-view manifold data alignment leveraging tensor canonical correlation analysis is proposed to map all original and proliferated (i.e., synthesized) views into a shared subspace where their distributions are aligned for the target classification task.

RESULTS: We evaluated our method on multi-view ASD vs. normal control connectomic datasets with imbalanced classes.

CONCLUSION: Overall, MV-LEAP achieved the best classification results in comparison with baseline data synthesis methods.
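The two steps summarized in the abstract can be illustrated with a rough sketch. It is not the authors' implementation, and it swaps in simpler techniques: a SMOTE-style neighbour interpolation stands in for the manifold learning-based proliferator, and a regularized two-view canonical correlation analysis stands in for tensor canonical correlation analysis. The function names `proliferate_views` and `cca_two_views`, the parameters `k` and `reg`, and the random toy data are all assumptions made for illustration only.

```python
import numpy as np

def proliferate_views(views, n_new, k=3, seed=None):
    """Generate n_new synthetic minority samples per view (SMOTE-style stand-in).

    Neighbours are chosen on the concatenated views (an assumption) so that
    each synthetic sample stays paired across views.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    # Sum of squared Euclidean distances over all views.
    d2 = np.zeros((n, n))
    for V in views:
        d2 += np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2) ** 2
    np.fill_diagonal(d2, np.inf)                # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest minority samples
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # shared interpolation weight
    return [V[base] + gap * (V[picked] - V[base]) for V in views]

def cca_two_views(X, Y, dim, reg=1e-3):
    """Project two paired views into a shared subspace (two-view CCA stand-in)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # Whitened cross-covariance; its singular values are canonical correlations.
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :dim])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :dim])
    return Xc @ Wx, Yc @ Wy, s[:dim]

# Toy minority class: 6 subjects, two views with 4 and 5 features (random data).
rng = np.random.default_rng(0)
view1_min = rng.normal(size=(6, 4))
view2_min = rng.normal(size=(6, 5))

synth = proliferate_views([view1_min, view2_min], n_new=10, k=3, seed=0)
X = np.vstack([view1_min, synth[0]])   # original + proliferated, view 1
Y = np.vstack([view2_min, synth[1]])   # original + proliferated, view 2
Zx, Zy, corr = cca_two_views(X, Y, dim=2)
```

In this sketch the aligned embeddings `Zx` and `Zy` would be the inputs to a downstream classifier; with more than two views, a tensor-based alignment such as the one named in the abstract would replace `cca_two_views`.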
