Towards a unified multi-source-based optimization framework for multi-label learning

Abstract In the era of Big Data, a practical yet challenging task is to make learning techniques more widely applicable to complex learning problems such as multi-source multi-label learning. While earlier work has developed many effective solutions for multi-label classification and multi-source fusion separately, in this paper we address the two problems together and propose a novel method for the joint learning of multiple class labels and data sources. An optimization framework is constructed to formulate the learning problem, and the multi-label classification result is obtained as a weighted combination of the decisions from multiple sources. The proposed method is effective at exploiting label correlations and fusing multi-source data, especially long-tail data. Experiments on various multi-source multi-label data sets demonstrate the advantages of the proposed method.
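
As a rough illustration of the fusion step only, the sketch below combines per-source label scores with a weighted sum and thresholds the result. It is not the paper's optimization framework: the function name `fuse_sources`, the fixed weights, and the 0.5 threshold are all assumptions made for the example, and the abstract does not state how the weights are actually obtained.

```python
import numpy as np

def fuse_sources(scores, weights, threshold=0.5):
    """Combine per-source multi-label scores by a weighted sum (illustrative only).

    scores    : list of arrays, each (n_samples, n_labels), one per source
    weights   : non-negative weight per source (assumed given, not learned here)
    threshold : assumed cut-off for turning fused scores into label decisions
    """
    stacked = np.stack(scores)                    # (n_sources, n_samples, n_labels)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize weights to sum to 1
    fused = np.tensordot(w, stacked, axes=1)      # weighted sum over the source axis
    return fused, (fused >= threshold).astype(int)

# Two hypothetical sources scoring one sample on two labels.
source_a = np.array([[0.8, 0.2]])
source_b = np.array([[0.4, 0.6]])
fused, labels = fuse_sources([source_a, source_b], weights=[3.0, 1.0])
```

Here the more heavily weighted source dominates the fused decision; a source with few observations (as in long-tail data) could be given a smaller weight under the same scheme.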
