Multi-target regression via output space quantization

Multi-target regression is concerned with the prediction of multiple continuous target variables using a shared set of predictors. Two key challenges in multi-target regression are: (a) modeling target dependencies and (b) scalability to large output spaces. In this paper, a new multi-target regression method is proposed that jointly addresses these challenges via a novel problem transformation approach. The proposed method, called MRQ, is based on the idea of quantizing the output space in order to transform the multiple continuous targets into one or more discrete ones. Learning on the transformed output space naturally enables modeling of target dependencies, while the quantization strategy can be flexibly parameterized to control the trade-off between prediction accuracy and computational efficiency. Experiments on a large collection of benchmark datasets show that MRQ is both highly scalable and competitive with the state of the art in terms of accuracy. In particular, an ensemble version of MRQ obtains the best overall accuracy, while being an order of magnitude faster than the runner-up method.
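The core transformation described above can be illustrated with a minimal sketch. This is not the paper's actual MRQ implementation; it is a hypothetical reconstruction under plain assumptions: the continuous target vectors are quantized with k-means, a single multi-class classifier is trained on the resulting cluster labels, and a predicted cluster is mapped back to its centroid as the joint target estimate. The number of clusters `k` plays the role of the accuracy/efficiency knob mentioned in the abstract.

```python
# Hypothetical sketch of multi-target regression via output space
# quantization (assumed reading of the MRQ idea, not the authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # 300 samples, 5 features
W = rng.normal(size=(5, 3))
Y = X @ W + 0.1 * rng.normal(size=(300, 3))      # 3 correlated targets

# Step 1: quantize the output space. Larger k lowers quantization error
# but makes the downstream classification problem harder/slower.
k = 16
quantizer = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y)
labels = quantizer.labels_

# Step 2: a single multi-class classifier predicts the cluster label,
# implicitly capturing dependencies between the targets.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

def predict(X_new):
    # Step 3: map each predicted cluster back to its centroid,
    # yielding a joint prediction for all targets at once.
    return quantizer.cluster_centers_[clf.predict(X_new)]

Y_hat = predict(X)
print(Y_hat.shape)  # one 3-dimensional joint prediction per sample
```

Note that predicting a centroid couples the targets by construction: a single discrete decision determines all output dimensions simultaneously, which is how learning on the transformed space models target dependencies.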
