Deep tree-ensembles for multi-output prediction

Recently, deep neural networks have advanced the state of the art in various scientific fields and provided solutions to long-standing problems across multiple application domains. Nevertheless, they have weaknesses: their best performance depends on massive amounts of training data and on tuning a large number of parameters. As a countermeasure, several deep-forest methods have recently been proposed as efficient, smaller-scale alternatives. However, these approaches simply use label classification probabilities as induced features and focus primarily on traditional classification and regression tasks, leaving multi-output prediction under-explored. Moreover, recent work has demonstrated that tree-embeddings are highly representative, especially in structured output prediction. In this direction, we propose a novel deep tree-ensemble (DTE) model, in which every layer enriches the original feature set with a representation learning component based on tree-embeddings. In this paper, we focus specifically on two structured output prediction tasks, namely multi-label classification and multi-target regression. We conducted experiments on multiple benchmark datasets, and the results confirm that our method outperforms state-of-the-art methods in both tasks.
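The layer-wise idea described above can be illustrated with a minimal sketch for the multi-target regression case. This is not the authors' implementation: it assumes scikit-learn, and it uses one-hot encoded leaf indices compressed with truncated SVD as a stand-in for the tree-embedding. The function names, the number of layers, and the number of embedding components are all hypothetical choices made for illustration. For simplicity the embeddings are computed on the same data the forest was trained on, with no cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder


def fit_cascade(X, Y, n_layers=2, n_components=5, seed=0):
    """Fit a hypothetical cascade of forests (illustrative only).

    Each layer trains a multi-output forest, derives an embedding from
    the leaves its trees assign to each sample, and passes the ORIGINAL
    features plus that embedding on to the next layer.
    """
    layers = []
    X_aug = X
    for depth in range(n_layers):
        forest = RandomForestRegressor(n_estimators=20, random_state=seed + depth)
        forest.fit(X_aug, Y)
        # Leaf index reached in every tree: shape (n_samples, n_estimators).
        leaves = forest.apply(X_aug)
        # Assumed embedding: one-hot leaf membership, reduced by truncated SVD.
        encoder = OneHotEncoder(handle_unknown="ignore")
        svd = TruncatedSVD(n_components=n_components, random_state=seed)
        embedding = svd.fit_transform(encoder.fit_transform(leaves))
        layers.append((forest, encoder, svd))
        # Enrich the original feature set; earlier embeddings are not stacked.
        X_aug = np.hstack([X, embedding])
    return layers


def predict_cascade(layers, X):
    """Propagate X through the layers and return the last forest's output."""
    X_aug = X
    for forest, encoder, svd in layers:
        Y_hat = forest.predict(X_aug)
        embedding = svd.transform(encoder.transform(forest.apply(X_aug)))
        X_aug = np.hstack([X, embedding])
    return Y_hat


# Toy multi-target regression data (synthetic, for illustration only).
X, Y = make_regression(n_samples=100, n_features=8, n_targets=3, random_state=0)
layers = fit_cascade(X, Y)
Y_hat = predict_cascade(layers, X)
```

Here every layer receives the original features together with the newest embedding only; whether to also carry forward the previous layer's predictions, as earlier deep-forest methods do with class probabilities, is a design choice this sketch leaves out.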
