Cost-sensitive deep forest for price prediction

Abstract

For many real-world applications, predicting a price range is more practical and desirable than predicting a concrete value. In this case, price prediction can be regarded as a classification problem. Although deep forest is recognized as one of the best solutions to many classification problems, a crucial issue limits its direct application to price prediction: it treats all misclassifications equally, no matter how far they are from the real classes, since their impacts on accuracy are the same. This is unreasonable for price prediction, because even when a sample has to be misclassified, the predicted price range should be as close to the real one as possible. To address this issue, we propose a cost-sensitive deep forest for price prediction, which maintains the high accuracy of deep forest while pushing misclassifications closer to the real price range, thereby reducing the cost of misclassification. To make the classification more meaningful, we develop a discretization method that pre-defines the price classes by modifying the conventional K-means method. Experimental results on multiple real-world datasets (car sharing, house renting and real estate selling) show that the cost-sensitive deep forest significantly reduces the cost in comparison with the conventional deep forest and other baselines, while keeping satisfactory accuracy.
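The abstract does not specify the modified K-means or how the cost-sensitive deep forest is trained, so the following Python sketch only illustrates the two general ideas it combines, under stated assumptions. First, it discretizes prices into ranges with plain 1-D K-means (Lloyd's algorithm, not the authors' modification), taking midpoints between adjacent centroids as class boundaries. Second, it applies the standard minimum-expected-cost decision rule from cost-sensitive learning, using a hypothetical distance-based cost |i - j| between price classes i and j. All function names and the cost definition are illustrative assumptions, not the paper's method.

```python
from bisect import bisect_left
from typing import List, Sequence


def kmeans_1d(prices: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Plain 1-D k-means (Lloyd's algorithm); returns sorted centroids.

    Assumption: this is ordinary K-means, NOT the paper's modified variant.
    """
    values = sorted(prices)
    n = len(values)
    # Initialise centroids at evenly spaced order statistics (quantiles).
    centroids = [values[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            # Assign each price to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


def boundaries(centroids: Sequence[float]) -> List[float]:
    """Price-range boundaries: midpoints between adjacent sorted centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def price_class(price: float, bounds: Sequence[float]) -> int:
    """Index of the price range containing `price` (number of bounds below it)."""
    return bisect_left(bounds, price)


def distance_cost(k: int) -> List[List[int]]:
    """Hypothetical cost matrix: cost[true][pred] = |true - pred|."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]


def min_expected_cost_class(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """Pick the class j minimising sum_i P(i | x) * cost[i][j]."""
    k = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=expected.__getitem__)


if __name__ == "__main__":
    bounds = boundaries(kmeans_1d([10, 11, 12, 50, 52, 54, 100, 105, 110], 3))
    print(bounds)  # [31.5, 78.5]
    probs = [0.38, 0.10, 0.15, 0.00, 0.37]  # hypothetical class probabilities
    print(min_expected_cost_class(probs, distance_cost(5)))  # 2 (argmax would be 0)
```

With an absolute-distance cost, the minimum-expected-cost class is a weighted median of the predicted class distribution. In the usage example, the rule therefore picks class 2 instead of the argmax class 0, hedging between the two probability peaks so that a wrong prediction stays close to the real range.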
