A multi-valued and sequential-labeled decision tree method for recommending sequential patterns in cold-start situations

We aim to recommend suitable initial single-itemed sequences, such as a flight itinerary, to each cold-start user, based on a preference pattern expressed as a personalized sequential pattern. However, sequential pattern mining has never treated a conventional sequential pattern as a personalized pattern. Moreover, because a cold-start user has no personalized sequential pattern, collaborative filtering cannot recommend any single-itemed sequences to that user. We therefore first design such a preference pattern, the representative sequential pattern, which reflects a user’s main, frequently recurring buying behavior as mined from that user’s item-sequences over a time period. After sampling a training set from non-cold-start users who prefer similar items, we propose an auxiliary algorithm that mines the representative sequential pattern of each training instance and uses it as that instance’s sequential class label. A multi-label classifier could then be trained to predict the sequential label of each cold-start user from the user’s features. However, most multi-label classification methods are designed for data whose class labels are non-sequential, and in real-world data some of the predictor attributes are multi-valued. To handle such data, we have developed a novel algorithm named MSDT (Multi-valued and Sequential-labeled Decision Tree). Experimental results indicate that it outperforms all the baseline multi-label algorithms in accuracy, even though three of them are deep learning algorithms.
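To make the shape of the data concrete, the following is a minimal Python sketch of one training instance with a multi-valued attribute and a sequential label. It is not the auxiliary mining algorithm or MSDT itself, neither of which is specified in this abstract. As a stand-in, it assumes that a representative pattern can be approximated by the contiguous run of items that appears in the most of a user's item-sequences; the function names, the `min_len` parameter, the airport codes, and the feature names are all hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contiguous_subsequences(seq: Sequence[str], min_len: int) -> Iterator[Pattern]:
    """Yield every contiguous run of at least min_len items as a tuple."""
    n = len(seq)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            yield tuple(seq[start:end])


def representative_pattern(sequences: List[Sequence[str]], min_len: int = 2) -> Pattern:
    """Toy stand-in (an assumption, not the paper's algorithm): return the
    contiguous pattern contained in the largest number of sequences."""
    support: Counter = Counter()
    for seq in sequences:
        # set() so that one sequence counts a given pattern at most once
        support.update(set(contiguous_subsequences(seq, min_len)))
    if not support:
        return ()
    # Highest support first, then the longer pattern, then lexicographic
    # order so that the result is deterministic.
    return min(support, key=lambda p: (-support[p], -len(p), p))


# Hypothetical item-sequences (itineraries) of one non-cold-start user.
history = [
    ["TPE", "NRT", "LAX"],
    ["TPE", "NRT", "SFO"],
    ["HKG", "TPE", "NRT"],
]

# One training instance: "interests" is multi-valued (a set of values),
# and the class label is an ordered sequence rather than an unordered set.
instance = {
    "features": {"age_group": "30-39", "interests": {"hiking", "food"}},
    "label": representative_pattern(history),
}
print(instance["label"])
```

In this toy history the leg from TPE to NRT occurs in all three itineraries, so it becomes the sequential label. A classifier for such instances has to split on set-valued attributes and treat label order as meaningful, which is the setting the abstract describes for MSDT.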
