A Fast Automated Model Selection Approach Based on Collaborative Knowledge

Data science has attracted great attention in recent years. Besides data science experts, many researchers from other domains now conduct data analysis as well, because big data is becoming more easily accessible. However, non-expert researchers can find it difficult to choose suitable models for their analysis tasks, both because they lack expertise and because the number of available models is overwhelming. Meanwhile, existing model selection approaches rely heavily on the content of data sets and take a long time to make a selection, which makes them inadequate for recommending models to non-experts online. In this paper, we present an efficient approach to automated model selection based on analysis history and knowledge graph embeddings. Moreover, we introduce exterior features of data sets to enhance our approach and to address the cold start issue. We conduct several experiments on competition data from Kaggle, a well-known online community of data researchers. Experimental results show that our approach dramatically improves model selection efficiency while retaining high accuracy.
