Quantum-Assisted Feature Selection for Vehicle Price Prediction Modeling

Within machine learning, feature selection is a technique used to reduce model complexity and improve generalization, model fit, and predictive accuracy. However, searching the space of features for the optimal subset of k features is a known NP-hard problem. In this work, we study metrics for encoding the combinatorial search as a binary quadratic model, such as the Generalized Mean Information Coefficient and the Pearson correlation coefficient, as applied to the underlying regression problem of price prediction. We investigate the trade-offs, in terms of runtimes and model performance, of leveraging quantum-assisted versus classical subroutines for the combinatorial search, using minimum redundancy maximal relevance (mRMR) as the heuristic for our approach. We achieve accuracy scores of 0.9 (on the range [0, 1]) for finding optimal subsets on synthetic data using a new metric that we define. We test and cross-validate predictive models on a real-world price prediction problem, and show an improvement in mean absolute error for our quantum-assisted method (1471.02 ± 135.6) over similar methodologies such as recursive feature elimination (1678.3 ± 143.7). Our findings show that by leveraging quantum-assisted routines we find solutions that increase the quality of predictive model output while reducing the input dimensionality to the learning algorithm, on both synthetic and real-world data.

Keywords—Combinatorial Optimization, Feature Selection, Machine Learning, Price Prediction, Quantum Computing, Quantum Machine Learning, Supervised Learning
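The mRMR-as-QUBO encoding described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Pearson correlation for both the relevance (diagonal) and redundancy (off-diagonal) terms, and an exhaustive classical solver stands in for the quantum annealer. The function names `build_qubo` and `solve_k_subset` are illustrative.

```python
import numpy as np
from itertools import combinations

def build_qubo(X, y):
    """Build a symmetric QUBO matrix for mRMR-style feature selection.
    Diagonal: negative relevance, -|corr(feature_i, target)|.
    Off-diagonal: redundancy penalty, |corr(feature_i, feature_j)| / 2
    (halved because x^T Q x counts each symmetric pair twice)."""
    n = X.shape[1]
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = -abs(np.corrcoef(X[:, i], y)[0, 1])
        for j in range(i + 1, n):
            r = abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
            Q[i, j] = Q[j, i] = r / 2.0
    return Q

def solve_k_subset(Q, k):
    """Exhaustively minimize x^T Q x over binary x with exactly k ones.
    A quantum annealer (or simulated annealing) would replace this step
    for problem sizes where brute force is infeasible."""
    n = Q.shape[0]
    best, best_energy = None, np.inf
    for subset in combinations(range(n), k):
        x = np.zeros(n)
        x[list(subset)] = 1
        energy = x @ Q @ x
        if energy < best_energy:
            best, best_energy = subset, energy
    return best

# Tiny synthetic example: the target depends on features 0 and 2,
# while feature 1 is a near-duplicate of feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)  # redundant copy
y = 2 * X[:, 0] + 3 * X[:, 2] + 0.1 * rng.normal(size=200)
print(solve_k_subset(build_qubo(X, y), k=2))
```

The redundancy term steers the solver away from selecting both feature 0 and its near-duplicate feature 1, so the chosen pair contains feature 2 plus only one of the two correlated copies.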
