Quantum computing and supervised machine learning

Abstract: Quantum computing represents a promising paradigm for solving complex problems such as large-number factorization, exhaustive search, optimization, and mean and median computation. Supervised learning, in turn, deals with the classical induction problem, in which an unknown input-output relation is inferred from a dataset of examples of that relation. Recently, the rapid growth in dataset size, in the dimensionality of the input and output spaces, and in the variety and structure of the data has begun to expose the limits of conventional learning techniques. In light of these problems, this chapter illustrates how quantum computing can help address the computational issues of building, tuning, and estimating the performance of a model learned from data.
