Learning Hardware-Friendly Classifiers Through Algorithmic Stability

Most state-of-the-art machine-learning (ML) algorithms do not consider the computational constraints of implementing the learned model on embedded devices. These constraints include the limited depth of the arithmetic unit, memory availability, and battery capacity. We propose a new learning framework, Algorithmic Risk Minimization (ARM), which relies on algorithmic stability and incorporates these constraints into the learning process itself. ARM allows one to train advanced resource-sparing ML models and to deploy them efficiently on smart embedded systems. Finally, we show the advantages of our proposal on a smartphone-based Human Activity Recognition application by comparing it with a conventional ML approach.

[1]  O. Khan,et al.  ACM Transactions on Embedded Computing Systems continued on back cover , 2018 .

[2]  Mark A. Pitt,et al.  Advances in Minimum Description Length: Theory and Applications , 2005 .

[3]  Davide Anguita,et al.  In-Sample and Out-of-Sample Model Selection and Error Estimation for Support Vector Machines , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[4]  V. A. Morozov,et al.  Methods for Solving Incorrectly Posed Problems , 1984 .

[5]  Ohad Shamir,et al.  Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..

[6]  Enrique Alba,et al.  Using Variable Neighborhood Search to improve the Support Vector Machine performance in embedded automotive applications , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[7]  M. Opper,et al.  Statistical mechanics of Support Vector networks. , 1998, cond-mat/9811421.

[8]  Lorenzo Rosasco,et al.  Are Loss Functions All the Same? , 2004, Neural Computation.

[9]  Miodrag Potkonjak,et al.  Behavior-oriented data resource management in medical sensing systems , 2013, TOSN.

[10]  Davide Anguita,et al.  A support vector machine with integer parameters , 2008, Neurocomputing.

[11]  Sayan Mukherjee,et al.  Estimating Dataset Size Requirements for Classifying DNA Microarray Data , 2003, J. Comput. Biol..

[12]  P. Bartlett,et al.  Local Rademacher complexities , 2005, math/0508275.

[13]  Tso-Jung Yen,et al.  Discussion on "Stability Selection" by Meinshausen and Buhlmann , 2010 .

[14]  Lorenzo Rosasco,et al.  Learning from Examples as an Inverse Problem , 2005, J. Mach. Learn. Res..

[15]  Eliathamby Ambikairajah,et al.  Classification of a known sequence of motions and postures from accelerometry data using adapted Gaussian mixture models. , 2006, Physiological measurement.

[16]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[17]  Laurence A. Wolsey,et al.  Integer and Combinatorial Optimization , 1988, Wiley interscience series in discrete mathematics and optimization.

[18]  John Lach,et al.  Application-Focused Energy-Fidelity Scalability for Wireless Motion-Based Health Assessment , 2012, TECS.

[19]  V. Koltchinskii,et al.  Oracle inequalities in empirical risk minimization and sparse recovery problems , 2011 .

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[21]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[22]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[23]  Sang-Woong Lee,et al.  Real-Time Implementation of Face Recognition Algorithms on DSP Chip , 2003, AVBPA.

[24]  Shiliang Sun,et al.  PAC-bayes bounds with data dependent priors , 2012, J. Mach. Learn. Res..

[25]  Alexander Schrijver,et al.  Combinatorial optimization. Polyhedra and efficiency. , 2003 .

[26]  Vladimir Vapnik,et al.  An overview of statistical learning theory , 1999, IEEE Trans. Neural Networks.

[27]  T. Poggio,et al.  General conditions for predictivity in learning theory , 2004, Nature.

[28]  Raquel Valdés-Cristerna,et al.  An FPGA Implementation of Linear Kernel Support Vector Machines , 2006, 2006 IEEE International Conference on Reconfigurable Computing and FPGA's (ReConFig 2006).

[29]  Hao Wang,et al.  Connecting people through physical proximity and physical resources at a conference , 2013, TIST.

[30]  Michael G. Epitropakis,et al.  Hardware-friendly Higher-Order Neural Network Training using Distributed Evolutionary Algorithms , 2010, Appl. Soft Comput..

[31]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[32]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[33]  Davide Anguita,et al.  Selecting the hypothesis space for improving the generalization ability of Support Vector Machines , 2011, The 2011 International Joint Conference on Neural Networks.

[34]  S. Sathiya Keerthi,et al.  An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels , 2004, IEEE Transactions on Neural Networks.

[35]  Keinosuke Fukunaga,et al.  Leave-One-Out Procedures for Nonparametric Error Estimates , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Urbashi Mitra,et al.  KNOWME: An Energy-Efficient Multimodal Body Area Network for Physical Activity Monitoring , 2012, TECS.

[37]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[38]  Diane J. Cook,et al.  Pervasive computing at scale: Transforming the state of the art , 2012, Pervasive Mob. Comput..

[39]  A. Ghio,et al.  A Support Vector Machine based pedestrian recognition system on resource-limited hardware architectures , 2007, 2007 Ph.D Research in Microelectronics and Electronics Conference.

[40]  Licia Capra,et al.  Urban Computing: Concepts, Methodologies, and Applications , 2014, TIST.

[41]  John Shawe-Taylor,et al.  Distribution-Dependent PAC-Bayes Priors , 2010, ALT.

[42]  Manfred Mücke,et al.  Effects of Reduced Precision on Floating-Point SVM Classification Accuracy , 2011, International Conference on Conceptual Structures.

[43]  Bernhard Schölkopf,et al.  A Generalized Representer Theorem , 2001, COLT/EuroCOLT.

[44]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[45]  Sayan Mukherjee,et al.  Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization , 2006, Adv. Comput. Math..

[46]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[47]  André Elisseeff,et al.  Stability and Generalization , 2002, J. Mach. Learn. Res..

[48]  R. Tibshirani,et al.  On the “degrees of freedom” of the lasso , 2007, 0712.0881.

[49]  Behrooz Parhami,et al.  Computer arithmetic - algorithms and hardware designs , 1999 .

[50]  B. Venkataramani,et al.  FPGA Implementation of Support Vector Machine Based Isolated Digit Recognition System , 2009, 2009 22nd International Conference on VLSI Design.

[51]  Robert L. Wolpert,et al.  Statistical Inference , 2019, Encyclopedia of Social Network Analysis and Mining.

[52]  Ming Tan,et al.  Sparse Support Vector Machines with L_{p} Penalty for Biomarker Identification , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[53]  Davide Anguita,et al.  A digital architecture for support vector machines: theory, algorithm, and FPGA implementation , 2003, IEEE Trans. Neural Networks.

[54]  Przemyslaw Klesk,et al.  Sets of approximating functions with finite Vapnik-Chervonenkis dimension for nearest-neighbors algorithms , 2011, Pattern Recognit. Lett..

[55]  Luca Benini,et al.  Network-Level Power-Performance Trade-Off in Wearable Activity Recognition: A Dynamic Sensor Selection Approach , 2012, TECS.

[56]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[57]  Davide Anguita,et al.  A support vector machine classifier from a bit-constrained, sparse and localized hypothesis space , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[58]  Davide Anguita,et al.  Smartphone battery saving by bit-based hypothesis spaces and local Rademacher Complexities , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).

[59]  David K. McAllister,et al.  Fast Matrix Multiplies Using Graphics Hardware , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[60]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[61]  Giuseppe De Nicolao,et al.  On the Representer Theorem and Equivalent Degrees of Freedom of SVR , 2007, J. Mach. Learn. Res..

[62]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[63]  Davide Anguita,et al.  Energy Efficient Smartphone-Based Activity Recognition using Fixed-Point Arithmetic , 2013, J. Univers. Comput. Sci..

[64]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[65]  Nesa L'abbe Wu,et al.  Linear programming and extensions , 1981 .

[66]  M. Opper,et al.  On the ability of the optimal perceptron to generalise , 1990 .

[67]  Kenneth Steiglitz,et al.  Combinatorial Optimization: Algorithms and Complexity , 1981 .

[68]  Bernhard Schölkopf,et al.  The Kernel Trick for Distances , 2000, NIPS.

[69]  M. Opper Statistical Mechanics of Learning : Generalization , 2002 .

[70]  Lawrence O. Hall,et al.  Bit reduction support vector machine , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[71]  Kalaiarasi Sonai Muthu,et al.  Classification Algorithms in Human Activity Recognition using Smartphones , 2012 .

[72]  Ted K. Ralphs,et al.  Integer and Combinatorial Optimization , 2013 .

[73]  N. Meinshausen,et al.  Stability selection , 2008, 0809.2932.

[74]  Robert Tibshirani,et al.  1-norm Support Vector Machines , 2003, NIPS.

[75]  Bin Tang,et al.  Energy-efficient data redistribution in sensor networks , 2010, The 7th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (IEEE MASS 2010).

[76]  Vassilis P. Plagianakos,et al.  Parallel evolutionary training algorithms for “hardware-friendly” neural networks , 2002, Natural Computing.

[77]  Peter L. Bartlett,et al.  Localized Rademacher Complexities , 2002, COLT.

[78]  V. Ivanov,et al.  The Theory of Approximate Methods and Their Application to the Numerical Solution of Singular Integr , 1978 .

[79]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[80]  Carlo Vercellis,et al.  Discrete support vector decision trees via tabu search , 2004, Comput. Stat. Data Anal..

[81]  Davide Anguita,et al.  Out-of-Sample Error Estimation: The Blessing of High Dimensionality , 2014, 2014 IEEE International Conference on Data Mining Workshop.

[82]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[83]  Narayanan Vijaykrishnan,et al.  A Hardware Efficient Support Vector Machine Architecture for FPGA , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.

[84]  Davide Anguita,et al.  Fully Empirical and Data-Dependent Stability-Based Bounds , 2015, IEEE Transactions on Cybernetics.

[85]  Vladimir Koltchinskii,et al.  Rademacher penalties and structural risk minimization , 2001, IEEE Trans. Inf. Theory.

[86]  Ingo Steinwart,et al.  Consistency of support vector machines and other regularized kernel classifiers , 2005, IEEE Transactions on Information Theory.

[87]  Peter L. Bartlett,et al.  Model Selection and Error Estimation , 2000, Machine Learning.

[88]  Marcos M. Campos,et al.  SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines , 2005, VLDB.

[89]  Amine Bermak,et al.  A Low-Power Hardware-Friendly Binary Decision Tree Classifier for Gas Identification , 2011 .

[90]  Albert Tarantola,et al.  Inverse problem theory - and methods for model parameter estimation , 2004 .

[91]  J. Mercer Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations , 1909 .

[92]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[93]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[94]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[95]  Bernhard Schölkopf,et al.  The representer theorem for Hilbert spaces: a necessary and sufficient condition , 2012, NIPS.

[96]  O. Catoni PAC-BAYESIAN SUPERVISED CLASSIFICATION: The Thermodynamics of Statistical Learning , 2007, 0712.0248.