Ranking Loss: Maximizing the Success Rate in Deep Learning Side-Channel Analysis

The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Compared to template attacks, deep learning techniques can deal with protected implementations, such as those using masking or desynchronization, without substantial preprocessing. However, important issues remain open. One challenging problem is to adapt the methods classically used in the machine learning field (e.g., loss functions and performance metrics) to the specific side-channel context in order to obtain optimal results. We propose a new loss function, derived from the learning to rank approach, that helps prevent the approximation and estimation errors induced by the classical cross-entropy loss. We theoretically demonstrate that this new function, called Ranking Loss (RkL), maximizes the success rate by minimizing the ranking error of the secret key in comparison with all other hypotheses. The resulting model converges towards the optimal distinguisher when considering the mutual information between the secret and the leakage. Consequently, the approximation error is prevented. Furthermore, the estimation error induced by the cross-entropy is reduced by up to 23%. When the ranking loss is used, the convergence towards the best solution is up to 23% faster than with a model using the cross-entropy loss function. We validate our theoretical propositions on public datasets.
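The idea of penalizing the ranking error of the secret key against all other hypotheses can be illustrated with a minimal sketch. It assumes a pairwise logistic penalty in the style of RankNet [10], applied to one score per key hypothesis; the exact form of RkL is not stated in this abstract, so the function names, the `alpha` scaling parameter, and the use of base-2 logarithms are illustrative assumptions rather than the authors' definition.

```python
import numpy as np

def pairwise_ranking_loss(scores, true_key, alpha=1.0):
    """Pairwise logistic ranking penalty (illustrative, RankNet-style).

    scores   : one score per key hypothesis (higher means more likely)
    true_key : index of the correct key hypothesis
    alpha    : hypothetical scaling factor on the score margins
    """
    scores = np.asarray(scores, dtype=float)
    # Margin between the true key's score and every other hypothesis.
    margins = scores[true_key] - np.delete(scores, true_key)
    # sum of log2(1 + exp(-alpha * margin)), computed stably via logaddexp.
    return float(np.sum(np.logaddexp(0.0, -alpha * margins)) / np.log(2.0))

def key_rank(scores, true_key):
    """Number of hypotheses scored strictly above the true key (0 = best)."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores > scores[true_key]))

# Toy example with three hypotheses.
scores = [0.1, 2.0, 0.3]
loss_when_ranked_first = pairwise_ranking_loss(scores, true_key=1)
loss_when_ranked_last = pairwise_ranking_loss(scores, true_key=0)
```

In this toy setting the penalty is smaller when the correct hypothesis has the highest score than when it has the lowest, so decreasing the penalty pushes the secret key towards rank 0, which is the condition for a successful attack.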
