Paper information - Reducing numerical precision preserves classification accuracy in Mondrian Forests

Mondrian Forests are a powerful data stream classification method, but their large memory footprint makes them ill-suited for low-resource platforms such as connected objects. We explored using reduced-precision floating-point representations to lower memory consumption and evaluated their effect on classification performance. We applied the Mondrian Forest implementation provided by OrpailleCC, a C++ collection of data stream algorithms, to two canonical datasets in human activity recognition: RecoFit and Banos et al. Results show that the precision of floating-point values used by tree nodes can be reduced from 64 bits to 8 bits with no significant difference in F1 score. In some cases, reduced precision improved classification performance, presumably due to its regularization effect. We conclude that numerical precision is a relevant hyperparameter in the Mondrian Forest, and that commonly used double-precision values may not be necessary for optimal performance. Future work will evaluate the generalizability of these findings to other data stream classifiers.
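To make the idea of reduced precision concrete, the sketch below shows one simple way to emulate a narrower significand in software: zeroing the low-order mantissa bits of a 64-bit float. It is only an illustration under stated assumptions. The function name `truncate_mantissa` and the example threshold are hypothetical, the exponent range is left unchanged, and nothing here is taken from the paper's or OrpailleCC's actual implementation.

```python
import struct

def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Keep only the top `kept_bits` of the 52 explicit mantissa bits
    of an IEEE 754 double, rounding toward zero.

    Hypothetical helper for illustration; special values (NaN, subnormals)
    are not treated specially.
    """
    if not 0 <= kept_bits <= 52:
        raise ValueError("kept_bits must be in [0, 52]")
    # Reinterpret the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    dropped = 52 - kept_bits
    # Mask that clears the `dropped` lowest mantissa bits and keeps
    # the sign, the exponent, and the remaining mantissa bits.
    mask = ~((1 << dropped) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y

# A hypothetical split threshold stored in a tree node.
threshold = 0.7312
coarse = truncate_mantissa(threshold, 3)  # 0.6875
```

With 3 mantissa bits kept, the threshold 0.7312 becomes 0.6875. For normal numbers the relative error of this truncation is bounded by 2^-kept_bits, which gives a rough sense of how much a node's stored values can move as precision is lowered.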

Tristan Glatard | Yohan Chatelain | Gregory Kiar | Martin Khannouz | Marc Vicuna
