Structured Sparse Back-propagation for Lightweight On-Device Continual Learning on Microcontroller Units
Francesco Paissan;Manuele Rusci;Alberto Ancilotto;Elisabetta Farella
2024-01-01
Abstract
With many devices deployed at the extreme edge in dynamic environments, the ability to learn continually on the device is a fast-emerging trend for ultra-low-power Microcontrollers (MCUs). The key challenge in enabling Continual Learning (CL) on highly constrained MCUs is to curtail memory and computational requirements. This paper proposes a novel CL strategy based on sparse weight updates coupled with Latent Replay. We reduce the latency and memory requirements of the backpropagation algorithm by computing structured sparse update tensors for the trainable parameters, retaining only partial activations during the forward pass, and limiting the per-layer gradient computation to a subset of channels. When applied to lightweight Deep Neural Network (DNN) models for image classification, namely PhiNet and MobileNetV2, our method reduces the memory and computation costs of the backpropagation algorithm by up to 1.3x with a minor accuracy drop (2%). Furthermore, we evaluate the accuracy-latency-memory trade-off targeting a class-incremental CL setup on a RISC-V multi-core MCU. The proposed approach allows learning on-device a new class-incremental task composed of two unseen classes in 18 min with 4.63 MB in the most demanding configuration, i.e., a MobileNetV2 trained on the CORe50 dataset.
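The core idea of restricting the per-layer weight-gradient computation to a subset of channels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, shapes, and channel selection below are illustrative assumptions. For a fully connected layer with weight matrix W of shape (out_channels, in_channels), the dense weight gradient is the outer product of the upstream gradient and the retained activations; the structured-sparse variant computes only the rows for the selected channels, so activations and gradients for the remaining channels never need to be stored.

```python
import numpy as np

def sparse_channel_grad(x, dy, channels):
    """Structured-sparse weight gradient for a linear layer (illustrative sketch).

    x        : retained input activations, shape (in_channels,)
    dy       : upstream gradient, shape (out_channels,)
    channels : output channels selected for update; all other rows stay zero,
               which is what saves memory and compute in backpropagation.
    """
    dW = np.zeros((dy.shape[0], x.shape[0]))
    dW[channels] = np.outer(dy[channels], x)
    return dW

x = np.array([1.0, 2.0, 3.0])          # partial activations kept from the forward pass
dy = np.array([0.5, -1.0, 0.25, 2.0])  # gradient flowing back into the layer
dW = sparse_channel_grad(x, dy, channels=[0, 2])
# rows 1 and 3 of dW remain zero and are never computed in a real kernel
```

Because the sparsity is structured (whole channels rather than scattered entries), an MCU kernel can simply skip the untouched rows, which is what makes the memory and latency savings reported in the abstract realizable on hardware.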