Contextual Convolutions for Scalable Forward-Only Learning on Tiny Devices
Mehdi Abbassi; Alberto Ancilotto; Elisabetta Farella
2025-01-01
Abstract
On-device training on resource-constrained hardware, such as microcontrollers with limited memory and fixed-function convolutional accelerators, remains an open challenge in embedded computer vision. Standard backpropagation is often impractical due to its high memory requirements and reliance on operations unsupported by typical inference-optimized accelerators. Recent forward-only learning methods, such as Forward-Forward and PEPITA, offer lightweight alternatives by eliminating the backward pass, enabling training on ultra-low-power devices. However, these methods tend to degrade in performance on more complex tasks involving deeper networks and larger output spaces. In this work, we introduce the Contextual Convolution Block, a novel architectural module that enhances the representational capacity of forward-only networks by injecting ground truth class information during training. This allows the network to specialize convolutional kernels for specific classes without relying on gradients or weight transport. We further present an optimized implementation of this block using an im2col-based formulation, enabling efficient training on severely constrained devices. Our method significantly improves the scalability of forward-only training approaches, achieving stronger performance on complex classification tasks while preserving compatibility with embedded hardware limitations.
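The abstract mentions an im2col-based formulation of the convolution block. As background, the im2col trick rewrites a convolution as a single matrix multiplication, which maps well onto the GEMM-style engines exposed by inference-optimized accelerators. The following is a minimal NumPy sketch of plain im2col convolution (stride 1, no padding) for illustration only; the paper's Contextual Convolution Block and its optimized on-device implementation are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k receptive field of a (C, H, W) input into
    one column of a (C*k*k, out_h*out_w) matrix."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            # Flatten the patch at (i, j) into a single column.
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols

def conv_as_gemm(x, weights):
    """Convolution via im2col: `weights` has shape (F, C, k, k).
    The whole layer reduces to one matrix multiply, the operation
    that fixed-function accelerators typically execute fastest."""
    F, C, k, _ = weights.shape
    _, H, W = x.shape
    cols = im2col(x, k)            # (C*k*k, out_h*out_w)
    w = weights.reshape(F, -1)     # (F, C*k*k)
    out = w @ cols                 # (F, out_h*out_w)
    return out.reshape(F, H - k + 1, W - k + 1)
```

The same layout also benefits forward-only training: because both the forward pass and any weight update operate on the unrolled patch matrix, no transposed-convolution or backward kernel is required.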
