Probabilistic dynamic quantization for memory-constrained devices
Gabriele Santini; Francesco Paissan; Elisabetta Farella
2025-01-01
Abstract
We introduce a probabilistic dynamic quantization method for neural networks that combines the adaptability of dynamic quantization with the low memory footprint of static approaches. Our technique uses a surrogate probabilistic model of pre-activation statistics to estimate quantization parameters before layer execution, enabling input-adaptive quantization without storing intermediate activations. This design reduces the working-memory overhead of conventional dynamic quantization while retaining robustness to distribution shifts. We evaluate the technique across a diverse set of vision tasks and architectures: it retains accuracy comparable to dynamic quantization while operating at the memory cost of static quantization, achieving a favorable balance between accuracy and computational cost relative to conventional quantization strategies.
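To make the core idea concrete, below is a minimal sketch of estimating quantization parameters from predicted statistics rather than from the observed tensor. The abstract does not specify the surrogate model, so the Gaussian assumption, the clipping rule (mu ± k·sigma), and all function names here are illustrative, not the authors' actual formulation.

```python
import numpy as np

def surrogate_quant_params(mu, sigma, k=3.0, num_bits=8):
    """Estimate asymmetric quantization parameters from a Gaussian
    surrogate of the pre-activation distribution, clipping the range
    at mu +/- k*sigma instead of scanning the activation tensor."""
    lo, hi = mu - k * sigma, mu + k * sigma
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def fake_quantize(x, scale, zero_point, num_bits=8):
    """Quantize and dequantize x with pre-computed parameters."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Hypothetical usage: a surrogate predicts per-layer statistics from
# cheap input summaries, so the pre-activation tensor never needs to
# be buffered for calibration (the static-memory-cost property).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.2, scale=1.5, size=(1, 64))  # toy pre-activations
mu_hat, sigma_hat = 0.2, 1.5                      # surrogate's estimates
scale, zp = surrogate_quant_params(mu_hat, sigma_hat)
x_q = fake_quantize(x, scale, zp)
print("max abs error:", np.abs(x - x_q).max())
```

Because the parameters depend on the surrogate's per-input predictions rather than on fixed calibration data, the scheme stays input-adaptive like dynamic quantization while avoiding the extra activation buffering that dynamic quantization normally requires.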
