Improving latency performance trade-off in keyword spotting applications at the edge
Paissan, Francesco;Sahabdeen, Anisha Mohamed;Ancilotto, Alberto;Farella, Elisabetta
2023-01-01
Abstract
Keyword Spotting (KWS) is useful in many innovative ambient intelligence applications, such as smart cities and home automation. While solving KWS on GP/GPUs has become a trivial task in recent years, running KWS applications at the edge, where resources are limited, brings many benefits (e.g., privacy by design and infrastructure sustainability). Hardware-aware scaling (HAS) is a novel paradigm that brings neural architectures to low-resource platforms. With HAS, it is possible to optimize neural architectures to fit on embedded platforms (e.g., microcontrollers) while maximizing the performance-complexity and performance-latency tradeoffs. This paper shows how HAS, coupled with a neural network with appropriate scaling capabilities, can outperform architectures designed with neural architecture search techniques, such as MCUNet. Our method achieves 94.5% accuracy when classifying the 35 keywords in Google Speech Commands v2, with only 70 ms of latency and an overall energy consumption of less than 10 mJ.