IRIS Institutional Research Information System

Improving latency performance trade-off in keyword spotting applications at the edge

Paissan, Francesco; Sahabdeen, Anisha Mohamed; Ancilotto, Alberto; Farella, Elisabetta

2023-01-01

Abstract

Keyword Spotting (KWS) is useful in many ambient intelligence applications, such as smart cities and home automation. While solving KWS on GP/GPUs has become a trivial task in recent years, running KWS applications at the edge, where resources are limited, brings many benefits (e.g., privacy by design and infrastructure sustainability). Hardware-aware scaling (HAS) is a novel paradigm that brings neural architectures to low-resource platforms. With HAS, it is possible to optimize neural architectures to fit on embedded platforms (e.g., microcontrollers) while maximizing both the performance-complexity and the performance-latency tradeoffs. This paper shows how HAS, coupled with a neural network with appropriate scaling capabilities, can outperform architectures designed with neural architecture search techniques, such as MCUNet. Our method achieves 94.5% accuracy when classifying the 35 keywords in Google Speech Commands v2, with only 70 ms of latency and an overall energy consumption of less than 10 mJ.

Year: 2023

ISBN: 979-8-3503-3694-8

Appears in types: 4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/340287
