Development of an embedded distant speech recognition system

Sosi, Alessandro; Brugnara, Fabio; Matassoni, Marco; Omologo, Maurizio; Ravanelli, Mirco
2013-01-01

Abstract

The proposed prototype consists of a voice-controlled floor lamp that embeds microphone array processing and robust speech recognition for distant-speech interaction. The system is entirely contained inside the lamp shade and operates in real time in an “always-listening” mode. It runs on a small, low-power, fanless board based on an ARM Cortex-A9 processor. Audio is captured through eight digital MEMS microphones arranged along a circular arc on the lamp shade. The software is organized into the following modules: Time Delay Estimation based on Generalized Cross Correlation with Phase Transform (GCC-PHAT), Delay-and-Sum Beamforming, Voice Activity Detection based on the joint use of energy and pitch, and HMM-based Automatic Speech Recognition using cross-word, context-dependent GMM acoustic models trained on contaminated signals. Interaction is triggered by detecting a command as if it were a keyword. Because of the limited computational power of an ARM-based Single Board Computer, a major difficulty lies in providing real-time operation across a wide range of acoustic conditions. This approach explores a path toward integrating speech technology into everyday devices, as an alternative to a local-server solution or to a wireless, distributed system running in the cloud.
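The front end described in the abstract (GCC-PHAT time delay estimation feeding a delay-and-sum beamformer) can be illustrated with a short sketch. This is a minimal, dependency-free Python example under stated assumptions, not the prototype's implementation: the names `gcc_phat_delay` and `delay_and_sum` are hypothetical, delays are restricted to whole samples, and a naive O(N²) DFT stands in for the FFT that a real-time ARM implementation would use.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a real-time system would use an FFT instead."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT returning only the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def gcc_phat_delay(ref, sig, max_lag):
    """Hypothetical helper: integer lag of `sig` w.r.t. `ref` via GCC-PHAT.

    A positive result means `sig` reaches its microphone later than `ref`.
    """
    n = 2 * max(len(ref), len(sig))  # zero-pad so the circular correlation does not wrap
    ref_spec = dft(list(ref) + [0.0] * (n - len(ref)))
    sig_spec = dft(list(sig) + [0.0] * (n - len(sig)))
    whitened = []
    for r, s in zip(ref_spec, sig_spec):
        cross = s * r.conjugate()
        mag = abs(cross)
        # Phase Transform: keep only the phase of the cross-power spectrum.
        whitened.append(cross / mag if mag > 1e-12 else 0j)
    corr = idft(whitened)
    # Non-negative lags sit at the start of `corr`; negative lags wrap to the end.
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if corr[lag % n] > best_val:
            best_lag, best_val = lag, corr[lag % n]
    return best_lag


def delay_and_sum(channels, delays):
    """Hypothetical helper: shift each channel by its integer delay and average.

    `delays[i]` is the lag of channel i relative to the reference channel,
    as returned by gcc_phat_delay(channels[0], channels[i], max_lag).
    """
    length = len(channels[0])
    out = [0.0] * length
    for ch, d in zip(channels, delays):
        for t in range(length):
            src = t + d  # read ahead in a late channel to align it with the reference
            if 0 <= src < len(ch):
                out[t] += ch[src]
    return [v / len(channels) for v in out]
```

In use, each channel's delay would be estimated against a reference microphone, for example `delays = [gcc_phat_delay(channels[0], ch, max_lag) for ch in channels]`, and then passed to `delay_and_sum`. Discarding the magnitude of the cross-power spectrum is what makes PHAT attractive in reverberant rooms, since it sharpens the correlation peak; the `max_lag` bound would in practice follow from the microphone spacing and the sample rate.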
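The abstract describes Voice Activity Detection only as the joint use of energy and pitch. One plausible reading, sketched below as an assumption, gates each frame first on its energy and then on the strength of its normalized autocorrelation peak within a typical pitch range. The thresholds (`energy_db_thr`, `voicing_thr`), the 60 to 400 Hz search range and all function names are hypothetical, not values taken from the prototype.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of a frame in dB; the small floor avoids log10(0)."""
    power = sum(v * v for v in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def pitch_strength(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Peak normalized autocorrelation in the pitch-lag range and its pitch in Hz.

    The 60 to 400 Hz search range is an assumed, typical speech pitch range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0, 0.0
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_r, best_lag = 0.0, 0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[t] * x[t + lag] for t in range(n - lag)) / energy
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, (sample_rate / best_lag if best_lag else 0.0)


def is_speech(frame, sample_rate, energy_db_thr=-40.0, voicing_thr=0.5):
    """Hypothetical joint VAD: a frame counts as speech only if it is
    loud enough (energy gate) AND clearly periodic (pitch gate).
    Both thresholds are illustrative, not values from the prototype."""
    if frame_energy_db(frame) < energy_db_thr:
        return False
    strength, _ = pitch_strength(frame, sample_rate)
    return strength >= voicing_thr
```

Requiring both conditions lets the detector reject loud but aperiodic noise, which passes the energy gate, as well as quiet frames, which fail it. Unvoiced consonants also lack a pitch peak, so a practical detector would typically smooth its decisions over several consecutive frames.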
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/179410