Front-end processing of a distant-talking speech interface for control of an interactive TV system

Omologo, Maurizio
2008-01-01

Abstract

This work investigates a user-friendly interface to a virtual smart assistant that lets users interact with TV-related digital devices and infotainment services. In the given scenario, users speak in a natural and comfortable way, unencumbered by any hand-held or head-mounted microphone. The environment is typically a living room equipped with a digital TV, Hi-Fi audio devices, etc., and occupied by a group of people (e.g., family members). Among the most challenging issues in this scenario are a multi-microphone front-end for effective processing of the acoustic scene, an Acoustic Echo Cancellation (AEC) component to compensate for the sound produced by the loudspeakers, and, finally, a multi-modal distant-talking spoken dialogue system. As far as the front-end is concerned, multiple speaker localization, speech activity detection, speaker identification, and speech recognition will have to perform accurately even when AEC is applied to the microphone array. The paper presents preliminary results of this research, which is being conducted under the European Project DICIT.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/9002