Front-end processing of a distant-talking speech interface for control of an interactive TV system

Omologo, Maurizio

A user-friendly interface is being investigated for access to a virtual smart assistant that enables interaction with TV-related digital devices and infotainment services. In this scenario, users can speak in a natural and comfortable way, unencumbered by any hand-held or head-mounted microphone. The environment is typically a living room equipped with a digital TV, Hi-Fi audio devices, etc., and occupied by a group of people (e.g., family members). Among the most challenging issues in this scenario are a multi-microphone front-end for effective processing of the acoustic scene, an Acoustic Echo Cancellation (AEC) component to compensate for the sound produced by the loudspeakers, and eventually a multi-modal distant-talking spoken dialogue system. As far as the front-end is concerned, multiple-speaker localization, speech activity detection, speaker identification, and speech recognition will have to perform accurately even when AEC is applied to the given microphone array. This paper presents preliminary results of this research, which is being conducted under the European Project DICIT.