Multi-room speech activity detection using a distributed microphone network in domestic environments
Brutti, Alessio; Matassoni, Marco
2015-01-01
Abstract
Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects speech activity detectors. In this paper, we present and compare different solutions for the multi-room speech activity detection task. The combination of a model-based room-independent speech activity detection module with a room-dependent inside/outside classification stage, based on specific features, provides satisfactory performance. The proposed methods are evaluated on a multi-room, multi-channel corpus, where spoken commands and other typical acoustic events occur in different rooms.