3D mouth tracking from a compact microphone array co-located with a camera
Alessio Brutti, Oswald Lanz, Maurizio Omologo
2018-01-01
Abstract
We address the problem of 3D audio-visual person tracking using a compact platform with co-located audio-visual sensors and no depth camera. We present a face-detection-driven approach supported by the mapping of 3D hypotheses to the image plane for visual feature matching. We then propose a video-assisted audio likelihood computation that relies on a GCC-PHAT based acoustic map. Audio and video likelihoods are fused in a particle filtering framework. The proposed approach copes with reverberant and noisy environments, and can handle a person who is occluded, outside the camera’s Field of View (FoV), not facing the sensing platform, or far from it. Experimental results show that the method provides accurate person tracking both in 3D and on the image plane.
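The acoustic map mentioned above builds on GCC-PHAT, the generalized cross-correlation with phase transform, which whitens the cross-spectrum of a microphone pair so that only phase (and hence time-delay) information drives the correlation peak. As an illustrative sketch only (not the authors' implementation), a minimal NumPy version of the time-delay estimate for one microphone pair might look like:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay (in seconds) of `sig` relative to `ref`
    using GCC-PHAT. `fs` is the sampling rate; `max_tau` optionally
    bounds the search to physically plausible delays."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15          # phase transform: whiten the cross-spectrum
    cc = np.fft.irfft(R, n=n)        # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Rearrange so that lag 0 sits in the middle of the window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)
```

In a system like the one described, such pairwise delay estimates (or the full GCC-PHAT functions) would be projected onto a grid of candidate 3D positions to form the acoustic map that the audio likelihood evaluates; the grid geometry and pairing scheme here are assumptions, not details from the paper.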
File | Type | License | Size | Format | Access
---|---|---|---|---|---
0003071.pdf | Post-print document | Non-public (private/restricted access) | 1.1 MB | Adobe PDF | Authorized users only (view/open on request)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.