Online Cross-Modal Adaptation for Audio–Visual Person Identification With Wearable Cameras
Brutti, Alessio
2017-01-01
Abstract
We propose an audio-visual target identification approach for egocentric data with cross-modal model adaptation. The proposed approach blindly and iteratively adapts the time-dependent models of each modality to varying target appearance and environmental conditions, using the posterior of the other modality. The adaptation is unsupervised and performed online; thus, the models improve as new unlabeled data become available. In particular, accurate models do not deteriorate when a modality is underperforming, thanks to an appropriate choice of the adaptation parameters. Importantly, unlike traditional audio-visual integration methods, the proposed approach is also useful during temporal intervals in which only one modality is available, or when different modalities are used for different tasks. We evaluate the proposed method in an end-to-end multimodal person identification application on two challenging real-world datasets and show that the proposed approach successfully adapts the models in the presence of mild mismatch. We also show that the proposed approach is beneficial to other multimodal score fusion algorithms.
File | Type | License | Size | Format
---|---|---|---|---
thms-brutti-2620110-proof.pdf | Post-print document | NOT PUBLIC - Private/restricted access (not available for download) | 3.92 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.