Model adaptation is important for the analysis of audio-visual data from body worn cameras in order to cope with rapidly changing scene conditions, varying object appearance and limited training data. In this paper, we propose a new approach for the on-line and unsupervised adaptation of deep-learning models for audio-visual target re-identification. Specifically, we adapt each mono-modal model using the unsupervised labelling provided by the other modality. To limit the detrimental effects of erroneous labels, we use a regularisation term based on the Kullback-Leibler divergence between the initial model and the one being adapted. The proposed adaptation strategy complements common audio-visual late fusion approaches and is beneficial also when one modality is no longer reliable. We show the contribution of the proposed strategy in improving the overall re-identification performance on a challenging public dataset captured with body worn cameras.

Unsupervised cross-modal deep-model adaptation for audio-visual re-identification with wearable cameras

Brutti, Alessio;
2017-01-01

Abstract

Model adaptation is important for the analysis of audio-visual data from body worn cameras in order to cope with rapidly changing scene conditions, varying object appearance and limited training data. In this paper, we propose a new approach for the on-line and unsupervised adaptation of deep-learning models for audio-visual target re-identification. Specifically, we adapt each mono-modal model using the unsupervised labelling provided by the other modality. To limit the detrimental effects of erroneous labels, we use a regularisation term based on the Kullback-Leibler divergence between the initial model and the one being adapted. The proposed adaptation strategy complements common audio-visual late fusion approaches and is beneficial also when one modality is no longer reliable. We show the contribution of the proposed strategy in improving the overall re-identification performance on a challenging public dataset captured with body worn cameras.
File in questo prodotto:
File Dimensione Formato  
Brutti_Unsupervised_Cross-Modal_Deep-Model_ICCV_2017_paper.pdf

accesso aperto

Tipologia: Documento in Post-print
Licenza: Dominio pubblico
Dimensione 666.64 kB
Formato Adobe PDF
666.64 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/313180
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact