We evaluate state-of-the-art multimodal models to detect common olfactory references in multilingual text and images in the scope of the Multimodal Understanding of Smells in Texts and Images (MUSTI) at Mediaeval’22. The goal of the MUSTI Subtask 1 is to classify paired text and images as to whether they refer to the same smell source or not. We approach this task as a Visual Entailment problem and evaluate the performance of the English model ViLBERT and the multilingual model mUNITER on MUSTI Subtask 1. Although base VilBERT and mUNITER models perform worse than a dummy baseline, fine-tuning these models improve performance significantly in almost all scenarios. We find that fine-tuning mUNITER with SNLI-VE and MUSTI train data performs better than other configurations we implemented. Our experiments demonstrate that the task presents some challenges, but it is by no means impossible. Our code is available on https://github. com/Odeuropa/musti-eval-baselines.
Multimodal and Multilingual Understanding of Smells using VilBERT and mUNITER
Teresa Paccosi;Stefano Menini;
2022-01-01
Abstract
We evaluate state-of-the-art multimodal models to detect common olfactory references in multilingual text and images in the scope of the Multimodal Understanding of Smells in Texts and Images (MUSTI) at Mediaeval’22. The goal of the MUSTI Subtask 1 is to classify paired text and images as to whether they refer to the same smell source or not. We approach this task as a Visual Entailment problem and evaluate the performance of the English model ViLBERT and the multilingual model mUNITER on MUSTI Subtask 1. Although base VilBERT and mUNITER models perform worse than a dummy baseline, fine-tuning these models improve performance significantly in almost all scenarios. We find that fine-tuning mUNITER with SNLI-VE and MUSTI train data performs better than other configurations we implemented. Our experiments demonstrate that the task presents some challenges, but it is by no means impossible. Our code is available on https://github. com/Odeuropa/musti-eval-baselines.File | Dimensione | Formato | |
---|---|---|---|
paper6505.pdf
accesso aperto
Descrizione: Multimodal and Multilingual Understanding of Smells using VilBERT and mUNITER
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
600.11 kB
Formato
Adobe PDF
|
600.11 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.