Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos
Anna Kuznetsova; Carlo Strapparava
2024-01-01
Abstract
This paper presents the development of a novel multimodal multilingual dataset in Russian and English, with a particular emphasis on the exploration of laughter detection techniques. Data was collected from YouTube stand-up comedy videos with manually annotated subtitles, and our research covers data preparation and laughter labeling. We explore two laughter detection approaches presented in the literature: peak detection on preprocessed voiceless audio with an energy-based algorithm, and a machine learning approach with pretrained models to identify laughter presence and duration. While the machine learning approach currently outperforms peak detection in accuracy and generalization, the latter shows promise and warrants further study. Additionally, we explore unimodal and multimodal humor detection on the new dataset, showing the effectiveness of neural models in capturing humor in both languages, even when only textual data is used. Multimodal experiments indicate that even basic models benefit from visual data, improving detection results. However, further research is needed to enhance laughter detection labeling quality and to fully understand the impact of different modalities in a multimodal and multilingual context.
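To make the energy-based peak detection idea concrete, below is a minimal illustrative sketch, not the paper's implementation. It assumes the input is a speech-suppressed ("voiceless") audio track in which audience laughter dominates, and it uses librosa for short-time RMS energy and scipy's peak picking; the function name `detect_laughter_peaks`, the threshold heuristic, and the minimum-distance setting are all hypothetical choices for illustration.

```python
# Illustrative sketch of energy-based laughter peak detection on a
# speech-suppressed audio track. Library choices and thresholds are
# assumptions, not the method described in the paper.
import librosa
import numpy as np
from scipy.signal import find_peaks


def detect_laughter_peaks(audio_path, frame_length=2048, hop_length=512):
    # Load the voiceless audio at its native sampling rate.
    y, sr = librosa.load(audio_path, sr=None)

    # Short-time RMS energy, one value per frame.
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)[0]

    # Flag frames whose energy rises well above the track's baseline
    # (mean + 2 standard deviations is an arbitrary illustrative cutoff).
    threshold = rms.mean() + 2 * rms.std()

    # Require candidate peaks to be at least ~1 second apart.
    min_gap_frames = sr // hop_length
    peaks, props = find_peaks(rms, height=threshold, distance=min_gap_frames)

    # Convert frame indices to timestamps in seconds.
    times = librosa.frames_to_time(peaks, sr=sr, hop_length=hop_length)
    return times, props["peak_heights"]


if __name__ == "__main__":
    times, heights = detect_laughter_peaks("standup_clip.wav")
    for t, h in zip(times, heights):
        print(f"laughter candidate at {t:6.2f}s (energy {h:.4f})")
```

A real pipeline would additionally estimate laughter duration (e.g., by extending each peak to where energy falls back below the baseline), which this sketch omits.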