IRIS Institutional Research Information System

This paper presents the development of a novel multimodal multilingual dataset in Russian and English, with a particular emphasis on the exploration of laughter detection techniques. Data was collected from YouTube stand-up comedy videos with manually annotated subtitles, and our research covers data preparation and laughter labeling. We explore two laughter detection approaches presented in the literature: peak detection using preprocessed voiceless audio with an energy-based algorithm and machine learning approach with pretrained models to identify laughter presence and duration. While the machine learning approach currently outperforms peak detection in accuracy and generalization, the latter shows promise and warrants further study. Additionally, we explore unimodal and multimodal humor detection on the new dataset, showing the effectiveness of neural models in capturing humor in both languages, even with textual data. Multimodal experiments indicate that even basic models benefit from visual data, improving detection results. However, further research is needed to enhance laughter detection labeling quality and fully understand the impact of different modalities in a multimodal and multilingual context.

Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos

Anna Kuznetsova;Carlo Strapparava

2024-01-01

Abstract

This paper presents the development of a novel multimodal multilingual dataset in Russian and English, with a particular emphasis on the exploration of laughter detection techniques. Data was collected from YouTube stand-up comedy videos with manually annotated subtitles, and our research covers data preparation and laughter labeling. We explore two laughter detection approaches presented in the literature: peak detection using preprocessed voiceless audio with an energy-based algorithm and machine learning approach with pretrained models to identify laughter presence and duration. While the machine learning approach currently outperforms peak detection in accuracy and generalization, the latter shows promise and warrants further study. Additionally, we explore unimodal and multimodal humor detection on the new dataset, showing the effectiveness of neural models in capturing humor in both languages, even with textual data. Multimodal experiments indicate that even basic models benefit from visual data, improving detection results. However, further research is needed to enhance laughter detection labeling quality and fully understand the impact of different modalities in a multimodal and multilingual context.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2024
			
	Codice ISBN
	
				978-2-493814-10-4
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
2024.lrec-main.1037.pdf solo utenti autorizzati Licenza: Copyright dell'editore Dimensione 2.16 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	2.16 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/357307

Citazioni

ND

social impact