See me speaking? Differentiating on whether words are spoken on screen or off to optimize machine dubbing

Alina Karakanta; Matteo Negri; Marco Turchi
2020-01-01

Abstract

Dubbing is the art of finding a translation from a source into a target language that can be lip-synchronously revoiced, i.e., that makes the target language speech appear as if it was spoken by the very actors all along. Lip synchrony is essential for the full-fledged reception of foreign audiovisual media, such as movies and series, as violated constraints of synchrony between video (lips) and audio (speech) lead to cognitive dissonance and reduce the perceptual quality. Of course, synchrony constraints only apply to the translation when the speaker's lips are visible on screen. Therefore, deciding whether to apply synchrony constraints requires an automatic method for detecting whether an actor's lips are visible on screen for a given stretch of speech or not. In this paper, we attempt, for the first time, to classify on- from off-screen speech based on a corpus of real-world television material that has been annotated word-by-word for the visibility of talking lips on screen. We present classification experiments in which we classify
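To make the task described in the abstract concrete, the sketch below shows one possible word-level on-/off-screen classification baseline. It is not the authors' method: the choice of per-word acoustic feature vectors, the logistic-regression model, and the random placeholder data (standing in for the word-by-word annotated television corpus) are all assumptions for illustration only.

```python
# Illustrative sketch only: binary classification of whether a spoken word is
# on screen (lips visible) or off screen. Features, model, and data are
# placeholders, NOT the method or corpus described in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Placeholder data: one 13-dimensional feature vector per spoken word
# (e.g., averaged acoustic features), with a binary label
# (1 = lips visible on screen, 0 = off screen).
n_words, n_features = 2000, 13
X = rng.normal(size=(n_words, n_features))
y = rng.integers(0, 2, size=n_words)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Simple linear baseline for the word-level on-/off-screen decision.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test),
                            target_names=["off-screen", "on-screen"]))
```

In a real setting, the per-word labels would come from the word-by-word visibility annotation mentioned in the abstract, and the downstream dubbing pipeline would apply lip-synchrony constraints only to words predicted as on-screen.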
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/325885