Dubbing is the art of finding a translation from a source into a target language that can be lip-synchronously revoiced, i. e., that makes the target language speech appear as if it was spoken by the very actors all along. Lip synchrony is essential for the full-fledged reception of foreign audiovisual media, such as movies and series, as violated constraints of synchrony between video (lips) and audio (speech) lead to cognitive dissonance and reduce the perceptual quality. Of course, synchrony constraints only apply to the translation when the speaker's lips are visible on screen. Therefore, deciding whether to apply synchrony constraints requires an automatic method for detecting whether an actor's lips are visible on screen for a given stretch of speech or not. In this paper, we attempt, for the first time, to classify on- from off-screen speech based on a corpus of real-world television material that has been annotated word-by-word for the visibility of talking lips on screen. We present classification experiments in which we classify
See me speaking? Differentiating on whether words are spoken on screen or off to optimize machine dubbing
Alina Karakanta;Matteo Negri;Marco Turchi
2020-01-01
Abstract
Dubbing is the art of finding a translation from a source into a target language that can be lip-synchronously revoiced, i. e., that makes the target language speech appear as if it was spoken by the very actors all along. Lip synchrony is essential for the full-fledged reception of foreign audiovisual media, such as movies and series, as violated constraints of synchrony between video (lips) and audio (speech) lead to cognitive dissonance and reduce the perceptual quality. Of course, synchrony constraints only apply to the translation when the speaker's lips are visible on screen. Therefore, deciding whether to apply synchrony constraints requires an automatic method for detecting whether an actor's lips are visible on screen for a given stretch of speech or not. In this paper, we attempt, for the first time, to classify on- from off-screen speech based on a corpus of real-world television material that has been annotated word-by-word for the visibility of talking lips on screen. We present classification experiments in which we classifyI documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.