Are Subtitling Corpora really Subtitle-like?

Alina Karakanta; Matteo Negri; Marco Turchi
2019-01-01

Abstract

Growing needs in translating multimedia content have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling. Contrary to text translation, subtitling is subject to spatial and temporal constraints, which greatly increase the post-processing effort required to restore the NMT output to a proper subtitle format. In this work, we explore whether existing subtitling corpora conform to the constraints of: 1) length and reading speed; and 2) proper line breaks. We show that the process of creating parallel sentence alignments removes important time and line break information and propose practices for creating resources for subtitling-oriented NMT faithful to the subtitle format.
Files in this record:
paper38.pdf (open access; type: post-print document; license: public with copyright; Adobe PDF, 324.76 kB)


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/321898