Aspects of Multilingual News Summarisation

Steinberger, Josef; Steinberger, Ralf; Tanev, Hristo; Zavarella, Vanni; Turchi, Marco

doi:10.4018/978-1-4666-5019-0.ch012

In this chapter, the authors discuss several pertinent aspects of an automatic system that generates summaries in multiple languages for sets of topic-related news articles (multilingual multi-document summarisation), gathered by news aggregation systems. The discussion follows a framework based on Latent Semantic Analysis (LSA) because LSA was shown to be a high-performing method across many different languages. Starting from a sentence-extractive approach, the authors show how domain-specific aspects can be used and how a compression and paraphrasing method can be plugged in. They also discuss the challenging problem of summarisation evaluation in different languages. In particular, the authors describe two approaches: the first uses a parallel corpus and the second statistical machine translation.