IRIS Institutional Research Information System

Multilingual Media Monitoring and Text Analysis – Challenges for Highly Inflected Languages

Ralf Steinberger; Maud Ehrmann; Júlia Pajzs; Mohamed Ebrahim; Josef Steinberger; Marco Turchi

2013-01-01

Abstract

We present the highly multilingual news analysis system Europe Media Monitor (EMM), which gathers an average of 175,000 online news articles per day in tens of languages, categorises the news items and extracts named entities and various other information from them. We also give an overview of EMM’s text mining tool set, focusing on the issue of how the software deals with highly inflected languages such as those of the Slavic and Finno-Ugric language families. The questions we ask are: How to adapt extraction patterns to such languages? How to de-inflect extracted named entities? And: Will document categorisation benefit from lemmatising the texts?

Year: 2013
ISBN: 9783642405846; 9783642405853
Publication type: 4.1 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/223017
