This paper reports on experiments in the creation of a bi-lingual Textual Entailment corpus, using non-experts’ workforce under strict cost and time limitations ($100, 10 days). To this aim workers have been hired for translation and validation tasks, through the CrowdFlower channel to Amazon Mechanical Turk. As a result, an accurate and reliable corpus of 426 English/Spanish entailment pairs has been produced in a more cost-effective way compared to other methods for the acquisition of translations based on crowdsourcing. Focusing on two orthogonal dimensions (i.e. reliability of annotations made by non experts, and overall corpus creation costs), we summarize the methodology we adopted, the achieved results, the main problems encountered, and the lessons learned.

Creating a Bi-lingual Entailment Corpus through Translations with Mechanical Turk: $100 for a 10-day Rush

Negri, Matteo;Mehdad, Yashar
2010

Abstract

This paper reports on experiments in the creation of a bi-lingual Textual Entailment corpus, using non-experts’ workforce under strict cost and time limitations ($100, 10 days). To this aim workers have been hired for translation and validation tasks, through the CrowdFlower channel to Amazon Mechanical Turk. As a result, an accurate and reliable corpus of 426 English/Spanish entailment pairs has been produced in a more cost-effective way compared to other methods for the acquisition of translations based on crowdsourcing. Focusing on two orthogonal dimensions (i.e. reliability of annotations made by non experts, and overall corpus creation costs), we summarize the methodology we adopted, the achieved results, the main problems encountered, and the lessons learned.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11582/9991
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact