Adaptive Complex Word Identification through False Friend Detection
Palmero Aprosio, Alessio;Menini, Stefano;Tonelli, Sara
2020-01-01
Abstract
Automated complex word identification (CWI) is a crucial task in several applications, from readability assessment to lexical simplification. So far, several works have modeled CWI with the goal of targeting the needs of non-native speakers. However, studies in language acquisition show that different native languages can create positive or negative interference with respect to reading comprehension, favouring or hindering the understanding of a document in a foreign language. Therefore, we propose to modify CWI to address the specific difficulties connected to different native languages. In particular, we present a pipeline that, based on the user's native language, identifies complex terms by automatically detecting cognates and false friends on the fly. The selection presented by the CWI module is adaptive in that it changes depending on the native language of the user. We implement and evaluate our approach for four different native languages (French, English, German and Spanish), in a setting where documents are written in Italian and are to be read by language learners with low proficiency. We show that a personalised strategy based on false friend detection identifies complex terms that are different from those usually selected with standard approaches based on word frequency.

File | Size | Format
---|---|---
3340631.3394857.pdf (open access; type: post-print document; license: PUBLIC, public with copyright) | 2.24 MB | Adobe PDF
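The abstract describes a CWI selection that changes with the reader's native language. This record does not give the details of the authors' pipeline, so the following is only a toy sketch of the general idea, not their method: an Italian word is flagged as a false friend when the reader's language has a similarly spelled word with a different meaning, and treated as a cognate when the meaning matches. The tiny hand-written lexicons, the coarse meaning labels, the function names `classify` and `false_friends`, the use of `difflib` string similarity, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicons: word -> coarse meaning label (illustration only).
ITALIAN = {
    "camera": "room",
    "burro": "butter",
    "problema": "problem",
    "sciopero": "strike",
}
NATIVE = {
    "en": {"camera": "photo device", "problem": "problem", "strike": "strike"},
    "es": {"burro": "donkey", "problema": "problem"},
}


def similarity(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def classify(word: str, native_lang: str, threshold: float = 0.8) -> str:
    """Label an Italian word as 'cognate', 'false_friend' or 'unrelated'
    with respect to the toy lexicon of the reader's native language."""
    meaning = ITALIAN[word]
    label = "unrelated"
    for candidate, candidate_meaning in NATIVE[native_lang].items():
        if similarity(word, candidate) >= threshold:
            if candidate_meaning == meaning:
                return "cognate"  # look-alike with the same meaning
            label = "false_friend"  # look-alike with a different meaning
    return label


def false_friends(tokens: list[str], native_lang: str) -> list[str]:
    """Return the tokens that are false friends for this native language."""
    return [
        t for t in tokens
        if t in ITALIAN and classify(t, native_lang) == "false_friend"
    ]


tokens = ["camera", "burro", "problema", "sciopero"]
print(false_friends(tokens, "en"))
print(false_friends(tokens, "es"))
```

With these toy lexicons the same Italian tokens yield a different selection for an English reader than for a Spanish one, which is the adaptive behaviour the abstract refers to. Words labelled `unrelated` are left undecided here; a real system would need some other criterion for them, such as the word-frequency approaches the abstract mentions as the standard baseline.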
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.