IRIS Institutional Research Information System

Grammatical Error Correction for L2 Speech Using Publicly Available Data

Stefano Bannò; Michela Rais; Marco Matassoni

2023-01-01

Abstract

Over the past decades, the demand for learning English as a second language (L2) has grown steadily as English has become the lingua franca of business, culture, entertainment, and academia. This growth has in turn increased the demand for automatic feedback systems in Computer-Assisted Language Learning. Mastering grammar is a key element of L2 speaking proficiency. In this paper, we present a cascaded approach to spoken grammatical error correction (GEC) that uses only publicly available training data. Specifically, we start from learners' utterances, investigate disfluency detection, and finally explore GEC. We test this pipeline on NICT-JLE, a publicly available L2 corpus, and TLT-GEC, a private dataset that is being prepared for release. We obtain promising results that outperform previous studies that relied on large proprietary datasets, and we set a potential baseline for future experiments on spoken GEC.

Year

2023

Appears in types:

4.1 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/341688
