The growing demand for learning English as a second language has increased interest in automatic approaches for assessing and improving spoken language proficiency. A significant challenge in this field is to provide interpretable scores and informative feedback to learners through individual viewpoints of learners' proficiency, as opposed to holistic scores. Thus far, holistic scoring remains commonly applied in large-scale commercial tests. As a result, an issue with more detailed evaluation is that human graders are generally trained to provide holistic scores. This paper investigates whether view-specific systems can be trained when only holistic scores are available. To enable this process, view-specific networks are defined where both their inputs and structure are adapted to focus on specific facets of proficiency. It is shown that it is possible to train such systems on holistic scores, such that they provide view-specific scores at evaluation time. View-specific networks are designed in this way for pronunciation, rhythm, text, use of parts of speech and grammatical accuracy. The relationships between the predictions of each system are investigated on the spoken part of the Linguaskill proficiency test. It is shown that the view-specific predictions are complementary in nature and capture different information about proficiency.
View-Specific Assessment of L2 Spoken English
Stefano Bannò;
2022-01-01
Abstract
The growing demand for learning English as a second language has increased interest in automatic approaches for assessing and improving spoken language proficiency. A significant challenge in this field is to provide interpretable scores and informative feedback to learners through individual viewpoints of learners' proficiency, as opposed to holistic scores. Thus far, holistic scoring remains commonly applied in large-scale commercial tests. As a result, an issue with more detailed evaluation is that human graders are generally trained to provide holistic scores. This paper investigates whether view-specific systems can be trained when only holistic scores are available. To enable this process, view-specific networks are defined where both their inputs and structure are adapted to focus on specific facets of proficiency. It is shown that it is possible to train such systems on holistic scores, such that they provide view-specific scores at evaluation time. View-specific networks are designed in this way for pronunciation, rhythm, text, use of parts of speech and grammatical accuracy. The relationships between the predictions of each system are investigated on the spoken part of the Linguaskill proficiency test. It is shown that the view-specific predictions are complementary in nature and capture different information about proficiency.File | Dimensione | Formato | |
---|---|---|---|
banno22_interspeech.pdf
solo utenti autorizzati
Tipologia:
Documento in Post-print
Licenza:
Copyright dell'editore
Dimensione
361.34 kB
Formato
Adobe PDF
|
361.34 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.