Acoustic Analysis and Automatic Recognition of Spontaneous Children's Speech

Gerosa, Matteo; Giuliani, Diego; Narayanan, S.

This paper presents analyses of, and recognition experiments on, spontaneous American English speech collected from children aged 8 to 13 years. The analyses focused on variations in phone duration and on the scattering of phones in the acoustic space. They were aimed at a better understanding of the spectral and temporal changes that occur in spontaneous speech produced by children of various ages, with a view toward developing robust automatic speech recognition applications. The speech data were partitioned into two subsets according to the annotated presence or absence of explicit spontaneous speech phenomena such as fillers, false starts, and other disfluencies. All the analyses, as well as the results of the recognition experiments, show a significant difference between these two partitions. In particular, recognition performance for the subset containing annotated spontaneous speech phenomena was significantly worse (by almost 15%) than that achieved for the other subset. Relative improvements due to acoustic model adaptation and normalization were comparable on both data partitions, underscoring that significant performance degradation arises from spontaneous speech variability beyond that reflected in segmental spectral characteristics.