Time-Frequency Reassigned Cepstral Coefficients for Phone-Level Speech Segmentation

Tryfou, Georgia; Pellin, Marco; Omologo, Maurizio

This paper studies feature extraction in the context of automatic speech segmentation at the phonetic level. Current state-of-the-art solutions widely use cepstral features as a front-end for HMM-based frameworks. Although automatic segmentation has reached inter-annotator agreement at tolerances of 20 ms or more, the same is not true at lower tolerances. We propose a new set of cepstral features that are derived from the time-frequency reassigned spectrogram and offer a sharper representation of the speech signal in the cepstral domain. The features are evaluated through a series of forced alignment experiments, which show that they outperform traditional MFCC features in placing phone boundaries within a small distance of their true positions.
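
The tolerance-based comparison mentioned above can be illustrated with a minimal sketch. It assumes that forced alignment yields exactly one predicted boundary per reference boundary, in the same order, and that times are given in integer milliseconds. The function name `boundary_accuracy` and the boundary values are hypothetical and serve only as illustration; they are not taken from the paper's experiments.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Fraction of predicted boundaries lying within tolerance_ms of the
    paired reference boundary (hypothetical helper, one-to-one pairing)."""
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1
        for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Made-up boundary times in milliseconds, for illustration only.
reference_ms = [100, 250, 410, 600]
predicted_ms = [104, 258, 445, 598]

for tolerance_ms in (5, 10, 20):
    accuracy = boundary_accuracy(reference_ms, predicted_ms, tolerance_ms)
    print(f"tolerance {tolerance_ms:>2} ms: {accuracy:.2f}")
```

With these made-up values, the score rises as the tolerance is relaxed, which is why a result that looks saturated at 20 ms can still leave room for improvement at 5 or 10 ms.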