Combining Efforts for Improving Automatic Classification of Emotional User States

Batliner, Anton; Steidl, Stefan; Schuller, Bjoern; Seppi, Dino; Vogt, Thurid; Devillers, Laurence; Vidrascu, Laurence; Amir, Noam; Kessous, Loic; Aharonson, Vered

Classification performance for emotional user states in realistic, spontaneous speech is low compared with the performance reported in the literature for acted speech. This may be due partly to the difficulty of providing reliable annotations, partly to suboptimal feature vectors used for classification, and partly to the inherent difficulty of the task. In this paper, we present a cooperation between several sites that uses a thoroughly processed emotional database. For the four-class problem motherese/neutral/emphatic/angry, we first report classification performance computed independently at each site. We then show that classification results can be improved by using the best features from each site in a combined classification and by combining classifier outputs within the ROVER framework; all feature types and features from all sites contributed.
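
The combination of classifier outputs can be illustrated with a minimal voting sketch. It is not the ROVER configuration used in this work: it assumes each site's classifier emits one hard label per item, and it combines those labels by weighted voting. The names `LABELS`, `vote`, and `combine`, the optional per-classifier weights, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# The four classes named in the abstract. The order is used only to break
# ties deterministically (an assumption of this sketch, not of the paper).
LABELS = ("motherese", "neutral", "emphatic", "angry")


def vote(predictions, weights=None):
    """Combine the labels that several classifiers assign to ONE item.

    predictions: one label per classifier.
    weights: optional per-classifier weights (hypothetical; for example,
             each classifier's accuracy on held-out data). Defaults to 1.0.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        scores[label] += weight

    # max() returns the first maximal element, so ties go to the label
    # that appears earliest in LABELS.
    return max(LABELS, key=lambda label: scores[label])


def combine(per_classifier_outputs, weights=None):
    """Vote item by item over the outputs of several classifiers.

    per_classifier_outputs: one list of labels per classifier, all of the
    same length and aligned on the same items.
    """
    lengths = {len(outputs) for outputs in per_classifier_outputs}
    if len(lengths) > 1:
        raise ValueError("all classifiers must label the same items")
    return [vote(list(item), weights) for item in zip(*per_classifier_outputs)]


if __name__ == "__main__":
    site_a = ["angry", "neutral", "motherese"]
    site_b = ["angry", "emphatic", "neutral"]
    site_c = ["neutral", "emphatic", "motherese"]
    print(combine([site_a, site_b, site_c]))
```

With equal weights this reduces to plain majority voting; unequal weights let a more reliable classifier outvote a less reliable one.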