A Bayesian Classifier for Word-level Literacy Assessment

Tepperman, Joseph; Black, Matthew; Price, Patty; Lee, Sungbok; Kazemzadeh, Abe; Gerosa, Matteo; Heritage, Margaret; Alwan, Abeer; Narayanan, Shrikanth

To automatically assess young children's reading skills as demonstrated by isolated words read aloud, we propose a novel structure for a Bayesian Network classifier. Our network models the generative story among speech recognition-based features, treating pronunciation variants and reading mistakes as distinct but not independent cues to a qualitative perception of reading ability. This Bayesian approach allows us to estimate the probabilistic dependencies among many highly correlated features and to calculate soft decision scores based on the posterior probabilities for each class. With all proposed features, the best version of our network outperforms the C4.5 decision tree classifier by 17% and a Naive Bayes classifier by 8%, in terms of correlation with speaker-level reading scores on the Tball data set. This best correlation of 0.92 approaches the expert inter-evaluator correlation of 0.95.
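
As a rough illustration of the soft-decision scoring and the correlation-based evaluation described above, the following Python sketch applies Bayes' rule to per-word class likelihoods, averages the resulting posteriors into a speaker-level score, and correlates those scores with human ratings. It is not the proposed network structure, which the abstract does not specify. The class labels, the function names (`posterior`, `speaker_score`, `pearson`), and all numbers are hypothetical and chosen only to make the example run.

```python
import math


def posterior(likelihoods, priors):
    """Bayes' rule: P(class | features) from P(features | class) and P(class)."""
    joint = {label: likelihoods[label] * priors[label] for label in priors}
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}


def speaker_score(word_posteriors, positive="acceptable"):
    """Soft speaker-level score: mean posterior of the positive class over words."""
    return sum(p[positive] for p in word_posteriors) / len(word_posteriors)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical toy inputs (not data from the paper): per-word likelihoods
# P(features | class) for three speakers, plus made-up human reading scores.
priors = {"acceptable": 0.5, "unacceptable": 0.5}
word_likelihoods = {
    "speaker_a": [{"acceptable": 0.6, "unacceptable": 0.2},
                  {"acceptable": 0.9, "unacceptable": 0.1}],
    "speaker_b": [{"acceptable": 0.3, "unacceptable": 0.3},
                  {"acceptable": 0.2, "unacceptable": 0.6}],
    "speaker_c": [{"acceptable": 0.1, "unacceptable": 0.7},
                  {"acceptable": 0.2, "unacceptable": 0.8}],
}
human_scores = {"speaker_a": 0.9, "speaker_b": 0.5, "speaker_c": 0.2}

# Soft decision per word, then one averaged score per speaker.
auto_scores = {
    speaker: speaker_score([posterior(lik, priors) for lik in words])
    for speaker, words in word_likelihoods.items()
}

# Agreement between automatic and human speaker-level scores.
speakers = sorted(auto_scores)
r = pearson([auto_scores[s] for s in speakers],
            [human_scores[s] for s in speakers])
print(auto_scores, r)
```

Averaging posteriors rather than thresholded labels keeps the classifier's uncertainty in the speaker-level score, which is one plausible reading of "soft decision scores"; the aggregation actually used in the paper may differ.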