Towards Automatic Evaluation of Question/Answering Systems

Magnini, Bernardo; Negri, Matteo; Prevete, Roberto; Tanev, Hristo

This paper presents an innovative approach to the automatic evaluation of Question Answering systems. The methodology treats the Web as an "oracle" containing all the information needed to check the relevance of a candidate answer with respect to a given question. The procedure is completely automatic (i.e. no human intervention is required) and rests on the assumption that an answer's relevance can be assessed from a purely quantitative perspective. Candidate answers are checked through Web searches that use patterns derived from both the question and the answer. Different kinds of patterns have been identified, ranging from "lenient" patterns (i.e. boolean combinations of single words) to "strict" patterns (i.e. whole sentences or combinations of phrases). A statistically-based algorithm has been developed that considers both the kinds of patterns used in the search and the number of documents returned from the Web. Experiments carried out on the TREC-10 corpus show that the approach achieves a high level of performance (a success rate of 80%).
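The pattern-based check described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the pattern grammar, the statistics, or the search engine, so the stopword list, the query syntax (AND and quoted phrases), the pattern weights, and the log-scaled scoring below are all assumptions made for illustration. The hit_count argument is a hypothetical callable standing in for whatever Web search interface returns the number of matching documents.

```python
import math
from typing import Callable, Dict, List

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "did", "does",
             "who", "what", "when", "where", "which"}


def keywords(text: str) -> List[str]:
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = [w.strip("?.,!\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in STOPWORDS]


def lenient_pattern(question: str, answer: str) -> str:
    """Lenient pattern: a boolean combination of single words."""
    q = " AND ".join(keywords(question))
    a = " AND ".join(keywords(answer))
    return f"({q}) AND ({a})"


def strict_pattern(question: str, answer: str) -> str:
    """Strict pattern: a combination of quoted phrases (assumed syntax)."""
    q_phrase = " ".join(keywords(question))
    a_phrase = " ".join(keywords(answer))
    return f'"{q_phrase}" AND "{a_phrase}"'


def relevance_score(question: str, answer: str,
                    hit_count: Callable[[str], int],
                    weights: Dict[str, float] = None) -> float:
    """Combine document counts from both pattern kinds into one score.

    Assumption: strict matches are stronger evidence than lenient ones,
    so they get a larger (hypothetical) weight; counts are log-scaled.
    """
    weights = weights or {"lenient": 1.0, "strict": 3.0}
    patterns = {
        "lenient": lenient_pattern(question, answer),
        "strict": strict_pattern(question, answer),
    }
    return sum(weights[kind] * math.log1p(hit_count(query))
               for kind, query in patterns.items())
```

A candidate answer would then be accepted when its score exceeds a threshold, or ranked against competing candidates. Both the threshold and the weights would have to be tuned on judged data such as the TREC-10 answers; no values are suggested here.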