Toward Qualitative Evaluation of Textual Entailment Systems