The SICK Dataset
Bentivogli, Luisa; Menini, Stefano
2025-01-01
Abstract
This article presents SICK (Sentences Involving Compositional Knowledge), a large English benchmark created to evaluate compositional distributional semantic models. SICK consists of about 10,000 English sentence pairs that include examples of the lexical, syntactic and semantic phenomena that distributional models are expected to account for, but that do not require dealing with other aspects of existing sentential datasets (e.g. idiomatic multiword expressions, named entities, telegraphic language). Each sentence pair was annotated for two crucial semantic tasks: relatedness in meaning and the entailment relation between the two sentences in the pair. SICK was used in the SemEval-2014 Shared Task and is freely available for research purposes.
