Preprocessing Technology
Zanoli, Roberto
2016-01-01
Abstract
Coreference is a complex phenomenon involving a variety of linguistic factors, from surface similarity to morphological agreement, specific syntactic constraints, semantics, salience and encyclopedic knowledge. It is therefore essential for any coreference resolution system to rely on a rich linguistic representation of the document being analyzed. This chapter focuses on preprocessing technology: it considers the variety of external tools needed to create such representations and shows how to combine them in a Preprocessing Pipeline that extracts the mentions of entities in a given document and describes their linguistic properties.