Relation Extraction and the Influence of Automatic Named-Entity Recognition
Giuliano, Claudio; Lavelli, Alberto; Romano, Lorenza
2007-01-01
Abstract
We present an approach for extracting relations between named entities from natural language documents. The approach relies solely on shallow linguistic processing, such as tokenization, sentence splitting, part-of-speech tagging, and lemmatization. It uses a combination of kernel functions to integrate two information sources: (i) the whole sentence in which the relation appears, and (ii) the local contexts around the interacting entities. We report experiments on extracting five types of relations from a dataset of newswire documents and show that each information source contributes usefully to the recognition task. The combined kernel usually yields significantly higher precision than the basic kernels, sometimes at the cost of slightly lower recall. We also performed a set of experiments to assess how the accuracy of named-entity recognition affects the performance of the relation-extraction algorithm. These experiments used both the correct named entities (i.e., those manually annotated in the corpus) and noisy named entities (i.e., those produced by a machine-learning-based named-entity recognizer). The results show that our approach significantly improves on the previous results obtained on the same dataset.
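To illustrate the idea of integrating a whole-sentence information source with local contexts around the two entities, the following is a minimal sketch, not the authors' kernels. It assumes simple bag-of-words features, a hypothetical context window of two tokens, and an unweighted sum of normalized kernels; the names `bag_of_words_kernel`, `local_context`, and `combined_kernel` are introduced here for illustration only. The sum of valid kernels is itself a valid kernel, which is what makes this kind of combination possible.

```python
from collections import Counter
from math import sqrt


def bag_of_words_kernel(a, b):
    """Linear kernel: dot product of the token-count vectors of a and b."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[t] * cb[t] for t in ca))


def normalize(kernel):
    """Wrap a kernel so that k(x, x) == 1 for any non-empty x."""
    def normalized(x, y):
        denom = sqrt(kernel(x, x) * kernel(y, y))
        return kernel(x, y) / denom if denom else 0.0
    return normalized


def local_context(tokens, entity_index, window=2):
    """Tokens within `window` positions of the entity (assumed window size)."""
    lo = max(0, entity_index - window)
    return tokens[lo:entity_index + window + 1]


def combined_kernel(x, y, window=2):
    """Sum of a whole-sentence kernel and two local-context kernels.

    Each example is a tuple (tokens, index_of_entity_1, index_of_entity_2).
    """
    k = normalize(bag_of_words_kernel)
    tokens_x, x1, x2 = x
    tokens_y, y1, y2 = y
    sentence_part = k(tokens_x, tokens_y)
    local_part = (
        k(local_context(tokens_x, x1, window), local_context(tokens_y, y1, window))
        + k(local_context(tokens_x, x2, window), local_context(tokens_y, y2, window))
    )
    return sentence_part + local_part


# Hypothetical examples: candidate relation between the tokens at positions 0 and 3.
x = ("Smith works for Acme in Boston".split(), 0, 3)
y = ("Jones works for Initech in Austin".split(), 0, 3)
similarity = combined_kernel(x, y)
```

A function like `combined_kernel` could be used to fill a precomputed Gram matrix for a kernel-based classifier such as a support vector machine; the choice of classifier and feature set here is an assumption, not something stated in the abstract.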