Learning Representations for Biomedical Named Entity Recognition
Ivano Lauriola;Alberto Lavelli;Fabio Rinaldi
2018-01-01
Abstract
Biomedical Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize and categorize different types of entities in biomedical documents. Recent literature has shown effective methods based on combinations of Machine Learning algorithms and Natural Language Processing techniques. However, a critical issue in such applications is the choice of data representation. Generic, abstract word embeddings can easily be used to train a learning algorithm without prior knowledge of the domain. Dedicated hand-crafted features, on the other hand, are expensive to define, but they may represent the specific problem better. In this work, an extensive experimental assessment is carried out in which different representations are analyzed. A general framework that learns the representation by combining general and domain-specific features is then proposed and evaluated, with empirical results on the CRAFT corpus.