Neural Domain Adaptation of Sentiment Lexicons
Marco Guerini; Carlo Strapparava
2017-01-01
Abstract
Sentiment lexicons are widely used in computational linguistics because they directly encode subjective sentiment knowledge. These lexicons are usually generic, developed without any specific semantic domain in mind. Nonetheless, domain context can be highly relevant for sentiment analysis, since word polarities are known to be influenced by domain-specific traits. This paper studies the problem of automatically generating domain-adapted sentiment lexicons that can be used in subsequent sentiment analysis tasks. We propose a neural network approach that modifies a sentiment lexicon using distantly annotated text from a given domain. Additionally, we present a completely data-driven domain characterization metric that measures the centrality of a set of documents. Experiments show that this metric offers a measure of the generated lexicons' quality. They also show that the generated lexicons yield higher performance on domain-oriented sentiment analysis than a generic lexicon and other known baselines. Finally, we discuss how these extracted lexicons can be used for sentiment analysis even by approaches with no learning capabilities.