Combining Tree Structures, Flat Features and Patterns for Biomedical Relation Extraction

Chowdhury, Faisal Mahbub; Lavelli, Alberto

Kernel based methods dominate the current trend for various relation extraction tasks including protein-protein interaction (PPI) extraction. PPI information is critical in understanding biological processes. Despite considerable efforts, previously reported PPI extraction results show that none of the approaches already known in the literature is consistently better than other approaches when evaluated on different benchmark PPI corpora. In this paper, we propose a novel hybrid kernel that combines (automatically collected) dependency patterns, trigger words, negative cues, walk features and regular expression patterns along with tree kernel and shallow linguistic kernel. The proposed kernel outperforms the exiting state-of-the-art approaches on the BioInfer corpus, the largest PPI benchmark corpus available. On the other four smaller benchmark corpora, it performs either better or almost as good as the existing approaches. Moreover, empirical results show that the proposed hybrid kernel attains considerably higher precision than the existing approaches, which indicates its capability of learning more accurate models. This also demonstrates that the different types of information that we use are able to complement each other for relation extraction.