Extraction of domain concepts from the source code

Tonella, Paolo
2015-01-01

Abstract

Program understanding involves mapping domain concepts to the code elements that implement them. This mapping is often implicit and undocumented. However, identifier names contain clues that help rediscover the mapping and make it available to programmers. In this paper, we present two approaches that exploit structural and linguistic aspects of the source code to extract ontologies. The extracted ontologies are then compared in terms of the concepts they contain and the support they give to program understanding, specifically concept location. Because they come from the source code, such ontologies contain both domain and implementation concepts. To filter domain concepts, we applied Information Retrieval (IR) based filtering techniques. We assessed the resulting ontologies against a manually defined reference domain ontology. The experiments were carried out on six real-world open source programs. Results show that the ontologies extracted using the structural and linguistic aspects of the source code are complementary. We also observed that their union gives better support to concept location than the individual ontologies. Filtering the ontologies gives a concise representation of the domain knowledge captured in the source code. The filtered ontologies, however, were found to be less effective in supporting concept location than the unfiltered ones.
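
To make the idea of mining identifier names concrete, the following is a minimal sketch and not the method used in the paper. It assumes that identifiers follow camelCase or snake_case conventions, and it uses a deliberately simple stand-in for IR-based filtering: a term is kept only if it also occurs in a piece of domain text. The function names, the sample identifiers, and the sample domain text are all hypothetical.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    terms = []
    # First break on underscores and other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", name):
        # Then break on case boundaries, keeping acronyms such as "XML" whole.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in terms]


def term_frequencies(identifiers):
    """Count how often each term occurs across a list of identifiers."""
    counts = Counter()
    for ident in identifiers:
        counts.update(split_identifier(ident))
    return counts


def filter_by_domain_text(counts, domain_text):
    """Keep only terms that also appear in the domain text.

    This is an assumed, simplified filter, not the paper's technique.
    """
    vocabulary = set(re.findall(r"[a-z]+", domain_text.lower()))
    return {term: n for term, n in counts.items() if term in vocabulary}


# Hypothetical identifiers and domain description, for illustration only.
identifiers = ["getAccountBalance", "account_owner", "parseXMLFile", "tmpBuffer"]
domain_text = "A customer owns an account; each account has a balance and an owner."

candidates = filter_by_domain_text(term_frequencies(identifiers), domain_text)
print(candidates)
```

In this toy input, terms such as "account" and "balance" survive the filter, while implementation terms such as "parse" and "buffer" are dropped, which mirrors the distinction between domain and implementation concepts drawn in the abstract.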

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/265620