Automatic Support for the Alignment of Multilingual Web Sites
Tonella, Paolo; Ricca, Filippo; Pianta, Emanuele; Girardi, Christian
2003-01-01
Abstract
Multilingual Web sites are expected to provide the same content translated into different languages, presented according to a common style, with the same interaction facilities. To this end, several existing sites replicate the pages in the original language, adding translations in all supported languages. This practice exposes the site to several problems during its evolution: updates may not be properly propagated to all replicas, and inconsistencies in content, presentation and interaction can be introduced over time. In this paper, a set of algorithms is proposed to address the difficulties of restructuring an existing Web site so that its multilingual parts are consistent with each other. First, pages are classified according to the language of their content. Then, correspondences between pages in the original language and their translations are determined. Based on the computation of the edit operations necessary to make each page consistent with its translations, the site is updated to a new version in which all pages are aligned. Finally, a unified XML representation of the structure and of the multilingual content of each page is produced, which ensures a consistent future evolution of the site.
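Two of the steps summarized above, classifying a page by language and computing the edit operations that align a translated page with its original, can be illustrated with a minimal Python sketch. This is not the algorithm of the paper. The tiny stopword lists, the helper names (`parse`, `classify_language`, `structural_edits`) and the choice of the standard library's `difflib.SequenceMatcher` applied to sequences of HTML start tags are all hypothetical, chosen only to show the idea.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword lists; a realistic classifier would
# use much larger lists or a statistical (e.g. character n-gram) model.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "it": {"il", "la", "di", "e", "che", "per"},
}


class TagAndTextParser(HTMLParser):
    """Collects the sequence of start tags and the visible words of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # \w+ keeps accented letters and drops punctuation such as "site."
        self.words.extend(re.findall(r"\w+", data.lower()))


def parse(html):
    """Return (start-tag sequence, lowercase word list) for an HTML page."""
    parser = TagAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.words


def classify_language(words):
    """Pick the language whose stopwords occur most often (ties: first listed)."""
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    return max(scores, key=scores.get)


def structural_edits(original_html, translated_html):
    """Non-trivial edit operations turning the translation's tags into the original's."""
    src_tags, _ = parse(translated_html)
    dst_tags, _ = parse(original_html)
    matcher = SequenceMatcher(a=src_tags, b=dst_tags, autojunk=False)
    return [
        (op, src_tags[i1:i2], dst_tags[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Usage: an English original whose Italian translation lacks a list.
en_page = "<html><body><h1>News</h1><p>The site and the team.</p><ul><li>A</li></ul></body></html>"
it_page = "<html><body><h1>Notizie</h1><p>Il sito e la squadra.</p></body></html>"

print(classify_language(parse(en_page)[1]))  # en
print(classify_language(parse(it_page)[1]))  # it
print(structural_edits(en_page, it_page))    # [('insert', [], ['ul', 'li'])]
```

Comparing tag sequences rather than text keeps the structural diff independent of the language of the content: the `insert` operation reports that the translated page is missing the `ul`/`li` block present in the original, which is the kind of inconsistency an alignment step would then repair.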