Document Layout Substructure Discovery
Andreatta, Claudio
2007-01-01
Abstract
In this paper we present DoLSuD (Document Layout Substructure Discovery), a system for the automatic discovery of relevant substructures in a document layout. DoLSuD extracts, analyzes, and describes the visual content of structured digital documents, such as catalogs, in order to discover repeating and distinctive substructures in the document layout and to establish relations between textual and image content. Using the catalog structure to link images to text paragraphs lets us exploit the semantic annotation of the text to annotate the images, integrating multimedia processing with Semantic Web technologies. The paper presents the system, experimental results, and the web-based service that uses the analysis results.
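The abstract does not say how image-text links are established, so the following is only a minimal illustrative sketch of the general idea of propagating text annotations to images through layout. It is not the DoLSuD algorithm. It assumes a hypothetical layout model (`Block` with a bounding box, page number, and tag set) and substitutes a simple heuristic: each image is linked to the nearest text block on the same page by centre-to-centre distance, and that block's semantic tags are copied onto the image. The names `Block`, `link_images_to_text`, and `propagate_annotations` are hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Block:
    """Hypothetical layout block: an image or a text paragraph on a page."""
    kind: str                               # "image" or "text"
    x: float                                # left edge
    y: float                                # top edge
    w: float                                # width
    h: float                                # height
    page: int                               # page number
    tags: set = field(default_factory=set)  # semantic annotations

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def link_images_to_text(blocks):
    """Map each image's index to the nearest text block on the same page.

    Assumed heuristic (centre-to-centre Euclidean distance), standing in for
    the structure-based linking described in the paper.
    """
    links = {}
    texts = [b for b in blocks if b.kind == "text"]
    for i, img in enumerate(blocks):
        if img.kind != "image":
            continue
        candidates = [t for t in texts if t.page == img.page]
        if not candidates:
            continue  # no text on this page: leave the image unlinked
        cx, cy = img.center()
        links[i] = min(
            candidates,
            key=lambda t: hypot(t.center()[0] - cx, t.center()[1] - cy),
        )
    return links


def propagate_annotations(blocks):
    """Copy the tags of each linked text block onto its image (in place)."""
    for i, text in link_images_to_text(blocks).items():
        blocks[i].tags |= text.tags
    return blocks
```

For a catalog page with two product images, each with a caption paragraph directly below it, each image receives the tags of its own caption. A real system would exploit the repeating substructures the paper describes (e.g. the recurring image-plus-description unit) rather than raw proximity, which fails when captions sit closer to a neighbouring item.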