Extraction of Polygonal Frames from Color Documents for Page Decomposition

Messelodi, Stefano;Modena, Carla Maria
2003-01-01

Abstract

Graphic accents are often used in the design of complex documents to emphasize particular information: words or illustrations are surrounded by a border line or highlighted by a colored background. This paper presents a method for the automatic extraction of document layout items, called frames, that have a polygonal shape and/or a uniformly colored background. Because frames break the normal text flow, frame detection is a fundamental step of document layout analysis in a document understanding system. The method relies on a color region growing algorithm and on a straight-edge extractor. Shape analysis of the resulting regions makes it possible to localize the frames together with their attributes. To reduce computation time and to return only specific patterns, the method exploits information about a model of the frames to be detected, such as shape, skew, or size, which may be supplied by the user or may depend on the specific document class. The algorithm is assessed on a page database containing more than 675 framed items. The evaluation is based on a novel tree matching method that takes into account both the hierarchy of the frames and their shapes.
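
As an illustration of the region growing idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a page stored as a nested list of RGB tuples, 4-connectivity, a per-channel color tolerance, and a simple fill-ratio test standing in for the shape analysis; the names `grow_region`, `bounding_box`, `is_rectangular`, `tol`, and `min_fill` are hypothetical.

```python
from collections import deque


def grow_region(image, seed, tol=30):
    """Collect pixels 4-connected to `seed` whose color stays within `tol`
    (maximum per-channel absolute difference) of the seed color.
    Assumption: `image` is a list of rows, each a list of (R, G, B) tuples."""
    rows, cols = len(image), len(image[0])
    seed_color = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                color = image[nr][nc]
                if max(abs(a - b) for a, b in zip(color, seed_color)) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


def bounding_box(region):
    """Return (top, left, bottom, right) of a set of (row, col) pixels."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)


def is_rectangular(region, min_fill=0.95):
    """Hypothetical shape test: the region fills most of its bounding box."""
    top, left, bottom, right = bounding_box(region)
    area = (bottom - top + 1) * (right - left + 1)
    return len(region) / area >= min_fill


# Toy page: white background with a uniformly colored box on rows 1-3, cols 2-5.
WHITE, YELLOW = (255, 255, 255), (250, 240, 120)
page = [[WHITE] * 8 for _ in range(6)]
for r in range(1, 4):
    for c in range(2, 6):
        page[r][c] = YELLOW

frame = grow_region(page, seed=(2, 3))
print(bounding_box(frame))    # (1, 2, 3, 5)
print(is_rectangular(frame))  # True
```

Growing from a background pixel instead yields a ring-shaped region that fails the fill-ratio test, which is the kind of distinction a shape analysis step would need to make. General polygonal frames, skewed frames, and frames delimited only by border lines would require more than this sketch, for example the straight-edge extraction the abstract refers to.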

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/860