Extraction of Polygonal Frames from Color Documents for Page Decomposition
Messelodi, Stefano;Modena, Carla Maria
2003-01-01
Abstract
Graphic accents are often used in the design of complex documents to emphasize particular information: words or illustrations are surrounded by a border line or highlighted by a colored background. This paper presents a method for the automatic extraction of document layout items, called {\em frames}, that have a polygonal shape and/or a uniformly colored background. Because frames break the normal text flow, frame detection is a fundamental step of document layout analysis in a document understanding system. The presented method relies on a color region growing algorithm and on a straight-edge extractor. Shape analysis of the resulting regions makes it possible to localize the frames together with their attributes. To reduce computation time and to return only specific patterns, the method exploits information about a model of the frames to be detected, such as shape, skew, or size, which may be supplied by the user or depend on the specific document class. The presented algorithm is assessed on a page database containing more than 675 framed items. The evaluation is based on a novel tree matching method that takes into account both the frame hierarchy and the frame shapes.
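The abstract does not give the details of the color region growing step, so the following is only a minimal sketch of the general technique and not the authors' algorithm. It assumes a seed-based, 4-connected growth with a Euclidean RGB tolerance, and it replaces the paper's shape analysis with a much simpler test for a filled, axis-aligned rectangle. All names (`grow_region`, `color_distance`, `bounding_box`, `is_filled_rectangle`, `tol`) are hypothetical.

```python
from collections import deque

def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def grow_region(image, seed, tol=30):
    """Return the set of (row, col) pixels that are 4-connected to `seed`
    and whose color lies within `tol` of the seed color.
    Assumption: comparison against the seed color, not a running mean."""
    rows, cols = len(image), len(image[0])
    seed_color = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if color_distance(image[nr][nc], seed_color) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

def bounding_box(region):
    """Return (top, left, bottom, right) of a pixel set, inclusive."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)

def is_filled_rectangle(region):
    """Simplified stand-in for shape analysis: the region is a candidate
    frame if it exactly fills its axis-aligned bounding box."""
    top, left, bottom, right = bounding_box(region)
    return len(region) == (bottom - top + 1) * (right - left + 1)

# Toy page: a white 5x6 background with a yellow highlighted block.
WHITE, YELLOW = (255, 255, 255), (255, 240, 120)
page = [[WHITE] * 6 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 5):
        page[r][c] = YELLOW

frame = grow_region(page, seed=(2, 2))
background = grow_region(page, seed=(0, 0))
```

On this toy page, `frame` covers the yellow block and passes the rectangle test, while `background` is the surrounding white ring and does not. A real system would also need to handle skewed and general polygonal frames, which is where a straight-edge extractor and a fuller shape analysis would come in.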