Text Block Detection for Document Segmentation

Coianiz, T.; Fignoni, F.

This work presents a method for identifying the text regions of a document using texture-based classification. More precisely, homogeneous regions of a document are assigned to one of two classes, “text” or “non-text”. The method is based on Gabor filtering within a multichannel filtering scheme. Thanks to a preliminary spectral analysis, it is possible to perform the classification using only a pair of Gabor filters tuned to the text interline frequency. The method relies essentially on measuring the ratio between the input signal and the signal filtered with a suitable bank of Gabor filters, and it does not require any a priori knowledge of the text to be classified.
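
The ratio measure described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the region is reduced to a row-wise intensity profile, the interline period (16 samples) is taken as known rather than estimated by spectral analysis, the "pair" of filters is taken to be an even/odd (cosine/sine) Gabor pair, the ratio is computed as filtered energy over input energy, and the names `gabor_pair`, `convolve_same` and `energy_ratio` are hypothetical.

```python
import math

def gabor_pair(freq, sigma):
    """Even (cosine) and odd (sine) 1-D Gabor kernels tuned to `freq` cycles/sample."""
    half = int(3 * sigma)                      # truncate the Gaussian envelope at 3 sigma
    xs = range(-half, half + 1)
    env = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in xs]
    norm = sum(env)                            # unit gain at the tuned frequency
    even = [e * math.cos(2 * math.pi * freq * x) / norm for e, x in zip(env, xs)]
    odd = [e * math.sin(2 * math.pi * freq * x) / norm for e, x in zip(env, xs)]
    return even, odd

def convolve_same(signal, kernel):
    """Zero-padded convolution returning an output as long as `signal`."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k                   # sample paired with kernel tap k
            if 0 <= j < n:
                acc += signal[j] * w
        out.append(acc)
    return out

def energy_ratio(profile, freq, sigma):
    """Energy of the Gabor-filtered profile divided by energy of the input profile."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]     # remove the DC component first
    even, odd = gabor_pair(freq, sigma)
    re = convolve_same(centered, even)
    im = convolve_same(centered, odd)
    filtered = sum(a * a + b * b for a, b in zip(re, im))
    total = sum(v * v for v in centered)
    return filtered / total if total > 0 else 0.0

# Synthetic profiles (assumed data, not from the paper):
# text-like region: dark/light bands repeating every 16 rows;
# non-text region: a smooth intensity gradient.
text_profile = [1.0 if (i % 16) < 8 else 0.0 for i in range(256)]
ramp_profile = [i / 255.0 for i in range(256)]

text_ratio = energy_ratio(text_profile, freq=1.0 / 16, sigma=16.0)
ramp_ratio = energy_ratio(ramp_profile, freq=1.0 / 16, sigma=16.0)
print("text-like ratio:", text_ratio)
print("gradient ratio: ", ramp_ratio)
```

In this sketch a region whose profile is periodic at the tuned interline frequency keeps a large share of its energy after filtering, while a smooth gradient keeps very little, so a threshold on the ratio separates the two cases. The threshold value, and the extension to two-dimensional filters, would have to be chosen for the actual data.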