The Indexing Function in Document Management Systems
Messelodi, Stefano; Modena, Carla Maria
1997-01-01
Abstract
In this note we analyze the indexing function of commercial electronic document management systems. In particular, we focus on the relation between indexing and the use of an OCR module in archiving products. On the basis of this relation, we distinguish three classes of products: (1) indexing by keywords without using OCR; (2) field-based indexing, where OCR is applied only to user-defined fields; (3) full-text indexing, where OCR is applied to the whole document or to its textual regions. As a concluding remark, we highlight that none of the analyzed products provides automatic segmentation and logical labeling of the document, a necessary feature for the development of intelligent document indexing and retrieval systems.