Review of the State of the Art in Optical Character Recognition. Part 1: Machine Printed Documents
Messelodi, Stefano; Modena, Carla Maria
1996-01-01
Abstract
Machine understanding of documents has become a fundamental element in applications that handle large quantities of text and images. The main purpose of this report is to shed light on the crowded market of document reading products. We describe the main features and drawbacks of these systems and highlight some criteria for evaluating them correctly. Finally, we briefly review the research fields currently being investigated to improve the performance of existing document recognition systems. This review covers only OCR systems specifically designed to read documents containing text with little or no a priori information about the page layout, usually called page readers. This task differs from reading pre-printed forms, such as payment forms, where the specific layout and format are fully known in advance. Moreover, form processing usually demands higher throughput, which calls for special-purpose hardware; our review covers software products only. This review is based on the technology assessment of OCR products carried out annually since 1992 at the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas.