Review of the State of the Art in Optical Character Recognition. Part 1: Machine Printed Documents

Messelodi, Stefano; Modena, Carla Maria
1996

Abstract

Machine understanding of documents has become a fundamental element in applications dealing with large quantities of text and images. The main purpose of the present report is to shed light on the crowded world of document reading products. We describe the main features and drawbacks of these systems and highlight some criteria for their correct evaluation. Finally, we briefly review the research fields currently being investigated to improve the performance of current document recognition systems. This review covers only OCR systems specifically designed to read documents containing text with little or no a priori information about the page layout, usually called page readers. This task differs from reading pre-printed forms, such as payment forms, where knowledge of the specific layout and format is complete. Moreover, processing forms usually requires higher throughput, which implies special-purpose hardware; our review covers software products only. This review is based on the technology assessment of OCR products carried out annually since 1992 at the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas.

Use this identifier to cite or link to this document: http://hdl.handle.net/11582/1253