Understanding High-complexity Technical and Regulatory Documents with State-of-the-Art Models: A Pilot Study

Bernardo Magnini; Roberto Zanoli
2024-01-01

Abstract

We explore the potential of state-of-the-art Large Language Models (LLMs) to reason about the content of high-complexity documents written in Italian. We focus on both technical documents (e.g., descriptions of civil engineering works) and regulatory documents (e.g., descriptions of procedures). Civil engineering documents contain crucial information that supports critical decision-making in construction, transportation and infrastructure projects. Procedural documents, in turn, set out essential guidelines and protocols that ensure efficient operations, adherence to safety standards and effective incident management. LLMs offer a promising way to automate the extraction and comprehension of information from such documents, potentially transforming how we interact with technical information. However, they may face significant challenges when processing these documents because of their complex structure, specialized terminology and heavy reliance on graphical and visual elements. Moreover, LLMs are known to sometimes produce unexpected or incorrect analyses, a phenomenon referred to as hallucination. This paper assesses LLM capabilities along several dimensions: the format of the document (i.e., PDFs with selectable text versus scanned PDFs processed with OCR), the structure of the document (e.g., number of pages, date of the document), graphical elements (e.g., tables, graphs, photos), the interpretation of text passages (e.g., producing a summary), and the need for external knowledge (e.g., to interpret a mathematical expression). To run the assessment, we used GPT-4omni, a large multimodal model pre-trained on a wide variety of data. Our findings suggest that LLMs have great potential for real-world applications involving high-complexity documents, although they may still be prone to producing misleading information.
Files in this item:
63_main_long.pdf (open access). Type: post-print document. License: PUBLIC - Creative Commons 3.6. Size: 1.68 MB. Format: Adobe PDF.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/357427