A comparative benchmark study of LLM-based threat elicitation tools
Mollaeefar, Majid; Raciti, Mario; Bissoli, Andrea; Ranise, Silvio
2025-01-01
Abstract
Threat modeling refers to the software design activity that involves the proactive identification, evaluation, and mitigation of specific potential threat scenarios. Recently, there has been growing attention to the potential of automating the threat elicitation process using Large Language Models (LLMs), and various tools have emerged that are capable of generating threats based on system models and other descriptive system documentation. This paper presents the outcomes of an experimental evaluation study of LLM-based threat elicitation tools, which we apply to two complex and contemporary application cases involving biometric authentication. The comparative benchmark is based on a grounded approach to establish four distinct baselines representative of the results of human threat modelers, both novices and experts. In support of scale and reproducibility, the evaluation approach itself is maximally automated, using sentence transformer models to perform threat mapping. Our study evaluates 56 distinct threat models generated by 6 LLM-based threat elicitation tools. While the generated threats are somewhat similar to the threats documented by human threat modelers, relative performance is low. The evaluated LLM-based threat elicitation tools prove particularly ineffective at eliciting threats at the expert level. Furthermore, we show that performance differences between these tools can be attributed in comparable measure to both the prompting approach (e.g., multi-shot, knowledge pre-prompting, role prompting) and the actual reasoning capabilities of the underlying LLMs.
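
To illustrate the kind of automated threat mapping the abstract describes, the sketch below matches LLM-generated threat statements against a human-authored baseline using sentence-transformer embeddings and cosine similarity. This is a minimal sketch, not the study's actual pipeline: the model name (all-MiniLM-L6-v2), the example threat texts, and the 0.6 acceptance threshold are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch (not the paper's pipeline): map LLM-generated threats
# onto a human-authored baseline via sentence embeddings.
# Assumptions: "all-MiniLM-L6-v2", the sample threats, and the 0.6
# threshold are illustrative choices, not values from the study.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

baseline_threats = [
    "An attacker replays a captured biometric sample to bypass authentication.",
    "Stored biometric templates are exfiltrated from the server database.",
]
generated_threats = [
    "Spoofing the sensor with a recorded fingerprint to gain access.",
    "Denial of service against the enrollment endpoint.",
]

# Encode both threat sets and compute pairwise cosine similarities.
baseline_emb = model.encode(baseline_threats, convert_to_tensor=True)
generated_emb = model.encode(generated_threats, convert_to_tensor=True)
similarity = util.cos_sim(generated_emb, baseline_emb)

# Map each generated threat to its closest baseline threat, accepting
# the match only if it clears the (illustrative) threshold.
THRESHOLD = 0.6
for i, threat in enumerate(generated_threats):
    best = similarity[i].argmax().item()
    score = similarity[i][best].item()
    if score >= THRESHOLD:
        print(f"MATCH ({score:.2f}): {threat!r} -> {baseline_threats[best]!r}")
    else:
        print(f"NO MATCH (best {score:.2f}): {threat!r}")
```

Under this scheme, a generated threat counts toward recall of the baseline only when its best cosine match exceeds the threshold, which is what makes the evaluation scalable and reproducible without a human judging every pair.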
