Semantic Priming in GPT: Investigating LLMs Through a Cognitive Psychology Lens

Colombi, Filippo; Strapparava, Carlo
2025-01-01

Abstract

Understanding whether large language models (LLMs) capture human-like semantic associations remains an open challenge. This study investigates semantic priming within GPT-4o Mini by analyzing probabilistic responses to psycholinguistically validated prime-target pairs. Prime-target stimuli were extracted from the Semantic Priming Project database, embedding target words within masked sentence contexts preceded by semantically related or unrelated primes. Model responses were quantified using log-probabilities associated with predicted tokens, allowing comparative evaluation of semantic priming effects. Results reveal that the model’s predictive outputs reflect priming effects when analysis is restricted to fully reconstructed data, yet these effects diminish significantly under data imputation strategies addressing extensive missingness. This discrepancy highlights critical issues regarding data preprocessing, tokenization, and the management of missing values in computational semantic experiments. Implications for future research in cognitive modeling and the refinement of LLM architectures to better approximate human semantic processing are discussed.
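The abstract describes comparing target-token log-probabilities across related- and unrelated-prime conditions. A minimal sketch of that comparison is below; the log-probability values and the example prime-target pair are hypothetical placeholders, not data from the paper (in the study, such values would come from the model's predicted-token log-probabilities for the masked sentence contexts).

```python
# Sketch of the priming-effect comparison described in the abstract.
# The numeric log-probabilities below are hypothetical; in the study they
# would be read from the model's output for each masked sentence context.

def priming_effect(logp_related: float, logp_unrelated: float) -> float:
    """Difference in target-token log-probability between the
    related-prime and unrelated-prime conditions.
    A positive value indicates a priming effect (the target is more
    probable after a semantically related prime)."""
    return logp_related - logp_unrelated

# Hypothetical pair, e.g. prime "doctor" vs "table" for target "nurse".
effect = priming_effect(logp_related=-1.2, logp_unrelated=-2.7)
print(f"priming effect: {effect:.2f}")
```

Aggregating this difference over many prime-target pairs, and testing whether its mean exceeds zero, is one straightforward way to operationalize the comparative evaluation the abstract mentions.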
ISBN: 979-12-243-0587-3
File in this record:

File: 2025.clicit-1.29.pdf (Adobe PDF, 1.04 MB)
License: Creative Commons
Access: authorized users only (request a copy)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/366567