Semantic Priming in GPT: Investigating LLMs Through a Cognitive Psychology Lens

Colombi, Filippo; Strapparava, Carlo
2025-01-01

Abstract

Understanding whether large language models (LLMs) capture human-like semantic associations remains an open challenge. This study investigates semantic priming within GPT-4o Mini by analyzing probabilistic responses to psycholinguistically validated prime-target pairs. Prime-target stimuli were extracted from the Semantic Priming Project database, embedding target words within masked sentence contexts preceded by semantically related or unrelated primes. Model responses were quantified using log-probabilities associated with predicted tokens, allowing comparative evaluation of semantic priming effects. Results reveal that the model’s predictive outputs reflect priming effects when analysis is restricted to fully reconstructed data, yet these effects diminish significantly under data imputation strategies addressing extensive missingness. This discrepancy highlights critical issues regarding data preprocessing, tokenization, and the management of missing values in computational semantic experiments. Implications for future research in cognitive modeling and the refinement of LLM architectures to better approximate human semantic processing are discussed.
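The abstract describes comparing target-token log-probabilities across related- and unrelated-prime conditions. A minimal sketch of that comparison is below; the log-probability values and the example prime-target pair are hypothetical placeholders, not data from the paper (in the study, such values would come from the model's predicted-token log-probabilities for the masked sentence contexts).

```python
# Sketch of the priming-effect comparison described in the abstract.
# The numeric log-probabilities below are hypothetical; in the study they
# would be read from the model's output for each masked sentence context.

def priming_effect(logp_related: float, logp_unrelated: float) -> float:
    """Difference in target-token log-probability between the
    related-prime and unrelated-prime conditions.
    A positive value indicates a priming effect (the target is more
    probable after a semantically related prime)."""
    return logp_related - logp_unrelated

# Hypothetical pair, e.g. prime "doctor" vs "table" for target "nurse".
effect = priming_effect(logp_related=-1.2, logp_unrelated=-2.7)
print(f"priming effect: {effect:.2f}")
```

Aggregating this difference over many prime-target pairs, and testing whether its mean exceeds zero, is one straightforward way to operationalize the comparative evaluation the abstract mentions.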
ISBN: 979-12-243-0587-3
File in this record:

File: 2025.clicit-1.29.pdf (Adobe PDF, 1.04 MB)
License: Creative Commons
Access: authorized users only (request a copy)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/366567