Evaluating Linguistic Speaker Profiles on Response Selection in Multi-Party Dialogue
Maryam Sajedinia
Valerio Basile
2025-01-01
Abstract
We investigate whether incorporating linguistically derived speaker profiles improves the response selection capabilities of instruction-tuned large language models (LLMs) in multi-party dialogues. Using the Wikipedia Talk Page dataset, we construct lightweight profiles for each speaker based on features extracted from their prior messages, including frequent nouns and verbs, and sentiment tendency. These profiles are incorporated into the input prompts and evaluated using in-context learning with LLaMA 3.2 Instruct (1B and 8B) and GPT-4o, without any model fine-tuning. We compare performance across models and prompt settings, with and without speaker profiles, and analyze the effect of different profile configurations. Results are compared against a random baseline and a supervised Siamese RNN (with GRU units) trained on the same data. Our results show that incorporating speaker profiles improves response selection performance across most LLM settings, with the strongest gains observed in larger models such as LLaMA 3.2 (8B). Lexical features (frequent nouns and verbs) yield greater improvements than sentiment information, particularly in low-context or underspecified scenarios. However, profile effectiveness varies by model scale and prompt format, and provides limited benefit when distractors are lexically and semantically similar to the ground-truth response.
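To make the profile construction concrete, the sketch below shows, in stdlib-only Python, how a lightweight speaker profile of the kind the abstract describes could be assembled and rendered into a prompt. This is an illustrative assumption, not the authors' implementation: the paper extracts frequent nouns and verbs (which requires a POS tagger) and a sentiment tendency, whereas this sketch substitutes a crude frequent-word filter and a toy sentiment lexicon; the function names, word lists, and prompt wording are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in lexicon; the paper's sentiment method is not specified here.
POSITIVE = {"thanks", "great", "agree", "good", "helpful"}
NEGATIVE = {"wrong", "bad", "disagree", "revert", "vandalism"}

def build_profile(messages, top_k=5):
    """Build a lightweight speaker profile from a speaker's prior messages.

    Frequent content words approximate the paper's noun/verb features
    (a real implementation would use a POS tagger); sentiment tendency
    is a simple positive/negative lexicon tally.
    """
    tokens = [t.strip(".,!?").lower() for m in messages for t in m.split()]
    content = [t for t in tokens if len(t) > 3]  # crude content-word filter
    frequent = [w for w, _ in Counter(content).most_common(top_k)]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    sentiment = "positive" if pos > neg else "negative" if neg > pos else "neutral"
    return {"frequent_words": frequent, "sentiment": sentiment}

def profile_to_prompt(speaker, profile):
    """Render the profile as a line that can be prepended to the dialogue prompt."""
    words = ", ".join(profile["frequent_words"])
    return f"Speaker {speaker} often uses: {words}; tends to be {profile['sentiment']}."
```

Such a profile line would be prepended to the dialogue context before asking the model to select among candidate responses, which is the in-context setup the abstract describes.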
