Iterative In-Context Learning to Enhance LLMs' Abstract Reasoning: The Case Study of Algebraic Tasks
Matteo Zavatteri; Alessandro Sperduti
2026-01-01
Abstract
LLMs face significant challenges in systematic generalization, particularly on reasoning tasks that require compositional rules and involve out-of-distribution examples. To address these challenges, we introduce a few-shot repair methodology aimed at improving the generalization capabilities of general-purpose LLMs. Our approach employs an iterative example selection strategy that incrementally constructs a tailored set of few-shot examples optimized to enhance the model's performance on a given task. As a proof of concept, we apply this methodology to the resolution of algebraic expressions involving non-standard simplification rules, in which the priority of addition and multiplication is swapped. We construct synthetic datasets of varying difficulty, designed to test compositional reasoning, and use them to evaluate how well LLMs simplify these non-standard mathematical expressions. We evaluate multiple prompting strategies, namely zero-shot, few-shot, and Chain-of-Thought prompts. Our findings indicate that LLMs exhibit limited proficiency on these mathematical tasks. We further demonstrate that LLM reasoning benefits from our iterative shot-selection prompting strategy when integrated with explicit reasoning instructions. Interestingly, our experiments reveal that some LLMs generalize better when prompted with simpler few-shot examples rather than with complex ones drawn from the test data distribution. This counterintuitive finding suggests that models may benefit more from clear, easily interpretable patterns that can be abstracted and applied to more complex, out-of-distribution tasks.
Our results confirm the effectiveness and broad applicability of our methodology for systematically improving LLM performance in abstract reasoning tasks with in- and out-of-distribution examples.
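To make the task concrete, the non-standard simplification rules described above can be illustrated with a small evaluator. The sketch below is hypothetical (the paper's exact rule set and expression grammar are not given here); it assumes expressions built from non-negative integers, `+`, `*`, and parentheses, and implements the stated rule that the priority of addition and multiplication is swapped, so `+` binds tighter than `*`:

```python
# Hypothetical sketch of the paper's non-standard algebra: a recursive-descent
# evaluator in which '+' has HIGHER precedence than '*', so 2+3*4 = (2+3)*4.
# The grammar (integers, '+', '*', parentheses) is an assumption for illustration.
import re

def tokenize(expr: str) -> list[str]:
    """Split an expression into integer literals and operator/paren tokens."""
    return re.findall(r"\d+|[+*()]", expr)

def evaluate(expr: str) -> int:
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_product() -> int:
        # '*' is parsed at the OUTERMOST level: lowest precedence here.
        nonlocal pos
        value = parse_sum()
        while peek() == "*":
            pos += 1
            value *= parse_sum()
        return value

    def parse_sum() -> int:
        # '+' is parsed closer to the atoms: highest binary precedence here.
        nonlocal pos
        value = parse_atom()
        while peek() == "+":
            pos += 1
            value += parse_atom()
        return value

    def parse_atom() -> int:
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1                 # consume '('
            value = parse_product()  # parentheses reset precedence as usual
            pos += 1                 # consume ')'
            return value
        pos += 1
        return int(tok)

    return parse_product()

# Under swapped precedence: 2+3*4 = (2+3)*4 = 20, while 2*3+4 = 2*(3+4) = 14.
print(evaluate("2+3*4"))    # → 20
print(evaluate("2*3+4"))    # → 14
print(evaluate("(2*3)+4"))  # → 10
```

Such an evaluator is also how gold answers for the synthetic datasets could be generated programmatically, so that model outputs at each difficulty level can be checked automatically.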
