Insight in sight: Complaint detection and aspect-based reasoning through visually-grounded reviews with VLLMs
Apoorva Singh;Soumitra Ghosh;Bruno Lepri
2025-01-01
Abstract
Effective complaint detection (CD) is essential for businesses to address customer dissatisfaction and safeguard their reputation. However, existing methods face significant limitations, including insufficient context in vague or incomplete reviews and misaligned information across modalities, such as product images and textual descriptions. These challenges hinder current models from accurately identifying complaints and understanding the underlying issues. To address these limitations, we present complaint analysis through multimodal review enhancement (CAMRE), a novel framework that integrates visual and textual modalities to improve CD and generate aspect-based rationales. Central to CAMRE is the creation of visually-grounded enhanced reviews (VERs), which unify textual reviews with image-derived details to provide a comprehensive representation of customer feedback. These enriched reviews enable the fine-tuning of state-of-the-art large language models, resulting in improved complaint classification and the generation of aspect-based rationales. Furthermore, CAMRE’s aspect-based rationale generation provides actionable insights into the specific product issues underlying complaints, facilitating precise and effective interventions. Extensive experiments on the CESAMARD benchmark dataset highlight CAMRE’s superiority over existing methods, addressing the context insufficiency and misalignment issues while achieving higher accuracy in CD and rationale coherence. By emphasizing the power of multimodal data fusion, CAMRE establishes a new benchmark for complaint analysis, empowering businesses to resolve customer dissatisfaction more effectively. Code and technical appendices are available at https://github.com/Soumitra816/CAMRE.
