CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
Mattia Nardon, Andrea Caraffa, Fabio Poiesi, Paul Ian Chippendale, Davide Boscaini
2025-01-01
Abstract
Accurate 6D pose estimation of complex objects in 3D environments is crucial for effective robotic manipulation. However, existing benchmarks fall short in evaluating 6D pose estimation under realistic industrial conditions: most datasets focus on household objects in domestic settings, while the few available industrial datasets are limited to artificial scenarios with objects placed on tables. To bridge this gap, we introduce CHIP, the first dataset designed for 6D pose estimation of chairs manipulated by a robotic arm in a real industrial environment. CHIP comprises seven distinct chairs recorded with three different RGBD sensing technologies and presents unique challenges, including distractor objects with fine-grained similarities and severe occlusions caused by the robotic arm and human operators. CHIP contains 77,811 RGBD images annotated with ground-truth 6D poses automatically derived from the robot's kinematics, averaging 11,115 annotations per chair. We benchmark CHIP using three zero-shot 6D pose estimation methods, evaluating their performance across different sensor types, localisation priors, and occlusion levels. Results reveal substantial room for improvement, highlighting the unique challenges posed by the dataset. Project page: https://tev-fbk.github.io/CHIP.
