Open-Vocabulary Segmentation of Aerial Point Clouds
Alami, Ashkan; Remondino, Fabio
2026-01-01
Abstract
The growing diversity and dynamics of urban environments demand 3D semantic segmentation methods that can recognize a wide range of objects without relying on predefined classes or on time-consuming manually labelled training data. As urban scenes evolve and application requirements vary across locations, flexible, annotation-free 3D segmentation methods are becoming increasingly desirable for large-scale 3D analytics. This work presents the first training-free, open-vocabulary (OV) method for 3D aerial point cloud classification and benchmarks it against state-of-the-art supervised 3D neural networks for the semantic enrichment of these geospatial data. The proposed approach leverages open-vocabulary object recognition across multiple 2D images and subsequently projects and refines these detections in 3D space, enabling semantic labelling without prior class definitions or annotated data. In contrast, the supervised baselines are trained on labelled datasets and restricted to a fixed set of object categories. We evaluate all methods with quantitative metrics and qualitative analysis, highlighting their respective strengths, limitations and suitability for scalable urban 3D mapping. By removing the dependency on annotated data and fixed taxonomies, this work represents a key step toward adaptive, scalable and semantic understanding of 3D urban environments.
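The core 2D-to-3D transfer step described in the abstract — projecting points into each view and aggregating per-view labels — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a simple pinhole camera model (intrinsics `K`, rotation `R`, translation `t`), per-view 2D label masks, and a plain majority vote across views, with no occlusion handling or refinement.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points into pixel coordinates with a pinhole model."""
    cam = (R @ points.T + t.reshape(3, 1)).T        # world -> camera frame
    in_front = cam[:, 2] > 0                        # keep points ahead of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                     # perspective divide
    return uv, in_front

def label_points(points, views):
    """Assign each 3D point a label by majority vote over 2D label masks.

    `views` is a list of (K, R, t, mask) tuples, one per image; `mask` is an
    HxW integer array of per-pixel class IDs (hypothetical input format).
    """
    votes = {}
    for K, R, t, mask in views:
        uv, ok = project_points(points, K, R, t)
        px = np.round(uv).astype(int)
        h, w = mask.shape
        inside = (ok & (px[:, 0] >= 0) & (px[:, 0] < w)
                     & (px[:, 1] >= 0) & (px[:, 1] < h))
        for i in np.flatnonzero(inside):
            votes.setdefault(i, []).append(int(mask[px[i, 1], px[i, 0]]))
    # majority vote per point; -1 marks points unseen in every view
    out = np.full(len(points), -1, dtype=int)
    for i, v in votes.items():
        out[i] = np.bincount(v).argmax()
    return out
```

In practice the paper's pipeline also refines the projected detections in 3D space; this sketch only shows the bare projection-and-vote idea.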
