OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

¹ City University of Hong Kong

² South China University of Technology

TL;DR: We introduce a novel task and benchmark to assess the generalization ability of current 3D scene understanding models to open-set object attribute vocabularies.

Task

The Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) task expands the vocabulary types of the classic Open-Vocabulary 3D Scene Understanding (OV-3D) task. While OV-3D only supports queries of object classes, GOV-3D also supports queries of abstract, object-related attributes.
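To make the difference between the two tasks concrete, here is a minimal sketch of how a query could resolve to target objects under each setting. It assumes a hypothetical per-object record with a class name and a set of attribute words. The names `SceneObject` and `resolve_query` and the toy attributes are illustrative, not part of OpenScan's released format or any model's API.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """Hypothetical annotated object: instance id, class name, attribute words."""
    instance_id: int
    class_name: str
    attributes: set[str] = field(default_factory=set)


def resolve_query(objects: list[SceneObject], query: str, generalized: bool) -> list[int]:
    """Return the ids of objects a query refers to (illustrative only).

    OV-3D (generalized=False): a query can only name an object class.
    GOV-3D (generalized=True): a query may also name an object attribute.
    """
    q = query.strip().lower()
    hits = []
    for obj in objects:
        # Class-name match is valid in both settings; attribute match only in GOV-3D.
        if obj.class_name == q or (generalized and q in obj.attributes):
            hits.append(obj.instance_id)
    return hits


# Toy scene; the attribute words are made up for illustration.
scene = [
    SceneObject(0, "chair", {"sit", "wood"}),
    SceneObject(1, "table", {"wood", "place items on"}),
    SceneObject(2, "lamp", {"illuminate", "metal"}),
]

print(resolve_query(scene, "chair", generalized=False))  # [0]
print(resolve_query(scene, "wood", generalized=False))   # []
print(resolve_query(scene, "wood", generalized=True))    # [0, 1]
```

An attribute such as "wood" is not a class name, so an OV-3D-style query finds nothing, while the GOV-3D-style query returns every object carrying that attribute, possibly across several classes.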

Benchmark

The OpenScan benchmark provides attribute annotations for each object, expanding the single category of object classes into eight linguistic aspects of object-related attributes. In total, it contains 153,644 attribute annotations covering 341 distinct attributes across the 1,513 scenes of ScanNet200.
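The two headline counts measure different things: the annotation total counts (object, attribute) pairs, while the attribute count is the size of the distinct vocabulary. The sketch below shows that distinction on toy data. It assumes a hypothetical flat record layout `(scene_id, instance_id, aspect, attribute)`, which is not necessarily how OpenScan stores its annotations. The aspect names are taken from the abstract (affordance, property, material), and the attribute words are made up.

```python
from collections import defaultdict

# Hypothetical annotation records: (scene_id, instance_id, aspect, attribute).
annotations = [
    ("scene_a", 0, "material", "wood"),
    ("scene_a", 0, "affordance", "sit"),
    ("scene_a", 1, "material", "wood"),
    ("scene_b", 0, "material", "metal"),
    ("scene_b", 0, "property", "soft"),
]


def summarize(records):
    """Count annotations, distinct attributes, scenes, and vocabulary per aspect."""
    vocab = set()
    scenes = set()
    per_aspect = defaultdict(set)
    for scene_id, _instance_id, aspect, attribute in records:
        vocab.add(attribute)
        scenes.add(scene_id)
        per_aspect[aspect].add(attribute)
    return {
        "annotations": len(records),   # every (object, attribute) pair counts
        "attributes": len(vocab),      # each distinct attribute counts once
        "scenes": len(scenes),
        "attributes_per_aspect": {a: len(v) for a, v in sorted(per_aspect.items())},
    }


print(summarize(annotations))
```

Here "wood" appears in two annotations but contributes one entry to the vocabulary, which is why the annotation total can be far larger than the number of distinct attributes.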

Abstract

Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond a closed set of object classes. However, existing approaches and benchmarks primarily focus on the open-vocabulary problem within the context of object classes, which is insufficient to provide a holistic evaluation of the extent to which a model understands the 3D scene. In this paper, we introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) to explore the open-vocabulary problem beyond object classes. It encompasses an open and diverse set of generalized knowledge, expressed as linguistic queries of fine-grained, object-specific attributes. To this end, we contribute a new benchmark named OpenScan, which consists of 3D object attributes across eight representative linguistic aspects, including affordance, property, material, and more. We further evaluate state-of-the-art OV-3D methods on our OpenScan benchmark and find that these methods struggle to comprehend the abstract vocabularies of the GOV-3D task, a challenge that cannot be addressed by simply scaling up the object classes used in training. We highlight the limitations of existing methodologies and explore a promising direction to overcome the identified shortcomings.

Quantitative Results on OpenScan

3D instance segmentation results on our OpenScan benchmark.

BibTeX

@article{zhao2024openscan,
  title={OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding},
  author={Zhao, Youjun and Lin, Jiaying and Ye, Shuquan and Pang, Qianshi and Lau, Rynson WH},
  journal={arXiv preprint arXiv:2408.11030},
  year={2024}
}