Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning

1College of Computing and Data Science, Nanyang Technological University, Singapore
2College of Computer Science and Technology, Zhejiang University of Technology, China
3UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China
*Corresponding author.

Preprint

The proposed 3D Activity Reasoning and Planning enables reasoning about the activities underlying a user's implicit instructions. It generates detailed executable steps for the reasoned activities within 3D scenes, together with consistent inter-step route planning guided by object shapes and locations from fine-grained 3D segmentation.

Abstract

3D activity reasoning and planning has attracted increasing attention in human-robot interaction and embodied AI thanks to recent advances in multimodal learning. However, most existing works share two constraints: 1) heavy reliance on explicit instructions, with little reasoning about implicit user intentions; 2) neglect of inter-step route planning for robot movements. To bridge these gaps, we propose 3D activity reasoning and planning, a novel 3D task that reasons the intended activities from implicit instructions and decomposes them into steps with inter-step routes and planning under the guidance of fine-grained 3D object shapes and locations from scene segmentation. We tackle the new 3D task from two perspectives. First, we construct ReasonPlan3D, a large-scale benchmark that covers diverse 3D scenes with rich implicit instructions and detailed annotations for multi-step task planning, inter-step route planning, and fine-grained segmentation. Second, we design a novel framework that introduces progressive plan generation with contextual consistency across multiple steps, as well as a scene graph that is updated dynamically to capture critical objects and their spatial relations. Extensive experiments demonstrate the effectiveness of our benchmark and framework in reasoning activities from implicit human instructions, producing accurate stepwise task plans, and seamlessly integrating route planning for multi-step moves. The dataset and code will be released.

ReasonPlan3D Benchmark

Benchmark statistics. The pie chart in (a) shows the most frequently occurring verbs in inter-step route planning, along with their associated adverbs representing movements. The bar charts in (b) and (c) present actions and their associated objects in the step-by-step plans, and the chart in (d) shows the distribution of answers across different step counts.

Method Overview: Overall Framework of SHARP

Overall architecture of the proposed SHARP method. Given the point clouds of a 3D scene, the Point Cloud Encoder generates scene-level embeddings, while the 3D segmentor predicts 3D object masks. In addition, the 2D Encoder extracts multi-view image features, which are combined with the 3D object masks and fed to the Scene Graph Generator to obtain a scene graph. The generated scene graph is then passed through the Graph Encoder and, together with the scene-level embeddings, serves as input to the MLLM. For step s, the one-step plan is generated by referring to previous steps, and the scene graph is updated by graph modulation weights that emphasize objects and spatial relations critical to the reasoned activity. The snow icon indicates frozen modules, while the fire icon indicates trainable modules.
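The progressive plan generation described above can be sketched in a few lines. This is a minimal, illustrative sketch only: all class and function names (`SceneGraph`, `generate_step`, `progressive_plan`, `modulation_fn`) are hypothetical stand-ins, and the real encoders, segmentor, and MLLM are replaced by simple callables.

```python
# Hypothetical sketch of SHARP-style progressive plan generation with
# dynamic scene-graph modulation. Names and mechanisms are assumptions,
# not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # node -> importance weight; edges map object pairs to spatial relations
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def modulate(self, weights: dict) -> None:
        """Re-weight nodes to emphasize objects critical to the
        reasoned activity (assumed modulation mechanism)."""
        for obj, w in weights.items():
            if obj in self.nodes:
                self.nodes[obj] *= w


def generate_step(mllm, scene_emb, graph: SceneGraph, previous_steps: list) -> str:
    """Generate one plan step conditioned on all previous steps for
    contextual consistency; `mllm` is a stand-in callable."""
    return mllm(scene_emb, graph, previous_steps)


def progressive_plan(mllm, scene_emb, graph: SceneGraph,
                     num_steps: int, modulation_fn) -> list:
    """Generate steps one by one, updating the scene graph after each."""
    steps = []
    for _ in range(num_steps):
        step = generate_step(mllm, scene_emb, graph, steps)
        steps.append(step)
        # Dynamically update the scene graph for the next step.
        graph.modulate(modulation_fn(step, graph))
    return steps
```

A toy usage with a stub model: `progressive_plan(lambda e, g, prev: f"step {len(prev) + 1}", None, SceneGraph(nodes={"cup": 1.0}), 3, lambda s, g: {"cup": 2.0})` yields three step strings while the weight of "cup" is doubled after each step.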

Visualization

Activity reasoning and planning visualization over the ReasonPlan3D val set. Each example shows an implicit human instruction, the input point clouds of the 3D scene, the 3D segmentation of the scene, and the predictions from 3DGraphLLM and the proposed SHARP. Best viewed in color and zoomed in.

BibTeX


@article{jiang2025exploring,
  title={Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning},
  author={Jiang, Xueying and Li, Wenhao and Zhang, Xiaoqin and Shao, Ling and Lu, Shijian},
  journal={arXiv preprint arXiv:2503.12974},
  year={2025}
}