Xingyu Chen

陈星宇

I am a PhD student jointly enrolled at Westlake University and Zhejiang University, in the Inception3D Lab, co-supervised by Anpei Chen and Andreas Geiger.

I am fortunate to work with Yuliang Xiu at Endless AI Lab. I previously interned at Tencent AI Lab, collaborating with Xuan Wang and Qi Zhang. I received my M.S. from Xi'an Jiaotong University and my B.S. from Chongqing University.

I work on spatial intelligence across computer vision, machine learning, computer graphics, and robotics.

Research

The world we see is constantly changing: how do intelligent systems generalize to new observations? This question drives me to understand the mechanisms underlying spatial intelligence and to develop methods that give artificial intelligence this remarkable capability.

Specifically, I am investigating how generalizability can emerge from reusable 3D & 4D representations, how these representations of the dynamic 3D world could be learned from images & videos, and how inductive biases could serve as expert knowledge to reduce unknown parameters and make learning more efficient.

Equal Contribution *, Corresponding Author †, Project Lead ⚑

R³: 3D Reconstruction via Relative Regression

Congrong Xu, Huachen Gao, Xingyu Chen, Yuliang Xiu, Jun Gao, Anpei Chen

arXiv, 2026

project page / arXiv / code

Assembles confidence-guided pairwise poses into trajectories, enabling a 0.3B model to match 1B baselines.

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

Roni Itkin, Noam Issachar, Yehonatan Keypur, Xingyu Chen, Anpei Chen, Sagie Benaim

European Conference on Computer Vision (ECCV), 2026

project page / arXiv

A global scene tokenizer that moves beyond pixel-wise redundant tokens.

Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, Anpei Chen†

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

project page / arXiv / code

Synthesizes 4D dynamic objects from a single monocular video.

TTT3R: 3D Reconstruction as Test-Time Training

Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen†

International Conference on Learning Representations (ICLR), 2026

China3DV 2026 Top 5 Paper

project page / arXiv / code

A simple state update rule to enhance length generalization for CUT3R.

Human3R: Everyone Everywhere All at Once

Yue Chen, Xingyu Chen, Yuxuan Xue, Anpei Chen, Yuliang Xiu†, Gerard Pons-Moll

International Conference on Learning Representations (ICLR), 2026

project page / arXiv / code / interactive demo

Online human-scene reconstruction in One model, One stage.

🦣Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen†

IEEE/CVF International Conference on Computer Vision (ICCV), 2025

project page / arXiv / code / interactive demo

Disentangles DUSt3R attention maps and repurposes them for training-free 4D reconstruction.

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen, Gerard Pons-Moll, Yuliang Xiu†

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

project page / arXiv / video / code / demo / gallery

Uses novel-view synthesis to probe texture and geometry awareness in visual foundation models.

L2G-NeRF: Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields

Yue Chen*, Xingyu Chen*⚑, Xuan Wang†, Qi Zhang, Yu Guo†, Ying Shan, Fei Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

project page / arXiv / paper / code / supplementary / video / poster

Couples local and global alignment with differentiable solvers for robust bundle-adjusting NeRF.

UV Volumes for Real-time Rendering of Editable Free-view Human Performance

Yue Chen*, Xuan Wang*, Xingyu Chen, Qi Zhang, Xiaoyu Li, Yu Guo†, Jue Wang, Fei Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

project page / arXiv / paper / code / supplementary / video / poster

Separates high-frequency human appearance from 3D volume and encodes it as 2D textures for real-time rendering and retexturing.

PROCA: Place Recognition under Occlusion and Changing Appearance via Disentangled Representations

Yue Chen, Xingyu Chen†⚑, Yicen Li

IEEE International Conference on Robotics and Automation (ICRA), 2023

arXiv / paper / code / video / poster

Disentangles place, appearance, and occlusion factors, then uses the place code as a retrieval descriptor.

Sparse Semantic Map-Based Monocular Localization in Traffic Scenes Using Learned 2D-3D Point-Line Correspondences

Xingyu Chen, Jianru Xue†, Shanmin Pang

IEEE Robotics and Automation Letters (RA-L), 2022

arXiv / paper

Estimates camera poses from sparse semantic maps through learned 2D-3D point-line correspondences.

Ha-NeRF😆: Hallucinated Neural Radiance Fields in the Wild

Xingyu Chen, Qi Zhang†, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, Jue Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

project page / arXiv / paper / supplementary / code / video / poster

Recovers NeRF from tourist photos with variable appearance and occlusions, enabling occlusion-free renderings with hallucinated appearance.

Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios

Xingyu Chen, Jianru Xue†, Jianwu Fang, Yuxin Pan, Nanning Zheng

IEEE Intelligent Vehicles Symposium (IV), 2020

arXiv / paper

Runs detection only on keyframes and predicts dynamic objects in the remaining frames for efficient semantic mapping.

Navigation Command Matching for Vision-based Autonomous Driving

Yuxin Pan, Jianru Xue†, Pengfei Zhang, Wanli Ouyang, Jianwu Fang, Xingyu Chen

IEEE International Conference on Robotics and Automation (ICRA), 2020

ResearchGate / paper

Matches navigation commands with smooth rewards to discriminate sub-optimal driving actions.

Projects

I am passionate about bridging the physical and digital worlds by building next-generation AR and robotics.

Kuafu (Autonomous Driving)

GPS-Denied Navigation
Intelligent Vehicle Challenge
Odometry Mapping Localization

Robotic Hand

Hand gesture recognition
Multi-sensor fusion
Robotic hand controller

Robotic Arm

Teleoperation
Hand gesture recognition
Four-bar linkage structure

Talks

三维基础模型的秘密
Secrets Behind 3D Foundation Models

Huawei Noah's Ark Lab (Toronto), 2025

Tsinghua University, 2025

Unveiling the internal mechanisms and emergent spatial intelligence of 3D foundation models, including Easi3R, TTT3R, and Human3R.

Inferring the physical world and camera poses from images

ETH Zurich, 2023

Shares the intuition behind handling dynamic objects in our previous work and outlines the prospects for tackling the tracking problem with neural fields.

光影幻象：神经辐射场中的时空流转
Neural Radiance Fields for Unconstrained Photo Collections

深蓝学院 (Shenlan College online education), 2022

An introduction to Neural Radiance Fields (NeRF) for unconstrained photo collections, covering NeRF, NeRF in the Wild, and Ha-NeRF.

Funding & Grants

I gratefully acknowledge support from the following programs and organizations.

2026: Compute Grant, MiraclePlus & Beijing Haidian Compute Lab.
奇绩创坛 & 北京海淀算力实验室算力资助
2025: Dean's PhD Research Grant, School of Engineering, Westlake University.
西湖大学工学院院长专项博士生项目

Teaching

Machine Learning, Teaching Assistant, Westlake University

Academic Services

Area Chair
- NeurIPS
Outstanding Reviewer
- Computer Vision: CVPR, ICCV, ECCV, 3DV, TPAMI
- Machine Learning: NeurIPS, ICLR, ICML
- Robotics: IROS
- Graphics: SIGGRAPH, TOG, TVCG
