SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization
Abstract
SkeletonGaussian enables editable 4D generation by decomposing motion into skeleton-driven rigid motion and hexplane-refined non-rigid deformation.
4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
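The two-stage motion model described in the abstract (skeleton-driven rigid motion via linear blend skinning, followed by a hexplane-based non-rigid refinement) can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: `linear_blend_skinning`, `hexplane_refinement`, `skin_weights`, `bone_transforms`, and `hexplane_mlp` are placeholder names introduced here.

```python
import numpy as np

def linear_blend_skinning(points, skin_weights, bone_transforms):
    """Drive canonical Gaussian centers with sparse rigid skeleton motion via LBS.

    points:          (N, 3)    canonical Gaussian centers
    skin_weights:    (N, B)    per-Gaussian skinning weights (rows sum to 1)
    bone_transforms: (B, 4, 4) per-bone rigid transforms for the target frame
    returns:         (N, 3)    rigidly deformed centers
    """
    n = points.shape[0]
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)            # (N, 4)
    # Blend per-bone transforms with the skinning weights, then apply per point.
    blended = np.einsum("nb,bij->nij", skin_weights, bone_transforms)    # (N, 4, 4)
    deformed = np.einsum("nij,nj->ni", blended, homog)                   # (N, 4)
    return deformed[:, :3]

def hexplane_refinement(points, t, hexplane_mlp):
    """Hypothetical non-rigid residual: query a hexplane-style field at (x, y, z, t)
    and predict a small per-Gaussian offset on top of the rigid LBS motion."""
    xyzt = np.concatenate([points, np.full((points.shape[0], 1), t)], axis=1)
    return hexplane_mlp(xyzt)  # (N, 3) fine-grained offsets

# Per-frame deformation = rigid skeleton motion + non-rigid refinement, e.g.:
#   deformed = linear_blend_skinning(centers, skin_weights, bone_transforms)
#   deformed = deformed + hexplane_refinement(deformed, t, hexplane_mlp)
```

Because the rigid component is tied to explicit bone transforms, editing a pose amounts to changing `bone_transforms`, while the hexplane residual only accounts for fine-grained deviations; this is the separation the abstract credits for interpretability and editability.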
Community
🚀 Introducing SkeletonGaussian — Editable 4D Generation through Gaussian Skeletonization!
(Accepted by CVM 2026)
✨ Generate dynamic 3D Gaussians from text, images, or videos
🦴 Explicit skeleton-driven motion enables intuitive pose editing
🎯 Higher visual quality + better motion fidelity than prior 4D methods
A new step toward controllable, editable 4D generation.
Project page: https://wusar.github.io/projects/skeletongaussian/
arXiv: https://arxiv.org/abs/2602.04271
Code will be available soon.
This is an automated message from Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting (2026)
- AnimaMimic: Imitating 3D Animation from Video Priors (2025)
- Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement (2025)
- CAMO: Category-Agnostic 3D Motion Transfer from Monocular 2D Videos (2026)
- Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis (2026)
- 3DProxyImg: Controllable 3D-Aware Animation Synthesis from Single Image via 2D-3D Aligned Proxy Embedding (2025)
- MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos (2025)