Reconstructing articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states, limiting the accuracy of part-mesh reconstruction and part dynamics modeling, particularly for complex multi-part articulated objects. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation to address these issues. Our method incorporates canonical Gaussians with coarse-to-fine initialization and updates to align articulated part information across different object states, and employs a skinning-inspired part dynamics modeling module that improves both part-mesh reconstruction and articulation learning. Extensive experiments on both synthetic and real-world datasets, including a new benchmark for complex multi-part objects, demonstrate that ArtGS achieves state-of-the-art performance in joint parameter estimation and part-mesh reconstruction. Our approach significantly improves reconstruction quality and efficiency, especially for multi-part articulated objects. Additionally, we provide comprehensive analyses of our design choices, validating the effectiveness of each component and highlighting potential areas for future improvement.
Overview of ArtGS. Our method proceeds in two stages: (i) obtaining coarse canonical Gaussians \(G^c_{\text{init}}\) by matching the Gaussians \(G^0_{\text{single}}\) and \(G^1_{\text{single}}\), each trained individually on a single object state, and initializing the part assignment module with clustered Gaussian centers; (ii) jointly optimizing the canonical Gaussians \(G^c\) and the articulation model, comprising the articulation parameters \(\Phi\) and the part assignment module.
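As a rough illustration of this two-stage pipeline, the following minimal PyTorch sketch operates on Gaussian centers only and uses simplifying stand-ins: nearest-neighbor midpoints in place of the paper's cross-state Gaussian matching, random samples in place of clustering, linear-blend skinning with free per-part rotations and translations as the part dynamics model, and a chamfer loss in place of rendering-based supervision. All names (`coarse_canonical`, `Articulation`, `chamfer`) and the choice of K = 4 parts are illustrative assumptions, not the authors' implementation.

```python
import torch

def skew(v):
    """(K,3) axis-angle vectors -> (K,3,3) skew-symmetric matrices."""
    zero = torch.zeros_like(v[:, 0])
    return torch.stack([
        torch.stack([zero, -v[:, 2], v[:, 1]], dim=-1),
        torch.stack([v[:, 2], zero, -v[:, 0]], dim=-1),
        torch.stack([-v[:, 1], v[:, 0], zero], dim=-1),
    ], dim=-2)

def coarse_canonical(x0, x1):
    """Stage (i) stand-in: nearest-neighbor matching between the Gaussian
    centers of the two single-state models; matched midpoints serve as the
    coarse canonical centers G^c_init."""
    nn = torch.cdist(x0, x1).argmin(dim=1)
    return 0.5 * (x0 + x1[nn])

class Articulation(torch.nn.Module):
    """Skinning-style part assignment plus per-part rigid transforms (Phi)."""
    def __init__(self, part_centers, temp=0.05):
        super().__init__()
        K = part_centers.shape[0]
        self.centers = torch.nn.Parameter(part_centers.clone())  # assignment init
        self.axis_angle = torch.nn.Parameter(1e-3 * torch.randn(K, 3))
        self.trans = torch.nn.Parameter(torch.zeros(K, 3))
        self.temp = temp  # softness of the part assignment

    def forward(self, xc):
        # Soft assignment of each canonical Gaussian center to the K parts.
        w = torch.softmax(-torch.cdist(xc, self.centers) / self.temp, dim=1)  # (N,K)
        R = torch.linalg.matrix_exp(skew(self.axis_angle))                    # (K,3,3)
        per_part = torch.einsum('kij,nj->nki', R, xc) + self.trans            # (N,K,3)
        return (w.unsqueeze(-1) * per_part).sum(dim=1)  # linear-blend skinning

def chamfer(a, b):
    """Symmetric chamfer distance between two point sets."""
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Stage (i): coarse canonical Gaussians and clustered part centers (K = 4 assumed).
x0, x1 = torch.randn(500, 3), torch.randn(500, 3)      # centers of G^0/G^1_single
xc = coarse_canonical(x0, x1).clone().requires_grad_(True)
part_init = xc[torch.randperm(len(xc))[:4]].detach()   # crude cluster-center init

# Stage (ii): jointly optimize canonical Gaussians and the articulation model.
model = Articulation(part_init)
opt = torch.optim.Adam([xc, *model.parameters()], lr=1e-2)
for _ in range(200):
    # The canonical frame is pinned to state 0 here for simplicity; the
    # deformed canonical Gaussians should then reproduce state 1.
    loss = chamfer(xc, x0) + chamfer(model(xc), x1)
    opt.zero_grad(); loss.backward(); opt.step()
```

The soft, distance-based part assignment is what lets gradients from both object states flow back into a shared set of canonical Gaussians; in the full method this module is initialized from the clustered centers of stage (i) rather than random samples.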