DecomPose: Disentangling Cross-Category Optimization Contention for Category-Level 6D Object Pose Estimation

1Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei, China
2University of Science and Technology of China, Hefei, Anhui, China
3Peking University, Beijing, China
ICML 2026
Motivation of DecomPose. (a) Category-wise complexity ranking derived from AG-Pose (Lin et al., 2024) evaluation scores, ordering categories from complex to simple using the 5°2cm metric. (b) Cross-category optimization contention: mismatched modeling demands lead to gradient conflicts, while asynchronous convergence induces negative transfer, as gradients from hard categories continually perturb parameters preferred by easy ones during training.

Abstract

Category-level 6D object pose estimation is typically formulated as a multi-category joint learning problem with fully shared model parameters. However, pronounced geometric heterogeneity across categories entangles incompatible optimization signals in shared modules, resulting in gradient conflicts and negative transfer during training. To address this challenge, we first introduce gradient-based diagnostics to quantify module-level cross-category contention. Building on these diagnostics, we propose DecomPose, a difficulty-aware decomposition framework that mitigates optimization contention via: (1) difficulty-aware gradient decoupling, which groups categories using a data-driven difficulty proxy and routes each instance to a group-specific correspondence branch to isolate incompatible updates; and (2) stability-driven asymmetric branching, which assigns higher-capacity branches to structurally simple categories as stable optimization anchors while constraining complex categories with lightweight branches to suppress noisy updates. Extensive experiments on REAL275, CAMERA25, and HouseCat6D demonstrate that DecomPose effectively reduces cross-category optimization contention and delivers superior pose estimation performance across multiple benchmarks.
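One common way to quantify the gradient conflicts mentioned above is the cosine similarity between per-category gradients on a shared module, where a negative value marks a conflicting pair. The sketch below illustrates that general idea only. It is not the paper's diagnostic, whose exact definition is not given on this page. The function names and the toy gradient vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero gradient as neither aligned nor conflicting
    return dot / (norm_u * norm_v)


def conflict_rate(grads):
    """Fraction of category pairs whose gradients point in opposing directions.

    grads maps a category name to the flattened gradient of that category's
    loss with respect to one shared module (an assumption of this sketch).
    """
    names = sorted(grads)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        return 0.0
    conflicts = sum(1 for a, b in pairs if cosine(grads[a], grads[b]) < 0.0)
    return conflicts / len(pairs)


# Toy, made-up gradients for three placeholder categories.
toy_grads = {
    "cat_a": [1.0, 0.0],
    "cat_b": [-1.0, 0.5],
    "cat_c": [1.0, 0.2],
}
rate = conflict_rate(toy_grads)  # two of the three pairs conflict
```

Computing this rate separately for each shared module would give a module-level view of where categories disagree most.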

Method

The proposed DecomPose mainly consists of two components. First, in the shared canonical feature extraction and difficulty-aware grouping phase, point-wise geometric and semantic features are jointly encoded into a shared canonical representation, while object categories are routed into difficulty-specific groups via a proxy-based routing strategy to mitigate cross-category optimization contention. Subsequently, in the asymmetric correspondence and pose recovery phase, the grouped features are processed by correspondence branches with heterogeneous capacities according to their difficulty levels, and their outputs are aggregated by a unified pose head for end-to-end 6D pose estimation.
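The grouping and routing step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold rule, the branch widths, the category names, and the proxy scores are all hypothetical placeholders. The only detail taken from the text is that simple categories receive the higher-capacity branch and complex categories the lightweight one.

```python
# Hypothetical branch capacities: the simple group gets the wider branch,
# following the asymmetric design described in the abstract.
BRANCH_WIDTH = {"simple": 256, "complex": 64}


def group_by_difficulty(proxy_scores, threshold):
    """Assign each category to a difficulty group.

    proxy_scores maps category -> accuracy-like proxy (lower means harder).
    A fixed threshold is an assumption; the paper's grouping rule may differ.
    """
    return {
        category: "complex" if score < threshold else "simple"
        for category, score in proxy_scores.items()
    }


def route(instances, groups):
    """Split (instance_id, category) pairs into per-branch batches."""
    batches = {"simple": [], "complex": []}
    for instance_id, category in instances:
        batches[groups[category]].append(instance_id)
    return batches


# Made-up proxy scores for placeholder categories, not results from the paper.
groups = group_by_difficulty({"cat_a": 0.3, "cat_b": 0.9}, threshold=0.5)
batches = route([(0, "cat_a"), (1, "cat_b"), (2, "cat_a")], groups)
```

In a full model each batch would pass through its own correspondence branch, so gradients from one group never update the other group's branch parameters, and both outputs would feed the shared pose head.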

Experiment

Table 1. Comparison with state-of-the-art methods on CAMERA25, REAL275, and HouseCat6D. Best results are highlighted in bold (red), and second-best results are underlined (blue). ‘-’ indicates that results are not reported.

BibTeX

@misc{gao2026decomposedisentanglingcrosscategoryoptimization,
  title={DecomPose: Disentangling Cross-Category Optimization Contention for Category-Level 6D Object Pose Estimation},
  author={Yifan Gao and Lu Zou and Zhangjin Huang and Guoping Wang},
  year={2026},
  eprint={2605.15728},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.15728}
}