We introduce MeshPose, a method to jointly tackle the 2D DensePose (DP) and 3D Human Mesh Reconstruction (HMR) problems. We start by observing that HMR methods have high image reprojection error, as measured by DensePose localization metrics, while the more pixel-accurate DensePose methods do not provide a 3D mesh. In our work we combine DensePose and HMR representations, losses, and datasets, yielding a unified system that performs competitively on both tasks. We first introduce SparsePose, a method to localize in 2D a sparse set of ‘low-poly’ body mesh vertices. For this we propose new geometry-motivated losses that allow us to use the existing DensePose ground-truth as weak supervision for SparsePose. MeshPose then lifts these 2D vertices with per-vertex depth prediction, yielding a low-poly 3D body mesh. We train the full system end-to-end, resulting in the first HMR system with competitive DensePose accuracy.
SparsePose first extracts multiple per-vertex heatmaps. These are used in two ways: a spatial argsoftmax across the x and y axes localizes each vertex in 2D, while channel-wise aggregation of the scores provides per-pixel UV values. MeshPose complements these 2D-localized vertices with a skeleton-driven depth regressor: we use integral depth regression to estimate skeleton joint depths and predict per-vertex depth offsets relative to the parent joint depths. This yields a 3D body mesh that is both accurate in 3D and reprojects correctly onto the image.
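The two differentiable expectation operators above (spatial argsoftmax for 2D vertex localization, integral regression for joint depths) can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function names, tensor shapes, and the choice of numpy are assumptions.

```python
import numpy as np

def spatial_argsoftmax(heatmaps):
    """Spatial argsoftmax sketch: heatmaps (V, H, W) of per-vertex
    scores -> (V, 2) expected (x, y) pixel coordinates.
    A softmax over all spatial locations turns each heatmap into a
    distribution; the expectation over pixel coordinates gives a
    differentiable 2D vertex location."""
    V, H, W = heatmaps.shape
    flat = heatmaps.reshape(V, -1)
    flat = flat - flat.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    probs = probs.reshape(V, H, W)
    xs = np.arange(W, dtype=np.float64)
    ys = np.arange(H, dtype=np.float64)
    x = (probs.sum(axis=1) * xs).sum(axis=1)          # expectation over x
    y = (probs.sum(axis=2) * ys).sum(axis=1)          # expectation over y
    return np.stack([x, y], axis=1)

def integral_depth(depth_logits, depth_values):
    """Integral depth regression sketch: depth_logits (J, D) scores over
    D discretized depth bins, depth_values (D,) bin centers -> (J,)
    expected joint depths (softmax followed by expectation)."""
    logits = depth_logits - depth_logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs @ depth_values
```

Because both operators are expectations over softmax distributions, gradients flow through them, which is what allows the 2D localization and depth branches to be trained end-to-end with the rest of the network.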