MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction

UCL    Snap Inc.
CVPR 2024

*Equal contribution

Abstract

DensePose provides a pixel-accurate association of images with 3D mesh coordinates, but does not provide a 3D mesh, while Human Mesh Reconstruction (HMR) systems have high 2D reprojection error, as measured by DensePose localization metrics. In this work we introduce MeshPose to jointly tackle DensePose and HMR. For this we first introduce new losses that allow us to use weak DensePose supervision to accurately localize in 2D a subset of the mesh vertices ('VertexPose'). We then lift these vertices to 3D, yielding a low-poly body mesh ('MeshPose'). Our system is trained in an end-to-end manner and is the first HMR method to attain competitive DensePose accuracy, while also being lightweight and amenable to efficient inference, making it suitable for real-time AR applications.

MeshPose Architecture


The lower VertexPose branch extracts per-vertex heatmaps from which, by applying a spatial argsoftmax operation, it computes precise x and y coordinates for all vertices inside the input crop. The upper Regression branch computes the coordinates (x, y, and vertex depth z) of all vertices, along with their visibility scores w. The score w takes lower values when the corresponding vertex is occluded or falls outside the crop area. We differentiably combine the VertexPose and regressed coordinates via w to obtain the final 3D mesh. We densely supervise the intermediate per-vertex heatmaps and the final output with UV, mesh, and silhouette cues, resulting in a low-latency, image-aligned, in-the-wild HMR system.
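For concreteness, the spatial argsoftmax amounts to taking the expected pixel coordinate under a softmax-normalized heatmap. Below is a minimal PyTorch sketch; tensor shapes, names, and the temperature parameter are illustrative assumptions rather than the exact implementation.

        import torch

        def spatial_argsoftmax(heatmaps, temperature=1.0):
            # heatmaps: (B, V, H, W) per-vertex logits -> (B, V, 2) sub-pixel (x, y).
            b, v, h, w = heatmaps.shape
            probs = torch.softmax(heatmaps.flatten(2) / temperature, dim=-1)
            probs = probs.view(b, v, h, w)
            xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
            ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
            # Expected coordinates under each vertex's spatial distribution.
            x = (probs.sum(dim=2) * xs).sum(dim=-1)   # marginalize rows, take E[x]
            y = (probs.sum(dim=3) * ys).sum(dim=-1)   # marginalize cols, take E[y]
            return torch.stack([x, y], dim=-1)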

VertexPose Supervision


Geometry-driven losses used to supervise VertexPose with DensePose ground truth. Our barycentric loss requires that the per-pixel distribution over VertexPose vertices matches the barycentric coordinates of the UV annotation. Our UV consistency loss requires that the UV annotation's barycentrics at a labelled pixel x, applied to the corresponding VertexPose vertex positions, reconstruct x as x̂ (see the sketch below).
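A hedged sketch of the UV consistency loss under these definitions: the annotation's barycentric weights, applied to the predicted 2D positions of the triangle's three vertices, should reproduce the annotated pixel. Function names, shapes, and the choice of penalty are assumptions for illustration.

        import torch
        import torch.nn.functional as F

        def uv_consistency_loss(pred_xy, tri_idx, bary, pix_xy):
            # pred_xy: (V, 2) predicted 2D vertex positions
            # tri_idx: (N, 3) vertex ids of the triangle containing each annotation u
            # bary:    (N, 3) barycentric weights of u inside that triangle
            # pix_xy:  (N, 2) annotated pixel locations x
            x_hat = (bary.unsqueeze(-1) * pred_xy[tri_idx]).sum(dim=1)  # (N, 2) = x̂
            # The exact penalty is an assumption; the paper may use a different norm.
            return F.smooth_l1_loss(x_hat, pix_xy)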

MeshPose Supervision


Visibility Supervision: We estimate partial vertex visibility from the available ground truth: for any (x, u) annotation pair in the DensePose dataset, we declare as visible all three vertices of the mesh triangle containing u. We also declare as non-visible every vertex whose ground-truth mesh position falls outside the image crop. For such vertices we supervise visibility with a standard binary cross-entropy loss, as sketched below.
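A minimal sketch of this loss, assuming a per-vertex visibility score in [0, 1] and a float mask marking the vertices that received a label; names and shapes are illustrative.

        import torch
        import torch.nn.functional as F

        def visibility_loss(pred_w, label, has_label):
            # pred_w:    (V,) predicted visibility scores in [0, 1]
            # label:     (V,) float, 1 = visible (DensePose triangle), 0 = outside the crop
            # has_label: (V,) float mask selecting the vertices with known visibility
            bce = F.binary_cross_entropy(pred_w, label, reduction="none")
            return (bce * has_label).sum() / has_label.sum().clamp(min=1)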

Branch Fusion: The 2D location of a MeshPose vertex is the visibility-weighted average of the VertexPose-based 2D position and the regressed value. The visibility score dictates, on a per-vertex level, whether we rely on the VertexPose-based 2D position or fall back to the regressed value. This allows us to accommodate occluded areas, or tight crops that omit part of the human body, as is regularly the case for selfie images. Since the expression is differentiable, visibility can be estimated through end-to-end back-propagation, but we also use two additional methods for visibility supervision.
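The fusion itself is a one-line convex combination, sketched here with assumed shapes:

        def fuse_branches(xy_heatmap, xy_regressed, w):
            # xy_heatmap, xy_regressed: (V, 2) 2D positions; w: (V, 1) visibility in [0, 1].
            # w near 1: trust the VertexPose (heatmap) position;
            # w near 0: fall back to the regressed position.
            return w * xy_heatmap + (1.0 - w) * xy_regressed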

Quantitative Results

We outperform HMR methods on DensePose metrics by more than 50% while remaining close to state-of-the-art 3D accuracy. By combining the highest FPS and a small model size with state-of-the-art reprojection accuracy, our pipeline is well suited to mobile inference.


Our models are purely convolutional and thus run out of the box on modern phones with neural accelerators. We exported ONNX versions of our models and measured their timings (FPS) on an iPhone 12 using the CoreML backend, obtaining timings comparable to those on a desktop GPU.
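As a rough sketch of this export path, assuming a PyTorch implementation (MeshPoseNet, the input size, and the output names are placeholders, not the actual code):

        import torch

        model = MeshPoseNet().eval()             # hypothetical model class
        dummy = torch.randn(1, 3, 256, 256)      # assumed input crop size
        torch.onnx.export(model, dummy, "meshpose.onnx", opset_version=13,
                          input_names=["image"],
                          output_names=["vertices", "visibility"])
        # The ONNX graph can then be converted to Core ML for on-device timing.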


Presentation Video

3DPW Videos

In-the-Wild Videos

We demonstrate very strong temporal stability (low jitter) even when the model is applied frame-by-frame without any post-processing. In the videos below, only the detected bounding box is temporally smoothed.

BibTeX


        @inproceedings{lekakolyris2024meshpose,
          title={{MeshPose}: Unifying {DensePose} and 3D Body Mesh Reconstruction},
          author={Eric-Tuan L\^{e} and Antonis Kakolyris and Petros Koutras and Himmy Tam and Efstratios Skordos and George Papandreou and R{\i}za Alp G{\"u}ler and Iasonas Kokkinos},
          booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
          year={2024}
        }