Disentangled Motion Representation: Encoding Full-Body Avatars into Discrete Latent Spaces

Abstract and 1. Introduction

  2. Related Work

    2.1. Motion Reconstruction from Sparse Input

    2.2. Human Motion Generation

  3. SAGE: Stratified Avatar Generation and 3.1. Problem Statement and Notation

    3.2. Disentangled Motion Representation

    3.3. Stratified Motion Diffusion

    3.4. Implementation Details

  4. Experiments and Evaluation Metrics

    4.1. Dataset and Evaluation Metrics

    4.2. Quantitative and Qualitative Results

    4.3. Ablation Study

  5. Conclusion and References

Supplementary Material

A. Extra Ablation Studies

B. Implementation Details

3.2. Disentangled Motion Representation

In this section, our objective is to disentangle full-body human motion into upper-body and lower-body parts and to encode each part into its own discrete latent space. This reduces the complexity and burden of encoding, since each encoder only has to model half-body motion.
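The split into upper-body and lower-body motion can be illustrated with a minimal sketch. The 22-joint skeleton, the joint index lists, and the 6-dimensional per-joint feature are assumptions made for illustration; the section does not state the partition the authors use.

```python
import numpy as np

# Hypothetical partition of a 22-joint skeleton (an assumption, not taken
# from the paper): pelvis, hips, knees, ankles and feet form the lower body.
LOWER_JOINTS = [0, 1, 2, 4, 5, 7, 8, 10, 11]
UPPER_JOINTS = [j for j in range(22) if j not in LOWER_JOINTS]


def split_motion(motion):
    """Split a (T, J, D) motion array into upper-body and lower-body parts."""
    return motion[:, UPPER_JOINTS], motion[:, LOWER_JOINTS]


def merge_motion(upper, lower):
    """Reassemble a full-body (T, J, D) array from the two half-body parts."""
    num_frames, _, feat_dim = upper.shape
    num_joints = len(UPPER_JOINTS) + len(LOWER_JOINTS)
    full = np.empty((num_frames, num_joints, feat_dim), dtype=upper.dtype)
    full[:, UPPER_JOINTS] = upper
    full[:, LOWER_JOINTS] = lower
    return full


# Toy sequence: 8 frames, 22 joints, 6 features per joint (assumed layout).
motion = np.random.default_rng(0).normal(size=(8, 22, 6))
upper, lower = split_motion(motion)
```

Each half can then be passed to its own encoder, and `merge_motion` recovers the original joint ordering when the two halves are recombined.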

Figure 2. The overall architecture of our SAGE Net. It mainly contains two components: (a) Disentangled VQ-VAE for discrete human motion latent learning. To facilitate visualization, we incorporate zero rotations as padding for the lower body in the Upper VQ-VAE, and vice versa for the Lower VQ-VAE. Consequently, in the visualizations of the Upper VQ-VAE, the lower body remains in a stationary pose, whereas in the visualizations of the Lower VQ-VAE, the upper body is maintained in a T-pose. (b) The stratified diffusion model, which models the conditional distribution of the latent space for upper and lower motion. This model sequentially infers the upper and lower body latents, capturing the correlation between upper and lower motions. By employing a dedicated full-body decoder on the concatenated upper and lower latents, we can obtain full-body motion.

Since the continuous latents of all data samples share the same codebook C, every real motion in the training set can be expressed by a finite number of bases in the latent space.
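To make the role of the shared codebook concrete, the sketch below shows a generic nearest-neighbor quantization step of the kind used in VQ-VAE models. It is an illustration under assumptions, not the authors' implementation: the function name `quantize`, the codebook size (16), and the latent dimension (4) are all hypothetical.

```python
import numpy as np


def quantize(z, codebook):
    """Map each continuous latent to its nearest codebook entry.

    z:        (N, d) continuous latents from an encoder
    codebook: (K, d) shared codebook C
    Returns the selected indices (N,) and the quantized latents (N, d).
    """
    # Squared Euclidean distance between every latent and every code.
    dist = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # hypothetical K=16 codes of dimension 4
z = rng.normal(size=(5, 4))          # five continuous latents
idx, z_q = quantize(z, codebook)
```

Every row of `z_q` is one of the K codebook rows, so any encoded motion is described by a sequence of indices into the same finite set of bases.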


:::info
Authors:

(1) Han Feng, Wuhan University (equal contribution; authors ordered alphabetically);

(2) Wenchao Ma, Pennsylvania State University (equal contribution; authors ordered alphabetically);

(3) Quankai Gao, University of Southern California;

(4) Xianwei Zheng, Wuhan University;

(5) Nan Xue, Ant Group (xuenan@ieee.org);

(6) Huijuan Xu, Pennsylvania State University.

:::


:::info
This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
