Motivation.
Live3D Portrait (Live3D) proposes a real-time 3D inversion method built on a two-branch encoder structure.
The figure below demonstrates the disentanglement of the Live3D features.
We separately disable the features from the two branches, E_high(·) and E_low(·), to infer the reconstructed image.
Without E_high(·), the output retains the coarse structure but loses its appearance. Conversely, when E_low(·) is deactivated, the reconstructed portraits preserve the texture (such as the blue and purple reflection on the glasses) but fail to capture the geometry.
Framework.
Inspired by the aforementioned feature disentanglement, we propose to distill the priors of a 2D diffusion generative model and a 3D GAN for real-time 3D-aware editing.
The proposed model is fine-tuned from Live3D, where the prompt features are fused with those from E_high(·) through cross-attention in order to predict the triplane representation.
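The cross-attention fusion above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the token counts, feature dimension, and the names `ehigh_feats` and `prompt_feats` are assumptions chosen for the example. Queries come from the E_high(·) feature map (flattened into spatial tokens), while keys and values come from the prompt features; the fused tokens would then feed the triplane prediction head.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: queries attend to keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)      # (Nq, Nk) attention logits
    return softmax(scores, axis=-1) @ values    # (Nq, d) fused features

# Hypothetical shapes: 256 spatial tokens from E_high(.),
# 77 prompt tokens (e.g. from a text encoder), feature dim 64.
rng = np.random.default_rng(0)
d = 64
ehigh_feats = rng.standard_normal((256, d))   # queries: encoder branch features
prompt_feats = rng.standard_normal((77, d))   # keys/values: prompt features

fused = cross_attention(ehigh_feats, prompt_feats, prompt_feats)
print(fused.shape)  # (256, 64): one fused token per spatial location
```

In a full model, separate learned projections would map the inputs to queries, keys, and values before the attention step; they are omitted here to keep the sketch short.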
Comment: Proposes a hybrid explicit-implicit network that synthesizes high-resolution multi-view-consistent images in real time and also produces high-quality 3D geometry.