Generative models have become the standard approach for many hard problems in computer science, and they are among the most promising tools for analyzing and synthesizing visual data. The best-known example is Stable Diffusion, which turns a challenging text prompt into beautiful, realistic images. Its architecture is built on Diffusion Models (DMs), which have shown remarkable generative power for images and video. Rapid progress in diffusion and generative modeling is sparking a revolution in 2D content creation. The adage is simple: "If you can describe it, the model can paint it for you." The capabilities of these generative models are quite amazing.
While DMs have proven very effective for 2D content, 3D content presents a number of additional difficulties because of the extra dimension. Producing 3D assets such as avatars at the same quality as 2D images is challenging: the memory and compute costs of generating the rich detail required for high-quality avatars can be prohibitive.
Because digital avatars are increasingly used in movies, video games, the metaverse, and the broader 3D industry, allowing anyone to build one would be valuable. That is the motivation behind this work.
The authors' solution to the problem of generating digital avatars is the Roll-out diffusion network (Rodin), illustrated below:
The model can take an image, random noise, or a text description of the desired avatar as input. A latent vector z derived from that input then conditions the diffusion process, which consists of many noising and denoising steps: starting from pure noise, the model progressively removes noise until a sharp result emerges.
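The noising and denoising loop can be sketched with a toy DDPM-style example. This is a minimal illustration, not the authors' implementation: the linear beta schedule, the number of steps, and the oracle noise predictor are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (an assumption for illustration; Rodin's actual
# schedule is not specified in this article).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def add_noise(x0, t, eps):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def ddpm_step(x_t, t, eps_hat):
    """One reverse (denoising) step, given a noise prediction eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

# A toy latent standing in for the avatar representation.
z = rng.standard_normal((4, 4))
eps = rng.standard_normal(z.shape)
z_noisy = add_noise(z, T - 1, eps)

# With an oracle noise predictor, one step moves z_noisy back toward z;
# in practice a trained network predicts eps from (z_noisy, t).
z_prev = ddpm_step(z_noisy, T - 1, eps)
```

In the real model the noise predictor is a trained network conditioned on the latent vector z derived from the image or text input.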
What makes this different is the 3D nature of the target content. Instead of producing a 2D image, the diffusion model generates the avatar's coarse geometry, which a diffusion upsampler then refines with fine detail.
Memory and compute efficiency is one of the goals of this work. To achieve it, the authors exploit the tri-plane (three-plane) representation of a neural radiance field, which has a significantly smaller memory footprint than voxel grids without sacrificing expressivity.
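The memory advantage is easy to see: a dense voxel grid scales as O(R³) in resolution R, while three feature planes scale as O(R²). The sketch below uses hypothetical sizes (the resolution, channel count, and nearest-neighbour lookup are assumptions for illustration, not the paper's settings):

```python
import numpy as np

# Hypothetical sizes chosen for illustration.
R, C = 256, 32          # per-axis resolution, feature channels

voxel_params = R**3 * C          # dense voxel grid: O(R^3) parameters
triplane_params = 3 * R**2 * C   # three feature planes: O(R^2) parameters

# Three axis-aligned feature planes (XY, XZ, YZ).
planes = [np.zeros((R, R, C), dtype=np.float32) for _ in range(3)]

def triplane_features(p):
    """Query features for a 3D point p in [0, 1)^3 by projecting it onto
    each of the three planes and summing the sampled feature vectors.
    (Nearest-neighbour sampling here; real systems interpolate.)"""
    x, y, z = np.clip((p * R).astype(int), 0, R - 1)
    return planes[0][x, y] + planes[1][x, z] + planes[2][y, z]

feat = triplane_features(np.array([0.5, 0.25, 0.75]))
```

At these sizes the voxel grid needs roughly R/3 ≈ 85 times more parameters than the tri-planes, which is what makes diffusion over this representation tractable.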
The generated tri-plane representation is then upsampled to the target resolution by a separate diffusion model trained for that purpose. Finally, a simple MLP decoder with four fully connected layers produces an RGB volumetric rendering.
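Such a decoder maps a tri-plane feature vector to a colour and a volume density at each queried point. The sketch below is a minimal stand-in: the layer widths, activations, and output parameterization are assumptions for illustration; the article only states that the decoder is a simple MLP with four fully connected layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Layer widths are assumptions; only the four-layer depth comes from the text.
dims = [32, 64, 64, 64, 4]   # tri-plane features -> ... -> (R, G, B, density)
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

def decode(feat):
    """Map a tri-plane feature vector to an RGB colour and a density."""
    h = feat
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    out = h @ weights[-1] + biases[-1]
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))   # sigmoid keeps colour in [0, 1]
    sigma = relu(out[3])                   # density must be non-negative
    return rgb, sigma

rgb, sigma = decode(rng.standard_normal(32))
```

Volume rendering then integrates these colours and densities along camera rays to produce the final avatar image.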
The results are shown below.
Compared with the state-of-the-art approaches mentioned above, Rodin produces the most faithful digital avatars, and unlike earlier methods its shared samples show no artifacts. For more details, visit https://www.marktechpost.com/2023/01/01/meet-rodin-a-novel-artificial-intelligence-ai-framework-to-generate-3d-digital-avatars-from-various-input-sources/