Massive virtual worlds created by a growing number of companies and creators could be more easily populated with a diverse array of buildings, vehicles, 3D characters and more, thanks to a new AI model from NVIDIA Research.
Trained using only 2D images, NVIDIA GET3D generates 3D shapes with high-fidelity textures and intricate geometric details. These 3D objects are created in the same format used by popular graphics software applications, allowing users to immediately import their shapes into 3D renderers and game engines for further editing.
Generated objects could be used in 3D representations of buildings, outdoor spaces or entire cities, designed for industries such as games, robotics, architecture and social media.
GET3D can generate a virtually unlimited number of 3D shapes based on the data it is trained on. Like an artist turning a piece of clay into a detailed sculpture, the model transforms numbers into complex 3D shapes.
With a training dataset of 2D car images, for example, it creates a collection of sedans, trucks, race cars, and vans. When trained on animal images, it generates creatures such as foxes, rhinos, horses, and bears. Trained on chair images, the model generates an assortment of comfortable swivel chairs, dining chairs, and recliners.
“GET3D brings us closer to democratizing AI-powered 3D content creation,” said Sanja Fidler, vice president of AI research at NVIDIA, who leads the Toronto-based AI lab that created the tool. “Its ability to instantly generate textured 3D shapes could be a game-changer for developers, helping them quickly populate virtual worlds with varied and interesting objects.”
GET3D is one of more than 20 NVIDIA-authored papers and workshops accepted to the NeurIPS AI conference, taking place in New Orleans and virtually, Nov. 26 through Dec. 4.
It takes some kind of AI to create a virtual world
The real world is full of variety: streets are lined with unique buildings, different vehicles speed past, and diverse crowds move through. Manually modeling a 3D virtual world that mirrors this is extremely time-consuming, making it difficult to populate a detailed digital environment.
Although faster than manual methods, earlier 3D generative AI models were limited in the level of detail they could produce. Even recent inverse rendering methods can only generate 3D objects based on 2D images taken from different angles, forcing developers to build one 3D shape at a time.
GET3D can instead generate around 20 shapes per second when running inference on a single NVIDIA GPU, working like a generative adversarial network for 2D images while generating 3D objects. The larger and more diverse the training dataset, the more varied and detailed the output.
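The GAN-style sampling loop described above can be sketched in miniature. Everything below is a hypothetical stand-in: the `generate_shape` function and its single weight matrix `W` are not NVIDIA's code, merely an illustration of how a trained generator maps random latent codes to batches of 3D vertex data at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64    # size of the random latent code fed to the generator
NUM_VERTICES = 32  # a real model predicts far more, plus faces and textures

# Hypothetical "generator" weights; GET3D's actual generator is a deep network.
W = rng.standard_normal((LATENT_DIM, NUM_VERTICES * 3))

def generate_shape(z):
    """Map one latent code to a (NUM_VERTICES, 3) array of vertex positions."""
    return (z @ W).reshape(NUM_VERTICES, 3)

# Batched sampling, as a GAN does at inference: each latent yields one shape.
batch = rng.standard_normal((20, LATENT_DIM))  # ~20 shapes per pass
shapes = [generate_shape(z) for z in batch]
print(len(shapes), shapes[0].shape)
```

Because each shape is just one forward pass through the generator, throughput scales with batch size and GPU capacity, which is what makes the quoted 20-shapes-per-second rate plausible.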
NVIDIA researchers trained GET3D on synthetic data consisting of 2D images of 3D shapes captured from different camera angles. It took the team only two days to train the model on around 1 million images using NVIDIA A100 Tensor Core GPUs.
Enabling creators to modify shape, texture and material
GET3D takes its name from its ability to Generate Explicit Textured 3D meshes – meaning the shapes it creates come in the form of a triangle mesh, like a papier-mâché model, covered in a textured material. This allows users to easily import the objects into game engines, 3D modelers and movie renderers – and edit them.
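A triangle mesh of this kind is the same structure that common interchange formats carry. As a minimal sketch, the snippet below writes a toy tetrahedron (hypothetical data, not GET3D output) as a Wavefront OBJ file, a plain-text format that game engines and 3D modelers import directly: `v` records list vertex positions, and `f` records list triangles by 1-based vertex index.

```python
# Toy mesh: four vertices and four triangular faces forming a tetrahedron.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
# Each face is a triangle given by three 1-based vertex indices (OBJ convention).
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]

lines = ["v %f %f %f" % v for v in vertices]
lines += ["f %d %d %d" % f for f in faces]
obj_text = "\n".join(lines)

with open("tetrahedron.obj", "w") as fh:
    fh.write(obj_text)
```

A textured model like GET3D's output would additionally carry texture coordinates and a material file reference, but the mesh itself is still just vertices and triangles like these.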
Once creators export the shapes generated by GET3D to a graphics application, they can apply realistic lighting effects as the object moves or rotates in a scene. By incorporating another AI tool from NVIDIA Research, StyleGAN-NADA, developers can use text prompts to add a specific style to an image, such as modifying a rendered car to become a burnt-out car or a taxi, or turning an ordinary house into a haunted house.
The researchers note that a future version of GET3D could use camera pose estimation techniques to allow developers to train the model on real-world data instead of synthetic datasets. It could also be enhanced to support universal generation, meaning developers could train GET3D on all sorts of 3D shapes at once, rather than having to train it on one object category at a time.