NVIDIA researchers have developed a rendering framework called DIB-R. This uses 2D images to produce 3D effects. Sounds exciting – but what can it be used for?
Who are NVIDIA?
NVIDIA Research is a team of over 200 scientists from all over the world. They work in different areas including ‘AI, computer vision, self-driving cars, robotics, and graphics’.
Founded back in 1993, NVIDIA has built up a history of credibility, inventing the GPU (graphics processing unit) in 1999. GPU deep learning is integral to developing modern AI; as they explain, ‘the GPU [is] acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world.’
How can a 2D image become 3D?
The science is all based on how our retinas capture depth. Each eye actually sees in 2D, but our brains combine the data from both eyes to add contextual depth, enabling us to ‘see’ a 3D image. Interesting stuff!
NVIDIA used this understanding to build their rendering framework DIB-R, which is built in the machine learning framework PyTorch.
DIB-R uses an encoder-decoder architecture: the encoder transforms the input image into a feature map or vector, which the decoder then uses to predict the shape, color, and texture of the object.
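To make the encoder-decoder idea concrete, here is a minimal toy sketch in Python. This is illustrative only: the function names, the averaging "encoder", and the hand-rolled "decoder" are my own stand-ins, not NVIDIA's code. The real DIB-R uses trained neural networks in PyTorch; here the encoder simply collapses a 2D image into a small latent vector, and the decoder maps that vector to mock 3D properties.

```python
def encode(image):
    """Toy encoder: collapse a 2D image (a list of rows of pixel
    values) into a fixed-size latent vector of per-row averages.
    A real encoder would be a convolutional neural network."""
    return [sum(row) / len(row) for row in image]

def decode(latent):
    """Toy decoder: map the latent vector to predicted 3D properties.
    DIB-R predicts mesh geometry, color, and texture; here we stand
    those in with simple transforms of the latent code."""
    shape = [v * 2.0 for v in latent]       # stand-in for vertex positions
    color = [min(1.0, v) for v in latent]   # stand-in for per-vertex color
    return {"shape": shape, "color": color}

if __name__ == "__main__":
    image = [[0.0, 2.0], [1.0, 3.0]]        # a tiny 2x2 "photo"
    props = decode(encode(image))
    print(props)
```

The point of the structure, rather than the arithmetic, is what matters: one network compresses the image into a compact representation, and a second network expands that representation into the 3D attributes you actually want.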
How long does it take?
This is not like 3D printing. NVIDIA says it takes two days to train the model, whereas without their GPUs it would take weeks. Once trained, DIB-R can produce a 3D object from a 2D image in under 100 milliseconds.
What I find fascinating is that this is another technological advancement based on the study of nature; in this case, human eyes. We reported recently on how Harvard scientists created a new metalens based on the layered eyes of jumping spiders.
Biomimetics continues to amaze me. This is the study of natural actions or constructions and how science and engineering can use this as inspiration.
The NVIDIA researchers will present their new DIB-R framework at the Conference on Neural Information Processing Systems this week in Vancouver.
How is this capability useful?
It’s all about building gradual improvements in rendering and processing. DIB-R has been added to Kaolin, NVIDIA’s 3D deep learning library. This is a collection of research and frameworks that will collectively accelerate progress in 3D deep learning.
Traditionally, computers render 3D models onto a 2D screen, but a model that works in reverse, creating a 3D model from a 2D image, offers benefits such as enhanced object tracking. Think of applications like live VR journalism and real-time GPS monitoring, where this could offer a leap in functionality.
Robotics is another field of study which will benefit. For robots to be able to perform their functions correctly, they need to understand the space around them. Harnessing the power of DIB-R could enhance the way in which robots move, and improve their depth perception abilities.
Jun Gao, one of the DIB-R researchers, says ‘this is essentially the first time ever that you can take just about any 2D image and predict relevant 3D properties’.
How do you think this framework will be most useful? Robotics? Gaming? Education?