Google AI and Cornell Researchers Introduce DynIBaR: A New AI Method that Generates Photorealistic Free-Viewpoint Renderings from a Single Video of a Complex and Dynamic Scene

In recent years, computer vision has made remarkable progress in reconstructing and rendering static 3D scenes with neural radiance fields (NeRFs). Emerging approaches have tried to extend this capability to dynamic scenes by introducing space-time neural radiance fields, commonly called Dynamic NeRFs. Despite these advances, challenges persist in applying these techniques to videos captured casually in real-world settings.

These methods can produce blurry or inaccurate renderings when applied to long videos with complex object motion and uncontrolled camera trajectories, which limits their practical use. While a phone camera is a convenient tool for capturing everyday events, existing methods struggle to reconstruct dynamic scenes from such casually captured footage.

A team of researchers from Google and Cornell has introduced a new AI technique named DynIBaR: Neural Dynamic Image-Based Rendering, presented at CVPR 2023, a prominent conference in computer vision. The method generates highly realistic free-viewpoint renderings from a single video of a dynamic scene captured with a standard phone camera. DynIBaR enables a range of video effects, including bullet time (temporarily freezing time while the camera moves at a regular speed around a scene), video stabilization, depth-of-field adjustment, and slow motion.

The technique scales to dynamic videos with the following characteristics: 1) long duration, 2) wide-ranging scenes, 3) uncontrolled camera trajectories, and 4) fast and complex object motion. To model such motion, it uses motion trajectory fields that span several frames and are represented by learned basis functions.
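To make the idea of a trajectory built from basis functions concrete, here is a minimal sketch. It assumes a fixed cosine (DCT-style) basis and hand-picked coefficients; in the method described above the basis is learned, and the coefficients would come from a trained model. The names `cosine_basis`, `trajectory`, and `displacement` are hypothetical and chosen only for illustration.

```python
import math

def cosine_basis(num_frames, num_basis):
    """Sample num_basis cosine basis functions at num_frames time steps.

    Assumption: a fixed DCT-style basis stands in for the learned basis.
    """
    return [
        [math.cos(math.pi / num_frames * (t + 0.5) * k) for t in range(num_frames)]
        for k in range(num_basis)
    ]

def trajectory(point, coeffs, basis):
    """Position of a 3D point at every frame.

    Each position is point + sum_k coeffs[k] * basis[k][t], where coeffs[k]
    is a 3-vector of per-point weights for basis function k.
    """
    num_frames = len(basis[0])
    path = []
    for t in range(num_frames):
        pos = tuple(
            point[d] + sum(coeffs[k][d] * basis[k][t] for k in range(len(basis)))
            for d in range(3)
        )
        path.append(pos)
    return path

def displacement(path, t_src, t_dst):
    """3D offset that moves the point from frame t_src to frame t_dst."""
    return tuple(path[t_dst][d] - path[t_src][d] for d in range(3))

if __name__ == "__main__":
    basis = cosine_basis(num_frames=8, num_basis=3)
    coeffs = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.2, 0.0)]
    path = trajectory((1.0, 1.0, 4.0), coeffs, basis)
    print(displacement(path, 0, 7))
```

Because one small set of coefficients describes the point's position across many frames, the offset between any two frames is a simple difference along the same trajectory.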

The researchers also introduce a new temporal photometric loss that operates in motion-adjusted ray space to enforce temporal coherence when reconstructing dynamic scenes. To improve the quality of novel views, they further propose a new Image-Based Rendering (IBR)-based motion segmentation technique within a Bayesian learning framework. This segmentation separates the scene into static and dynamic components, which improves overall rendering quality.

Prior dynamic NeRF methods store intricate dynamic scenes in a single data structure by encoding them in the weights of a multilayer perceptron (MLP). The MLP maps a 4D space-time point (x, y, z, t) to RGB color and density values, which are then used to render images. However, the number of MLP parameters needed grows with the duration and complexity of the scene. This makes such models computationally expensive to train, to the point that training on casually captured real-world videos becomes infeasible. Consequently, renderings produced by approaches like DVS and NSFF can be blurry and imprecise.
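The mapping from a space-time point to color and density can be sketched with a tiny, untrained MLP. This is only an illustration under simplifying assumptions: the weights are random, there is no positional encoding, and the layer sizes and the names `init_mlp`, `query`, and `num_parameters` are hypothetical rather than taken from any of the methods mentioned.

```python
import math
import random

def init_mlp(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [
            [rng.uniform(-1.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
            for _ in range(n_out)
        ]
        layers.append((weights, [0.0] * n_out))
    return layers

def query(layers, x, y, z, t):
    """Map a 4D space-time point to ((r, g, b), density)."""
    h = [x, y, z, t]
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
    rgb = tuple(1.0 / (1.0 + math.exp(-v)) for v in h[:3])  # sigmoid keeps color in [0, 1]
    density = math.log1p(math.exp(h[3]))  # softplus keeps density non-negative
    return rgb, density

def num_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

if __name__ == "__main__":
    small = init_mlp([4, 16, 16, 4])
    large = init_mlp([4, 64, 64, 4])
    print(query(small, 0.1, 0.2, 0.3, 0.5))
    print(num_parameters(small), num_parameters(large))
```

Comparing `num_parameters` for the two networks gives a rough sense of how quickly the parameter count grows when more capacity is needed to represent a longer or more complex scene.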

According to the researchers, a key idea behind DynIBaR is that there is no need to store every image detail in a huge MLP. Instead, the method directly uses pixel data from nearby frames of the input video to construct new views. DynIBaR builds on IBRNet, an image-based rendering method designed for view synthesis in static scenes.
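The idea of building a new view from the pixels of nearby frames can be sketched as follows. This is a deliberately simplified illustration, not the IBRNet or DynIBaR pipeline: it assumes axis-aligned pinhole cameras, nearest-neighbour pixel lookup, and a plain weighted average in place of a learned aggregation. The names `project`, `sample`, and `blend_color`, and the dictionary layout of each view, are hypothetical.

```python
def project(point, cam):
    """Project a 3D point into pixel coordinates (u, v) of a pinhole camera.

    Assumption: the camera looks down +Z with no rotation, so only its
    center, focal length, and principal point are needed.
    """
    x, y, z = (point[i] - cam["center"][i] for i in range(3))
    if z <= 0:
        return None  # point is behind the camera
    return (cam["focal"] * x / z + cam["pp"][0], cam["focal"] * y / z + cam["pp"][1])

def sample(image, uv):
    """Nearest-neighbour pixel lookup; returns None outside the image."""
    if uv is None:
        return None
    col, row = int(round(uv[0])), int(round(uv[1]))
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None

def blend_color(point, views):
    """Weighted average of the colors that nearby frames observe at a 3D point."""
    total_weight = 0.0
    color = [0.0, 0.0, 0.0]
    for view in views:
        rgb = sample(view["image"], project(point, view["cam"]))
        if rgb is None:
            continue  # this frame does not see the point
        weight = view.get("weight", 1.0)
        total_weight += weight
        for i in range(3):
            color[i] += weight * rgb[i]
    if total_weight == 0.0:
        return None
    return tuple(c / total_weight for c in color)

if __name__ == "__main__":
    cam = {"center": (0.0, 0.0, 0.0), "focal": 1.0, "pp": (0.0, 0.0)}
    red = [[(1.0, 0.0, 0.0)] * 2 for _ in range(2)]
    blue = [[(0.0, 0.0, 1.0)] * 2 for _ in range(2)]
    views = [{"image": red, "cam": cam}, {"image": blue, "cam": cam}]
    print(blend_color((0.0, 0.0, 1.0), views))
```

In a dynamic scene, the 3D point would first have to be moved to where it was at each source frame's time before projecting, which is the role the motion trajectory fields play.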


Rachit Ranjan is a consulting intern at MarktechPost. He is currently pursuing his B.Tech at the Indian Institute of Technology (IIT) Patna. He is actively shaping his career in Artificial Intelligence and Data Science and is passionate about exploring these fields.