HyperNeRF

SIGGRAPH Asia 2021

https://hypernerf.github.io/
https://arxiv.org/pdf/2106.13228.pdf

Below is a summary of the paper I read.

  • Abstract

    • There are various extension works for performing NeRF on dynamic scenes (objects moving).

      • However, they struggle with changes in topology.
      • This paper tackles that problem by lifting NeRF into a higher dimension.
        • By increasing the dimension, even those with different topology can be handled.
        • This is called hyper-space.
    • Goals (tasks for evaluation)
      • Interpolating along the time axis (NeRF doesn't do this).
      • Rendering from novel viewpoints (NeRF also does this).
      • In short, we want to generate (interpolate) images that did not originally exist, whether we move along the t-axis or change the x/y/z viewpoint.
    • They also achieve lower error rates and better results than the prior method, Nerfies.
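The hyper-space idea above can be sketched as a tiny forward pass. This is my own toy illustration with made-up dimensions, not the authors' code: the template network simply takes extra ambient coordinates w alongside (x, y, z), so scenes with different topology become different slices of one higher-dimensional field.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden, out_dim):
    """Tiny random MLP standing in for the template NeRF (toy weights)."""
    w1 = rng.normal(size=(in_dim, hidden)) * 0.1
    w2 = rng.normal(size=(hidden, out_dim)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2  # one ReLU hidden layer

ambient_dim = 2                               # small ambient dimension (assumed)
template = make_mlp(3 + ambient_dim, 64, 4)   # (x, y, z, w) -> (density, r, g, b)

xyz = np.array([0.1, -0.2, 0.5])
w_a = np.array([0.0, 0.0])                    # one slice of hyper-space
w_b = np.array([1.0, 0.0])                    # another slice (e.g. mouth closed)

out_a = template(np.concatenate([xyz, w_a]))
out_b = template(np.concatenate([xyz, w_b]))
print(out_a.shape)
```

The same 3D point gives different (density, color) outputs depending on which slice w is queried, which is exactly what lets one network cover states with different topology.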

  • Introduction

    • Real-world objects undergo changes, including topological changes.
      • For example, objects breaking.
      • Changes in facial expressions (such as opening and closing the mouth) also involve topological changes if you think about it.
    • These changes are not continuous (there are discontinuous timings where the topology changes).
      • So it was difficult to handle them with existing algorithms for interpolating between scenes.
    • As a solution, there is something called the Level Set Method.
    • This paper is, in a nutshell, NeRF x Level Set Method.
    • Differences from classical Level Set Method
      • Classical methods only increase one dimension, but HyperNeRF can go to any number of dimensions.
      • They talk about increasing the ambient dimension.
      • They don’t limit it to Euclidean space (though I don’t understand the topology details).
      • They represent non-Euclidean things with Neural Networks.
    • A hyper-dimensional NeRF, hence the name HyperNeRF.
    • Instead of regularization, they use an optimization strategy.
      • I don’t understand it well.
      • Maybe it means there is less manual intervention?
  • Related Works

    • Non-rigid reconstruction
      • There are methods that use multiple cameras or depth sensors (LiDAR, etc.), but the device setup is cumbersome.
      • They also mention existing methods when there is only one lens, but I don’t understand the mechanism, so I can’t understand the problems (blu3mo).
      • HyperNeRF solves the problem of topology changes that were present in Nerfies.
    • Neural Rendering
      • Around 2019, there were various studies on training neural networks to generate images from images.
        • However, there was a problem of inconsistency when generating images from various viewpoints.
      • After that, neural networks like NeRF emerged to represent scenes themselves.
        • This can maintain geometric consistency.
    • One problem with NeRF is that it struggles with representing moving objects.
      • That makes sense.
    • There are two approaches to solve this problem:
      • Deformation-based approach:
        • Represents moving objects using a continuous deformation field.
        • Similar to the Radiance Field, the Deformation Field is also approximated.
        • It has a weakness in that it cannot represent topological changes or transient effects like fire.
      • Modulation-based approach:
        • Uses a latent code.
        • Not much is understood about the mechanism.
        • Can cover topological changes and other effects.
      • HyperNeRF is an approach that combines both of these approaches.
        • It models smooth motion with a deformation field, and handles topology changes by also mapping each point into the higher-dimensional (hyper-space) coordinates.
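The combination of the two approaches can be sketched as follows. This is my own toy code under assumed network sizes, not the paper's implementation: a deformation MLP warps each point, a second MLP maps it to ambient coordinates, and the template field is evaluated at the concatenation; both branches are conditioned on a per-frame latent code.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(in_dim, hidden, out_dim):
    """Tiny random MLP (toy stand-in for the paper's networks)."""
    a = rng.normal(size=(in_dim, hidden)) * 0.1
    b = rng.normal(size=(hidden, out_dim)) * 0.1
    return lambda x: np.maximum(x @ a, 0.0) @ b

latent_dim, ambient_dim = 8, 2                     # assumed sizes
deform = mlp(3 + latent_dim, 32, 3)                # deformation field: 3D offset
to_ambient = mlp(3 + latent_dim, 32, ambient_dim)  # ambient "slicing" coordinates
template = mlp(3 + ambient_dim, 64, 4)             # hyper-space template field

def query(x, code):
    """Evaluate the scene at point x for the frame described by `code`."""
    x_warped = x + deform(np.concatenate([x, code]))  # deformation branch
    w = to_ambient(np.concatenate([x, code]))         # modulation branch
    return template(np.concatenate([x_warped, w]))    # (density, r, g, b)

x = np.array([0.2, 0.0, -0.3])
code = rng.normal(size=latent_dim)  # per-frame latent code
print(query(x, code).shape)
```

The deformation branch covers smooth, continuous motion, while the ambient branch lets the template switch between topologically different slices.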

Summary from the website:

  • Motivation:
    • Inspired by the concept of the Level Set Method.
      • This method treats something that changes (e.g., a 2D shape) as slices of a higher-dimensional object (e.g., a 3D shape).
      • Link to video
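The slicing idea can be checked numerically with a made-up level-set function (my example, not from the paper): one fixed function F(x, y, w) whose zero sublevel set is a single blob for one value of w and two separate blobs for another, so a topology change appears just by moving the slice.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 401)

def n_pieces(w):
    """Count intervals of the 1D slice {x : F(x, 0, w) < 0} at y = 0,
    where F(x, y, w) = (x^2 - w)^2 + y^2 - 0.3 (a toy level-set function)."""
    inside = (x**2 - w) ** 2 < 0.3
    rises = np.diff(inside.astype(int)) == 1  # starts of True-runs
    return int(rises.sum() + inside[0])

print(n_pieces(0.0))  # 1: a single connected piece
print(n_pieces(1.5))  # 2: the slice has split into two pieces
```

Nothing discontinuous happens to F itself; only the slicing coordinate w moves, which is why representing the scene in the higher dimension sidesteps the discontinuity.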
  • Architecture:
    • [architecture figure]