• I will research the background needed to understand HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields.

  • Explanatory Slides on NeRF: Representing Scenes as Neural Radiance Fields for View…

    • Explanation
  • Neural Representation of 3D Space and NeRF | ALBERT Official Blog

    • Explanation
    • Includes an explanation of the lineage of related technologies and prerequisite knowledge such as the mathematical model of cameras.
    • Radiance Field
      • This is like a space where volume density and emitted radiance are associated with each coordinate.
      • NeRF can be understood as training a neural network to approximate this field.
    • (figure: the volume rendering integral)
      • C({\bf r}) = \int_{t_n}^{t_f} T(t)\,\sigma({\bf r}(t))\,{\bf c}({\bf r}(t),{\bf d})\,dt, where T(t) = \exp(-\int_{t_n}^{t} \sigma({\bf r}(s))\,ds)
      • Integrating color ({\bf c}) weighted by density (\sigma) between t_n (nearest) and t_f (farthest)
        • The contribution at each position t is attenuated according to the density accumulated from t_n up to t, which is why it is multiplied by the transmittance T(t).
      • The discretized version of this integral can be written with differentiable operations, so gradients can be backpropagated through the rendering to train the neural network (see the sketch below).
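      • A minimal NumPy sketch of this discretized quadrature, following the paper's formulation (the function and variable names here are my own, not the authors' code):

        ```python
        import numpy as np

        def composite_ray(sigmas, colors, t_vals):
            """Discretized volume rendering along a single ray.

            sigmas: (N,) densities at the N sample points
            colors: (N, 3) emitted RGB radiance at the sample points
            t_vals: (N,) sample positions along the ray, from t_n to t_f
            """
            # Distances between adjacent samples (the last interval is padded)
            deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
            # Per-segment opacity: alpha_i = 1 - exp(-sigma_i * delta_i)
            alphas = 1.0 - np.exp(-sigmas * deltas)
            # Transmittance T_i: fraction of light surviving up to sample i
            trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
            # Pixel color is the transmittance-weighted sum of sample colors
            weights = trans * alphas
            return (weights[:, None] * colors).sum(axis=0)
        ```

      • Every operation here (exp, cumprod, sums) is differentiable, so the same computation in an autodiff framework lets a loss on rendered pixels backpropagate to the network that produced the densities and colors.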
    • Hierarchical volume sampling (coarse-to-fine)
      • A technique that splits rendering into a coarse stage and a fine stage (see the sampling sketch after this list)
        • First, render coarsely with a coarse sampling of t
        • Then, sample more finely where objects appear to exist
        • Two separate neural networks are trained and used for rendering
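      • A rough sketch of how the fine samples can be drawn by inverse-CDF sampling of the coarse weights (a standard scheme consistent with the paper's description; the code and names are mine, not the authors'):

        ```python
        import numpy as np

        def sample_fine(t_coarse, weights, n_fine, rng=None):
            """Place extra samples where the coarse pass assigned high weight.

            t_coarse: (N,) coarse sample positions along the ray
            weights:  (N-1,) coarse rendering weight of each interval
            n_fine:   number of additional samples to draw
            """
            rng = rng or np.random.default_rng()
            pdf = weights + 1e-5                  # avoid zero-probability intervals
            pdf = pdf / pdf.sum()
            cdf = np.concatenate([[0.0], np.cumsum(pdf)])
            u = rng.uniform(size=n_fine)          # uniform draws in [0, 1)
            # Invert the piecewise-linear CDF for each draw
            idx = np.clip(np.searchsorted(cdf, u, side='right') - 1, 0, len(pdf) - 1)
            span = cdf[idx + 1] - cdf[idx]
            frac = (u - cdf[idx]) / np.where(span > 0, span, 1.0)
            t_fine = t_coarse[idx] + frac * (t_coarse[idx + 1] - t_coarse[idx])
            # Coarse and fine samples together are fed to the fine network
            return np.sort(np.concatenate([t_coarse, t_fine]))
        ```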
      • Positional encoding
        • The color and density of each point, {\bf c} and \sigma, change rapidly with small changes in {\bf x} and {\bf d}. However, neural networks are not good at approximating such high-frequency functions.

          • I can understand this to some extent.
          • Functions whose graphs oscillate rapidly are hard to approximate.
        • Therefore, in this paper, the authors use a technique called positional encoding to embed {\bf x} and {\bf d} into a higher-dimensional space. If the embedding itself carries the high-frequency variation, the neural network only needs to approximate a low-frequency function (a code sketch follows this block).

          • Hmm?
          • (figure: \gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)))
          • Rather than leaving x, d as 3-/2-dimensional vectors, their dimensionality is increased so that the frequency each individual dimension has to represent becomes lower.
          • By feeding each component p into many sine/cosine functions of different frequencies, the large number of output dimensions carries enough information.
            • It’s a technique used in Transformer.
            • It’s difficult to understand this without a solid foundation.
          • Explanatory Slides on NeRF: Representing Scenes as Neural Radiance Fields for View…
            • It seems to be a transformation to learn high-frequency components as well.
            • I can understand the intention behind it.
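        • A small sketch of this encoding as I understand it (NumPy; positional_encoding is an illustrative name):

          ```python
          import numpy as np

          def positional_encoding(p, L):
              """Map each component of p through sin/cos at L increasing frequencies.

              p: (..., D) coordinates, e.g. D = 3 for the position x
              L: number of frequency bands
              Returns an (..., D * 2 * L) embedding.
              """
              freqs = (2.0 ** np.arange(L)) * np.pi   # 2^0*pi, 2^1*pi, ..., 2^(L-1)*pi
              angles = p[..., None] * freqs           # (..., D, L)
              enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
              return enc.reshape(*p.shape[:-1], -1)
          ```

        • With the paper's settings (L = 10 for the position, L = 4 for the direction), a 3-dimensional x becomes a 60-dimensional embedding.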
      • Experiment
        • They experiment with parameters such as the number of positional-encoding frequency bands L and the number of quadrature samples, and observe how the computational cost and quality change.
          • I see, this is what researchers in the field of neural networks do.
  • Abstract

    • Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.

  • https://www.matthewtancik.com/nerf

    • Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location.

      • Input
        • 3D coordinates
        • Viewing direction
          • This can handle objects that change color and brightness depending on the viewing angle (e.g., specular reflections).
      • Output
        • Volume density: corresponds to voxel opacity in volume rendering (a network sketch follows below).
        • Emitted radiance: the brightness emitted in a given direction.
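      • A minimal PyTorch-style sketch of this interface (much smaller than the paper's 8-layer, 256-unit MLP; the layer sizes here are illustrative). Density depends only on position, while color also sees the viewing direction:

        ```python
        import torch
        import torch.nn as nn

        class TinyNeRF(nn.Module):
            """Toy network with NeRF's interface: encoded position and viewing
            direction in, volume density and view-dependent RGB out."""

            def __init__(self, pos_dim=60, dir_dim=24, hidden=128):
                super().__init__()
                self.trunk = nn.Sequential(
                    nn.Linear(pos_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                )
                self.sigma_head = nn.Linear(hidden, 1)   # density: position only
                self.rgb_head = nn.Sequential(           # color: position + direction
                    nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
                    nn.Linear(hidden // 2, 3), nn.Sigmoid(),
                )

            def forward(self, pos_enc, dir_enc):
                h = self.trunk(pos_enc)
                sigma = torch.relu(self.sigma_head(h))   # density must be non-negative
                rgb = self.rgb_head(torch.cat([h, dir_enc], dim=-1))
                return sigma, rgb
        ```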
  • General article

    • As VR attracts attention, the impact of the new technology "NeRF," which beautifully synthesizes images from arbitrary viewpoints: Nikkei Cross Trend

    • Specifically, it is a neural network that takes as input the position (x, y, z) of a 3D point and the viewing direction (θ, φ), and outputs the brightness and opacity of that point (refer to Figure 2 (a), (b)). For example, when the coordinates of a point on an object's surface and a viewing direction are fed to the trained NeRF, it outputs the brightness and opacity at that position. Similarly, when the coordinates of a point in empty air are fed in, it outputs that the position is colorless and transparent. Synthesizing images from a "scene" representation that models brightness and opacity in this way is the basic idea, and one of the original points, of NeRF.

    • There is a task called "Novel View Synthesis": given images from multiple viewpoints, synthesize images from new viewpoints. It is an essential technology for VR and for free-viewpoint video in sports, among other applications.

    • The method called “NeRF” introduced in this article has achieved astonishing performance in Novel View Synthesis. NeRF is not only known for the overwhelming beauty of the synthesized images, but also for its algorithm, which is completely different from previous research and very innovative. It is a highly interesting study with many highlights. It would be great if this article could convey some of its interesting aspects.

    • In short,

      • One scene is represented by one neural network.
      • The neural network takes the position and direction of the line of sight as input, and outputs the brightness and opacity.
        • If this information can be learned, the image can be generated using classical volume rendering.
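    • To actually render, every pixel first needs a ray. A sketch of generating rays from a pinhole camera pose (my assumed conventions: camera looks down -z with y up; this is not the paper's code):

      ```python
      import numpy as np

      def get_rays(H, W, focal, cam2world):
          """Origin and direction of the ray through each pixel of an H x W image.

          focal:     focal length in pixels (pinhole camera model)
          cam2world: (4, 4) camera-to-world pose matrix
          """
          i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')
          # Per-pixel ray directions in camera space
          dirs = np.stack([(i - W / 2) / focal,
                           -(j - H / 2) / focal,
                           -np.ones_like(i, dtype=float)], axis=-1)
          # Rotate directions into world space; all rays share the camera origin
          rays_d = dirs @ cam2world[:3, :3].T
          rays_o = np.broadcast_to(cam2world[:3, 3], rays_d.shape)
          return rays_o, rays_d
      ```

    • Each ray r(t) = o + t·d is then sampled, the network is queried at the sample points, and the quadrature above composites them into the pixel color.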
  • Video

    • https://www.youtube.com/watch?v=JuH79E8rdKc&t=4s
    • This video is incredibly well-made.
      • It carefully explains the differences from existing methods.
    • Wow, this is amazing.
    • Reflections are also rendered accurately.
    • The refraction of light through glass is also reproduced.