a.k.a. CV

Lecture on Computer Vision (Master of Information Science)

  • Definition of “CV,” “CG,” and “Image Processing”

    • CV is the process of extracting “shape/appearance/motion/meaning” from images/videos.
    • The opposite of CV is Computer Graphics (CG), which creates images/videos from “shape/appearance/motion/meaning.”
    • Image Processing involves transforming images/videos into different images/videos, such as changing colors.
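As a minimal illustration of the image-to-image idea, the sketch below converts an RGB image to grayscale. It assumes the image is a nested list of (R, G, B) tuples in 0..255 and uses the common ITU-R BT.601 luma weights; the representation and the helper name `to_grayscale` are hypothetical choices for this example. A CV system, by contrast, would map the same pixels to a description such as a label.

```python
# Image processing sketch: image in, image out (RGB -> grayscale).
# Assumption: image is a nested list of (R, G, B) tuples with values 0..255.
# Weights are the ITU-R BT.601 luma coefficients; other weightings exist.


def to_grayscale(rgb_image):
    """Return a 2D list of grayscale intensities (rounded ints, 0..255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


image = [
    [(255, 0, 0), (0, 255, 0)],    # red, green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]
print(to_grayscale(image))  # → [[76, 150], [29, 255]]
```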
  • (blu3mo) This definition seems slightly different from the one in the book “Digital Image Processing.”

    • The image posted on the Image Processing page presents a slightly different definition.
  • The difficulty of CV is hard for the general public to appreciate.

    • It is hard to convey how difficult it is for a computer to recognize that the PET bottle in front of you is, in fact, a PET bottle.
    • (blu3mo) Programming Education can also be seen as a way of conveying this intuition:
      • What computers can/cannot do.
  • Topics

    • Feature point detection and matching
    • Shape reconstruction from motion (generating point cloud data from multiple images)
    • Computational Photography
    • 3D reconstruction
    • Image recognition (YOLO, etc.)
    • Fairness (e.g., addressing racial bias exhibited by AI systems) has also been discussed recently.
  • Image recognition

    • Combination: Machine Learning x Computer Vision x Natural Language Processing

    • Object detection, semantic segmentation, etc.

    • History

      • SIFT: local features built from histograms of brightness-gradient orientations, concatenated into a vector
      • Bag of Visual Words: applies the Bag of Words technique from Natural Language Processing to images
      • Image datasets: Caltech-101
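The SIFT and Bag of Visual Words entries can be made concrete with a toy pipeline: compute a gradient-orientation histogram per patch, assign it to the nearest visual word (a one-hot vector), and sum the one-hot vectors into an image-level histogram. This is a sketch under strong assumptions: patches are small grayscale 2D lists, the single-cell histogram is a drastic simplification of the real SIFT descriptor, and the two-word `codebook` is hand-written for illustration (in practice it is typically learned by clustering many descriptors, e.g. with k-means). All function names are hypothetical.

```python
import math

NUM_BINS = 8  # orientation bins over [0, 2*pi)


def orientation_histogram(patch, num_bins=NUM_BINS):
    """Magnitude-weighted histogram of gradient orientations for a 2D patch.

    Simplified stand-in for a SIFT descriptor: one cell only, no scale
    space, no Gaussian weighting, no rotation normalization.
    Gradients are central differences over interior pixels.
    """
    hist = [0.0] * num_bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no orientation to vote for
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * num_bins) % num_bins
            hist[bin_index] += magnitude
    # L2-normalize so the descriptor is less sensitive to contrast
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def nearest_word(descriptor, codebook):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def one_hot(index, size):
    """One-hot vector of length `size` with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def bag_of_visual_words(patches, codebook):
    """Sum of one-hot word assignments = histogram of visual words."""
    counts = [0] * len(codebook)
    for patch in patches:
        word = nearest_word(orientation_histogram(patch), codebook)
        counts = [c + w for c, w in zip(counts, one_hot(word, len(codebook)))]
    return counts


# Hypothetical hand-made codebook (normally learned by clustering)
codebook = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # word 0: gradients pointing in +x (rightward)
    [0, 0, 1, 0, 0, 0, 0, 0],  # word 1: gradients pointing in +y (downward in image coords)
]
horizontal_ramp = [[x for x in range(4)] for _ in range(4)]
vertical_ramp = [[y] * 4 for y in range(4)]
print(bag_of_visual_words([horizontal_ramp, horizontal_ramp, vertical_ramp], codebook))
# → [2, 1]
```

The resulting count vector plays the same role as a word-count vector in text Bag of Words: it discards where each feature occurred and keeps only how often each visual word appeared.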
    • Basic methods

      • Convert local features into one-hot vectors (as in Bag of Visual Words)

      • Recognition can be improved by using images (kernels) suited to recognition instead of raw partial image data (patches).

      • Instead of one-hot vectors, richer (more complex) representations can be used.

        • This parallels the lineage of NLP (from one-hot word vectors to dense word embeddings).
      • The idea is to shrink the width and height of the image while expanding the depth (channel) dimension.

        • Hierarchical structure
        • (blu3mo) Same idea as the diagram in the CNN section.
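The shrink-spatial, grow-depth idea can be traced in a small pure-Python sketch: each stage applies a bank of 3×3 kernels (valid convolution followed by ReLU) and then 2×2 max pooling, so height and width fall while the number of channels rises. Assumptions: the kernels are random here, whereas a trained CNN learns kernels suited to recognition; the layer sizes (16×16 input, 1 → 4 → 8 channels) and all helper names are hypothetical choices for illustration.

```python
import random


def conv2d(image, kernels):
    """Valid (no padding), stride-1 convolution followed by ReLU.

    Strictly this is cross-correlation, as in most CNN libraries.
    image:   list of C_in channels, each an H x W list of lists
    kernels: list of C_out filters, each a list of C_in k x k lists
    returns: list of C_out channels, each (H-k+1) x (W-k+1)
    """
    c_in, h, w = len(image), len(image[0]), len(image[0][0])
    k = len(kernels[0][0])
    out = []
    for kernel in kernels:
        channel = []
        for y in range(h - k + 1):
            row = []
            for x in range(w - k + 1):
                s = 0.0
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            s += image[c][y + dy][x + dx] * kernel[c][dy][dx]
                row.append(max(s, 0.0))  # ReLU
            channel.append(row)
        out.append(channel)
    return out


def max_pool2x2(feature_maps):
    """Halve height and width: max over non-overlapping 2x2 windows."""
    return [
        [
            [max(fm[y][x], fm[y][x + 1], fm[y + 1][x], fm[y + 1][x + 1])
             for x in range(0, len(fm[0]) - 1, 2)]
            for y in range(0, len(fm) - 1, 2)
        ]
        for fm in feature_maps
    ]


def shape(t):
    """(channels, height, width) of a nested-list tensor."""
    return (len(t), len(t[0]), len(t[0][0]))


def random_kernels(c_out, c_in, k, rng):
    """Hypothetical untrained kernels with weights in [-1, 1]."""
    return [[[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]


rng = random.Random(0)
x = [[[rng.random() for _ in range(16)] for _ in range(16)]]  # 1 channel, 16 x 16
print(shape(x))  # → (1, 16, 16)
x = max_pool2x2(conv2d(x, random_kernels(4, 1, 3, rng)))
print(shape(x))  # → (4, 7, 7)
x = max_pool2x2(conv2d(x, random_kernels(8, 4, 3, rng)))
print(shape(x))  # → (8, 2, 2)
```

Channels-first `(channels, height, width)` ordering is used only because it keeps the nested lists easy to index; libraries differ in their conventions.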
  • (blu3mo) As long as we use datasets collected by humans, image recognition will adapt, and even overfit, to human cognition.

    • Well, that is precisely the purpose.
    • This leads to a philosophical question: can image recognition ever grasp the “thing itself”?
      • Clearly, it cannot be recognized without human labeling.
      • What if we deliberately ignore the existence of humans and try to achieve object detection (or something similar)?
        • Would that be Unsupervised Learning? Can object detection be achieved without any supervision at all?
          • Labeling with natural language would no longer be possible.
          • Can we create an intelligent way of perceiving visual information that is not limited to human vision?
            • Can we reach a level that can be called intelligent?
            • In other words, can we call it intelligent if humans cannot understand it?

#computer Science