a.k.a. CV
Lecture on Computer Vision (Master of Information Science)
-
Definition of “CV,” “CG,” and “Image Processing”
- CV is the process of extracting “shape/appearance/motion/meaning” from images/videos.
- The opposite of CV is Computer Graphics (CG), which creates images/videos from “shape/appearance/motion/meaning.”
- Image Processing involves transforming images/videos into different images/videos, such as changing colors.
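A minimal sketch of the three directions, assuming an image is just a small grayscale grid (a list of rows with values 0-255). The function names `render_rectangle`, `invert`, and `bounding_box` are hypothetical and only serve to illustrate the definitions above; they are not from the lecture.

```python
# Illustrative only: an "image" is a list of rows of grayscale values (0-255).

def render_rectangle(width, height, box):
    """CG direction: description (a box) -> image."""
    left, top, right, bottom = box
    return [
        [255 if left <= x <= right and top <= y <= bottom else 0
         for x in range(width)]
        for y in range(height)
    ]

def invert(image):
    """Image Processing direction: image -> different image."""
    return [[255 - value for value in row] for row in image]

def bounding_box(image, threshold=128):
    """CV direction: image -> description (box around the bright pixels)."""
    coords = [(x, y)
              for y, row in enumerate(image)
              for x, value in enumerate(row)
              if value >= threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

scene = (1, 2, 3, 4)                    # left, top, right, bottom
image = render_rectangle(6, 6, scene)   # CG: shape -> image
negative = invert(image)                # Image Processing: image -> image
recovered = bounding_box(image)         # CV: image -> shape
```

In this toy setting `bounding_box` exactly undoes `render_rectangle`. With real photographs the CV direction is far harder, because many different scenes can produce similar pixels.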
-
(blu3mo) This definition seems slightly different from the definition in the book “Digital Image Processing.”
- The image posted on the Image Processing page presents a slightly different definition.
-
The difficulty of CV is that its difficulty is not easily appreciated by the general public.
- It is hard to convey how difficult it is for a computer to recognize that the PET bottle (plastic bottle) in front of you is indeed a PET bottle.
- (blu3mo) Conveying this sense can also be viewed from the perspective of Programming Education.
- What computers can/cannot do.
-
Topics
- Feature point detection and matching
- Structure from Motion (reconstructing shape as point cloud data from multiple images)
- Computational Photography
- 3D reconstruction
- Image recognition (YOLO, etc.)
- Recently, Fairness (addressing issues where AI exhibits racial bias) is also being discussed.
-
Image recognition
-
Combination: Machine Learning x Computer Vision x Natural Language Processing
-
Object detection, semantic segmentation, etc.
-
History
- SIFT: local features described by a vector of histograms of brightness gradients
- Bag of Visual Words: applies the Bag of Words technique from Natural Language Processing to images
- Image datasets: Caltech-101
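The two ideas above can be sketched roughly as follows. This is a heavily simplified illustration, not SIFT itself: real SIFT also handles scale and rotation, and it concatenates histograms from several cells of a patch. The codebook here is assumed to be given, whereas in practice it is usually learned by clustering descriptors. All function names are hypothetical.

```python
import math

def gradient_orientation_histogram(patch, bins=8):
    """Histogram of gradient directions in a patch, weighted by gradient magnitude."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]   # horizontal brightness change
            dy = patch[y + 1][x] - patch[y - 1][x]   # vertical brightness change
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist

def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )

def bag_of_visual_words(descriptors, codebook):
    """Count how often each visual word occurs, ignoring where it occurs."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    return counts

# Two toy patches: brightness rising left-to-right, and top-to-bottom.
horizontal = [[0, 1, 2, 3] for _ in range(4)]
vertical = [[row] * 4 for row in range(4)]

h_desc = gradient_orientation_histogram(horizontal)
v_desc = gradient_orientation_histogram(vertical)

codebook = [h_desc, v_desc]  # assumed given; normally learned from many patches
image_descriptors = [h_desc, h_desc, v_desc]
bovw = bag_of_visual_words(image_descriptors, codebook)
```

As with Bag of Words for text, the resulting histogram discards the positions of the patches and keeps only how often each "word" appears.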
-
Methods (basic ones)
-
Conversion to one-hot vectors
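A minimal sketch of one-hot conversion, assuming the thing being encoded is a discrete index such as a visual-word ID or a class label (the notes do not say which). The helper name `one_hot` and the vocabulary size of 4 are made up for the example.

```python
def one_hot(index, size):
    """Return a vector of length `size` with a 1 at `index` and 0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vector = [0] * size
    vector[index] = 1
    return vector

# Hypothetical visual-word IDs for three patches, with a vocabulary of 4 words.
word_ids = [2, 0, 2]
vectors = [one_hot(i, 4) for i in word_ids]

# Summing the one-hot vectors column by column gives a Bag-of-Words style histogram.
histogram = [sum(column) for column in zip(*vectors)]
```

Every one-hot vector is equally distant from every other, so the encoding carries no notion of similarity between words.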
-
Recognition can be improved by using small images (kernels) suited to recognition instead of raw partial image data (patches).
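As a rough illustration of applying a kernel, the sketch below slides a small kernel over an image and records how strongly each location responds (cross-correlation without padding). The vertical-edge kernel is a hand-picked assumption for the example; in a CNN the kernel values would be learned. `correlate2d` is a hypothetical helper name.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# Kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# 4x6 image: dark on the left half, bright on the right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = correlate2d(image, vertical_edge)
```

The response is large only around the dark-to-bright boundary and zero in the flat regions, so the kernel acts as a detector for one specific local pattern.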
-
Instead of one-hot vectors, more complex representations can be used.
- This feels similar to how NLP developed.
-
The idea is to compress the width and height dimensions of the image while expanding the depth dimension.
- Hierarchical structure
- (blu3mo) The same idea as the diagram in the section on CNN.
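The shrinking width/height and growing depth can be traced with simple shape arithmetic. This sketch assumes each stage is a 3x3 convolution with padding 1 followed by 2x2 pooling with stride 2, and the channel counts 16, 32, 64 are made-up values; the lecture's actual architecture is not specified in these notes.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_shapes(height, width, channels, stage_channels):
    """Return (height, width, depth) after each conv + pool stage."""
    shapes = [(height, width, channels)]
    for out_channels in stage_channels:
        # 3x3 convolution with padding 1 keeps height and width unchanged.
        height = output_size(height, kernel=3, stride=1, padding=1)
        width = output_size(width, kernel=3, stride=1, padding=1)
        # 2x2 pooling with stride 2 halves height and width.
        height = output_size(height, kernel=2, stride=2)
        width = output_size(width, kernel=2, stride=2)
        shapes.append((height, width, out_channels))
    return shapes

# Hypothetical 32x32 RGB input passing through three stages.
shapes = trace_shapes(32, 32, 3, [16, 32, 64])
for shape in shapes:
    print(shape)
```

Each stage trades spatial resolution for more feature channels, which is the hierarchical structure mentioned above.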
-
(blu3mo) As long as we use datasets collected by humans, image recognition will adapt and overfit to human cognition.
- Well, that’s the purpose.
- It becomes a philosophical discussion about whether we can recognize the “thing itself” through image recognition.
- It is obvious that it cannot be recognized without human labeling.
- What if we deliberately ignore the existence of humans and try to achieve object detection (or something similar)?
- Would that be Unsupervised Learning? Can object detection be achieved without supervision?
- We cannot label using natural language.
- Can we create an intelligent way of perceiving visual information that is not limited to human vision?
- Can we reach a level that can be called intelligent?
- In other words, can we call it intelligent if humans cannot understand it?
#computer Science