aka Principal Component Analysis (PCA)

Unsupervised Learning

  • Find the direction in which the data varies the most and use it as the base axis.

  • Among all directions orthogonal (at right angles) to that axis, find the one in which the data varies the most and use it as the second axis.

    • Repeat this process, keeping each new axis orthogonal to all previous ones, until there are as many axes as the data has dimensions.
  • The result is a set of new axes (the principal components) ordered by decreasing variance.
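
The steps above can be sketched with NumPy. This is a minimal illustration, not a production implementation: the function name `pca_axes` and the toy data are assumptions made for this example. It relies on the fact that the eigenvectors of the covariance matrix, sorted by eigenvalue, are exactly the orthogonal directions of greatest variance.

```python
import numpy as np

def pca_axes(X):
    """Return (axes, variances), with axes sorted by decreasing variance.

    `pca_axes` is a hypothetical helper written for this note.
    Each row of `axes` is one new axis (a unit vector).
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort: largest variance first
    return eigvecs[:, order].T, eigvals[order]

# Toy data (an assumption): 3 dimensions with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

axes, variances = pca_axes(X)

# Express the data in the new axes; column 0 now has the highest variance.
projected = (X - X.mean(axis=0)) @ axes.T
print(variances)
```

The number of axes equals the number of dimensions (3 here), the axes are mutually orthogonal, and the variance of each projected column matches the corresponding entry of `variances`.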

  • The cancer example can be plotted in 2D using the first and second axes, the two with the highest variance.

    • This relies on the assumption that the directions of greatest variance are the most meaningful ones.
      • If the assumption holds, the classes will be well separated in the plot.
  • Think of it as sorting the dimensions by importance.
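
A 2D projection like the one described can be sketched with scikit-learn. Two assumptions here: that "the cancer example" refers to scikit-learn's built-in breast cancer dataset, and that the features are standardized first (PCA is sensitive to feature scale, so this is a common but not mandatory step).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the "cancer example" is scikit-learn's breast cancer dataset.
cancer = load_breast_cancer()

# Scale every feature to zero mean and unit variance before PCA.
X_scaled = StandardScaler().fit_transform(cancer.data)

# Keep only the two axes of highest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # (569, 2)
print(pca.explained_variance_ratio_)  # share of total variance per axis
```

Plotting `X_2d[:, 0]` against `X_2d[:, 1]`, colored by `cancer.target`, gives the 2D scatter plot. How cleanly the two classes separate shows how well the variance assumption holds for this data.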

  • Applications: feature extraction, visualizing data in a single graph.

    • Feature extraction: the data can be given a representation that is better suited to visualization or analysis than its original form.

  • Challenge: each transformed axis is a combination of all the original features, so its meaning is hard for humans to interpret.
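
To see why the new axes are hard to read, one can inspect the weights that make up the first axis. This sketch again assumes scikit-learn's breast cancer dataset and standardized features. It uses the `components_` attribute of scikit-learn's `PCA`, where each row holds one axis expressed in terms of the original features.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: same dataset and scaling as in the cancer example.
cancer = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(cancer.data)
pca = PCA(n_components=2).fit(X_scaled)

# One weight per original feature: the first axis mixes all of them.
first_axis = pca.components_[0]

# List the five features with the largest absolute weight on this axis.
top = np.argsort(np.abs(first_axis))[::-1][:5]
for i in top:
    print(f"{cancer.feature_names[i]:<25s} {first_axis[i]:+.3f}")
```

Because many features contribute to the axis at once, no single original feature names what the axis "means".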