• The decision tree chooses the splitting point where the decrease in impurity (the information gain) is maximized, which can be calculated directly from the data.

    • The impurity is calculated using measures such as the error rate, entropy, and the Gini index.
  • Pruning is done to prevent overfitting; see the sketch below.
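
A minimal sketch of both ideas, assuming scikit-learn is available: the criterion parameter selects the impurity measure, and max_depth acts as a simple form of pre-pruning.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # criterion="gini" (default) or "entropy" selects the impurity measure;
    # max_depth limits how deep the tree can grow (pre-pruning).
    tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
    tree.fit(X_train, y_train)
    print("test accuracy:", tree.score(X_test, y_test))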

Advantages:

  • It can handle features on different scales and of mixed types, so preprocessing such as normalization is not required.
  • It is highly versatile and can be used for both classification and regression.
  • The resulting trees are easy to visualize and understand, even for beginners.

Disadvantages:

  • It cannot extrapolate: it cannot make predictions outside the range covered by the training data.
    • For regression in particular, a tree's prediction stays constant beyond the training range, as the sketch after this list illustrates.
  • It is prone to overfitting.
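
The extrapolation limitation can be seen directly with a regression tree; a minimal sketch, assuming scikit-learn and NumPy:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X_train = np.sort(rng.uniform(0, 10, size=(50, 1)), axis=0)
    y_train = 2.0 * X_train.ravel()  # a simple linear trend

    reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

    # Within the training range the fit tracks the trend, but beyond x = 10
    # every prediction is the constant value of the last leaf.
    print(reg.predict([[5.0], [10.0], [20.0], [100.0]]))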

In random forests, many trees are built by randomly subsampling both the data points and the features, and their predictions are combined by majority vote (or by averaging, for regression) to achieve high accuracy.

  • This also helps to avoid overfitting.
  • It is among the most widely used methods for regression and classification.
  • However, the interpretability that is a key benefit of single trees is reduced.
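
A minimal sketch of a random forest, assuming scikit-learn: n_estimators sets the number of trees that vote, and max_features controls the random feature subset considered at each split.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each tree sees a bootstrap sample of the rows and a random subset of
    # the features at every split; class predictions are then aggregated.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))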

There are also gradient-boosted regression trees, which have more parameters to tune but often deliver higher performance.

  • They sequentially combine many small, pre-pruned trees, each new tree correcting the mistakes of the ensemble built so far; a minimal sketch follows.
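
A minimal sketch of gradient boosting, assuming scikit-learn and shown here for classification (despite the name, the internal regression trees work for both tasks): max_depth=1 keeps each tree small (pre-pruning), and learning_rate controls how strongly each tree corrects its predecessors.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each shallow tree (max_depth=1) is fitted to the mistakes of the trees
    # built so far; learning_rate scales each tree's contribution.
    gbrt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                      max_depth=1, random_state=0)
    gbrt.fit(X_train, y_train)
    print("test accuracy:", gbrt.score(X_test, y_test))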

Getting Started with Machine Learning in Python