Decision Trees - How to decide the split?
In the introduction to decision trees, we saw that the whole process consists of repeatedly splitting one node into two, based on a chosen feature and a value of that feature. The goal of each split is to make the resulting subsets purer, or more homogeneous, than the node they came from. There are two things we need to understand here:

- The concept of homogeneity or purity: what does it mean?
- How do we measure purity or impurity?

Only then can we use this to split the nodes correctly.

Homogeneity
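
To make the idea of "purer subsets after a split" concrete, here is a minimal sketch. It is only an illustration: the toy data, the helper names `majority_share` and `split`, and the threshold of 30 are all assumptions made for this example, and the share of the most common class is used as a rough stand-in for purity rather than as the measure a real decision tree would use.

```python
from collections import Counter


def majority_share(labels):
    """Fraction of labels that belong to the most common class.

    1.0 means the subset is perfectly homogeneous; for two classes,
    0.5 means it is as mixed as it can be. (Illustrative stand-in only.)
    """
    if not labels:
        return 0.0
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def split(rows, threshold):
    """Split (feature_value, label) rows into two subsets on a threshold."""
    left = [row for row in rows if row[0] <= threshold]
    right = [row for row in rows if row[0] > threshold]
    return left, right


# Hypothetical toy data: (age, bought_product)
rows = [
    (22, "no"), (25, "no"), (30, "no"), (28, "yes"),
    (35, "yes"), (40, "yes"), (45, "yes"), (50, "no"),
]

parent_purity = majority_share([label for _, label in rows])

left, right = split(rows, threshold=30)
left_purity = majority_share([label for _, label in left])
right_purity = majority_share([label for _, label in right])

print(f"parent: {parent_purity:.2f}")  # parent: 0.50
print(f"left:   {left_purity:.2f}")    # left:   0.75
print(f"right:  {right_purity:.2f}")   # right:  0.75
```

The parent node is an even mix of the two classes, while each subset after the split is dominated by one class. That increase in homogeneity is what a good split is meant to achieve.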