Why Decision Trees?
As we saw in the last article introducing Decision Trees, they can be used for both classification and regression. But linear and logistic regression can solve those same problems, and those models are well tried, tested, and useful in most situations. So why do we need decision trees?
There are a few characteristics of Decision Trees that make them stand out as useful algorithms in specific situations.
1. They are highly interpretable.
If a patient falls in the last red box on the right and is diagnosed as diabetic, you know exactly why: you can explain that the patient is male and that his fasting sugars are around 180, and that he is therefore diabetic with high probability.
Interpretability is a key requirement for many organizations when choosing an algorithm. If a business decision goes wrong, you should at least be able to explain what went wrong and correct it. Without that interpretability, many top stakeholders will be reluctant to accept your model.
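To see this interpretability in practice, here is a minimal sketch using scikit-learn's `export_text`, which prints a fitted tree as plain if/else rules. The tiny diabetes-style dataset and the feature names `is_male` and `fasting_sugar` are made up for illustration.

```python
# Sketch: printing a fitted tree's decision rules as readable text.
# The data and feature names below are invented for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [is_male (0/1), fasting sugar in mg/dL]
X = [[1, 185], [1, 95], [0, 190], [0, 100], [1, 200], [0, 90]]
y = [1, 0, 1, 0, 1, 0]  # 1 = diabetic, 0 = not diabetic

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each path from root to leaf is a human-readable diagnosis rule.
print(export_text(tree, feature_names=["is_male", "fasting_sugar"]))
```

Every prediction can be traced to one such rule, which is exactly the kind of explanation a stakeholder can audit.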
2. It is a versatile algorithm. It can be used for both classification and regression problems. Classification is used when the target variable is discrete, and regression when it is continuous. In decision trees, we check the purity of a node based on the homogeneity of its classes for classification, but for regression we can use something like the sum of squared errors (SSE), choosing the split point with the lowest SSE. In other words, by just changing the measure of purity, both kinds of problems can be handled.
3. Decision trees handle multicollinearity better than linear regression does; in fact, multicollinearity does not matter to them at all. Linear regression coefficients cannot be interpreted reliably unless multicollinearity is handled, but a decision tree is unaffected: each split considers one feature at a time, so a redundant, correlated copy of a feature simply goes unused.
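A quick sketch of this claim, on invented data: appending a perfectly collinear copy of a feature (correlation 1.0, the worst case for linear regression) leaves the tree's predictions unchanged, because a split on either copy produces the same partition.

```python
# Sketch: perfect collinearity does not change what the tree learns.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Duplicate the first column: an exactly collinear extra feature.
X_dup = np.hstack([X, X[:, :1]])

a = DecisionTreeClassifier(random_state=0).fit(X, y)
b = DecisionTreeClassifier(random_state=0).fit(X_dup, y)

# Same partitions are reachable either way, so predictions agree.
print(np.array_equal(a.predict(X), b.predict(X_dup)))  # True
```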
4. Building the tree with splits is pretty fast and works well on large datasets too.
5. It is also scale-invariant. Unlike in linear regression, a feature does not gain importance just because it is on a larger scale. Values are only ever compared within a single attribute, so you can feed data to a decision tree without scaling it. You can refresh your memory on Feature Scaling in my earlier article.
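Scale invariance can be sketched the same way, on invented data: multiplying one feature by a large constant moves the split thresholds but yields the same partitions, so the fitted tree's predictions do not change.

```python
# Sketch: rescaling a feature does not change the tree's predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > X[:, 1]).astype(int)

X_scaled = X.copy()
X_scaled[:, 0] *= 1000.0  # one feature on a wildly different scale

a = DecisionTreeClassifier(random_state=0).fit(X, y)
b = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Thresholds scale with the feature, partitions stay identical.
print(np.array_equal(a.predict(X), b.predict(X_scaled)))  # True
```

Contrast this with distance- or gradient-based models, where the 1000x feature would dominate unless standardised first.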
6. Another important advantage of decision trees is that they can work with data that has a non-linear relationship between the predictors and the target variable. The tree partitions the data into small subsets within which the relationship is approximately simple, so creating a sufficient number of splits helps in dealing with non-linear relationships.
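As a sketch of this piecewise approximation, consider an invented target with a purely non-linear (symmetric parabolic) shape: a straight line captures essentially none of it, while a modest regression tree follows it closely with piecewise-constant segments.

```python
# Sketch: a regression tree approximates a non-linear curve that
# linear regression cannot fit at all.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(300, 1))
y = (X[:, 0] - np.pi) ** 2  # symmetric parabola: no linear trend

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

print(round(linear.score(X, y), 2))  # near 0: a line misses the curve
print(round(tree.score(X, y), 2))    # near 1: many small splits follow it
```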
So decision trees have carved out their own niche in solving problems because of these advantages.
However, they have certain disadvantages too, and these need to be kept in mind when finally deciding whether or not to go with them:
- Decision Trees can create overly complex trees that lead to overfitting.
- They are also considered unstable models, as the tree can change largely with even small variations in the training data.
- They are not good at extrapolation; they only work well within the range of the data used in training.
- Decision trees can also grow biased trees if one class dominates the data, so the dataset has to be balanced before fitting a decision tree.
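The first point, overfitting, is easy to sketch on invented noisy data: an unconstrained tree memorises the training labels perfectly but scores noticeably worse on held-out data, while a depth-limited tree averages the noise away.

```python
# Sketch: an unconstrained tree memorises label noise (overfitting).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
y = np.where(rng.uniform(size=1000) < 0.2, 1 - y, y)  # flip 20% of labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(deep.score(X_tr, y_tr))     # 1.0: the noisy labels are memorised
print(deep.score(X_te, y_te))     # noticeably lower on unseen data
print(shallow.score(X_te, y_te))  # a depth limit often generalises better
```

Constraints like `max_depth` and `min_samples_leaf`, or post-hoc pruning, are the usual remedies; we will return to these when building trees in later articles.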
I plan to get into more details on building Decision Trees in the upcoming articles.