Learning Decision Tree in Machine Learning

Decision tree learning is one of the predictive modelling approaches used in statistics, data mining and machine learning. It uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).

Principled Criterion
Selection of an attribute to test at each node- choosing the most useful attribute for classifying examples.

Information gain
- measures how well a given attribute separates the training examples according to their target classification.
- This measure is used to select among the candidate attributes at each step while growing the tree.
- Gain is measure of how we can reduce uncertainty (Value lies between 0,1).

Entropy :-

A measure for -
- uncertainty
- purity
- information content

Information theory: optimal length code assigns (-log2p) bits to message having probability p
S is a sample of training examples
- p+ is the proportion of positive examples in S
- P_ is the proportion of negative examples in S
Entropy of S : average optimal number of bits to encode information about certainty/uncertainty about S

S is a sample of training examples
p+ is the proportion of positive of positive example
p_ is the proportion of negative examples
Entropy measures the impurity of S
Entropy(S) = -p+log2p+ - p_ log2p_
The entropy is 0 if the outcome is "certain".
The entropy is maximum if we have no knowledge of the system (or any outcome is equally possible).

Information Gain

Gain (S,A): expected reduction in entropy due to partitioning S on attribute A.

Entropy ([21+,5-]) = o.17
Entropy ([8+,30-]) = 0.74
Gain (S,A1) = Entropy(S)
         -26/64*Entropy([21+,5-])
   -38/64*Entropy([8+,30-])
= 0.27

Entropy ([18+,33-]) = o.94
Entropy ([8+,30-]) = 0.62
Gain (S,A2) = Entropy(S)
         -51/64*Entropy([18+,33-])
   -13/64*Entropy([11+,2-])
= 0.12

Training Example :-

Selecting the next Attributes

The information gain values for the 4 attributes are:
Gain(S,Outlook) = 0.247
Gain(S, Humidity) = 0.151
Gain(S,Wind) = 0.048
Gain(S,Temperature) = 0.029

where S denotes the collection of training examples

Splitting Rule: GINI Index

GINI Index
- Measure of node impurity

Splitting Based on Continuous Attributes

Continuous Attribute - Binary Split
   For continuous attribute
   - Partition the continuous value of attribute A into a discrete set of intervals
   - Create a new Boolean attribute Ac , looking for a threshold c,