Evaluation and Cross-Validation

Experimental Evaluation of Learning Algorithms

Evaluating the performance of learning systems is important because:
- Learning systems are usually designed to predict the class of "future" unlabeled data points.
Typical choices for performance evaluation:
- Error
- Accuracy
- Precision/Recall
Typical choices for sampling methods:
- Train/Test Sets
- K-Fold Cross-validation
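
The performance measures listed above can be illustrated with a small sketch. It assumes a binary classification task with labels 0 and 1, where 1 is treated as the positive class; the function names and the toy label lists are hypothetical and only serve the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_metrics(y_true, y_pred, positive=1):
    """Return error, accuracy, precision and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    return {
        "error": (fp + fn) / n,        # fraction of wrong predictions
        "accuracy": (tp + tn) / n,     # fraction of correct predictions
        # Precision and recall are set to 0.0 when undefined (an assumption).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data (hypothetical): observed labels and a classifier's predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluation_metrics(y_true, y_pred)
print(metrics)
```

Error and accuracy always sum to 1, while precision and recall look only at the positive class, which is why they are usually reported together.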

Evaluating predictions

Suppose we want to predict the value of a target feature on example x:
- y is the observed value of the target feature on example x.
- ŷ is the predicted value of the target feature on example x.
- How is the error between y and ŷ measured?

Sample Error and True Error

The sample error of hypothesis f with respect to target function c and data sample S is:
error_S(f) = (1/n) Σ_{x ∈ S} δ(f(x), c(x))
where n is the number of examples in S, and δ(f(x), c(x)) is 1 if f(x) ≠ c(x) and 0 otherwise.
The true error (denoted error_D(f)) of hypothesis f with respect to target function c and distribution D is the probability that f will misclassify an instance drawn at random according to D:
error_D(f) = Pr_{x ∈ D} [f(x) ≠ c(x)]
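
As a minimal sketch of the sample error definition, the code below counts the fraction of examples in S on which f and c disagree. The threshold functions used for c and f, and the sample S, are hypothetical and chosen only to make the arithmetic easy to check by hand.

```python
def sample_error(f, c, S):
    """Fraction of examples in S on which hypothesis f disagrees with target c."""
    return sum(1 for x in S if f(x) != c(x)) / len(S)


def c(x):
    """Hypothetical target function: positive when x >= 5."""
    return x >= 5


def f(x):
    """Hypothetical learned hypothesis: positive when x >= 7."""
    return x >= 7


S = list(range(10))          # sample of n = 10 examples: 0, 1, ..., 9
err = sample_error(f, c, S)  # f and c disagree only on x = 5 and x = 6
print(err)
```

The true error cannot be computed this way, because it is defined over the whole distribution D rather than over a finite sample; the sample error is only an estimate of it.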

Difficulties in evaluating hypotheses with limited data

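Bias in the estimate: The sample error measured on the training examples is a poor (optimistically biased) estimate of the true error, because the hypothesis was fitted to those same examples.
- ==> test the hypothesis on an independent test set
We divide the examples into:
- Training examples that are used to train the learner
- Test examples that are used to evaluate the learner
Variance in the estimate: Even on an independent test set, the measured error can differ from the true error. The smaller the test set, the greater the expected variance.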
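
A rough sketch of both points, under stated assumptions: the split below shuffles the examples once and holds out a fixed fraction for testing, and the spread of the measured error is approximated with the binomial standard deviation sqrt(e(1 - e)/n), which assumes the n test examples are drawn independently and were not used to train the hypothesis. The function names, the 30% test fraction, and the true error of 0.2 are hypothetical.

```python
import math
import random


def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def error_std(true_error, n_test):
    """Standard deviation of the error measured on n_test independent examples."""
    return math.sqrt(true_error * (1 - true_error) / n_test)


examples = list(range(100))
train, test = train_test_split(examples)
print(len(train), len(test))

# The spread of the estimate shrinks as the test set grows.
for n in (25, 100, 400):
    print(n, error_std(0.2, n))
```

Under these assumptions, quadrupling the test set halves the standard deviation of the estimate, which is one reason small test sets give unreliable error figures.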

Validation Set

K-fold cross validation
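
A minimal sketch of the usual k-fold procedure, assuming the data is shuffled once and split into k roughly equal folds, with each fold used once as the test set while the remaining k-1 folds are used for training. The reported figure is the average test error over the k rounds. The function names and the trivial majority-class learner are hypothetical and stand in for any real learning algorithm.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, labels, train_fn, k=5, seed=0):
    """Average test error over k rounds, each holding out one fold."""
    folds = k_fold_indices(len(examples), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_fn([examples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        wrong = sum(1 for j in test_idx if model(examples[j]) != labels[j])
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


def train_majority(xs, ys):
    """Hypothetical trivial learner: always predict the most common training label."""
    majority = max(set(ys), key=ys.count)
    return lambda x: majority


examples = list(range(20))
labels = [0] * 15 + [1] * 5
cv_error = cross_validate(examples, labels, train_majority, k=5)
print(cv_error)
```

Every example is used for testing exactly once, so the estimate uses all of the data while still never testing a model on examples it was trained on.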

Trade-off

In machine learning, there is always a trade-off between
- complex hypotheses that fit the training data well
- simpler hypotheses that may generalize better.
As the amount of training data increases, the generalization error typically decreases.