Trending Technology Machine Learning, Artificial Intelligence, Blockchain, IoT, DevOps, Data Science


Friday, 6 July 2018

Feature Selection in Machine Learning


Feature Reduction :-

The information about the target class is inherent in the variables.

Naive view :

More features
⇒ More information
⇒ Better discrimination power

In practice :
- there are many reasons why this is not the case!

Curse of Dimensionality

If the number of training examples is fixed:
 - the classifier's performance will usually degrade for a large number of features!
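A small simulation can make this concrete. The setup below is a hypothetical illustration, not from the original post: two informative features, a growing number of pure-noise features, a training set fixed at 50 points, and a 1-nearest-neighbour classifier written with NumPy only. The exact accuracies depend on the random seed; the trend is what matters.

```python
import numpy as np

def make_data(n, n_noise, rng):
    """Two informative features (class means at -2 and +2) plus n_noise pure-noise features."""
    y = rng.integers(0, 2, size=n)
    informative = rng.normal(size=(n, 2)) + np.where(y == 1, 2.0, -2.0)[:, None]
    noise = rng.normal(size=(n, n_noise))
    return np.hstack([informative, noise]), y

def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Test accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    # squared distances via |a|^2 + |b|^2 - 2 a.b, shape (n_test, n_train)
    d2 = ((X_test ** 2).sum(axis=1)[:, None]
          + (X_train ** 2).sum(axis=1)[None, :]
          - 2.0 * X_test @ X_train.T)
    pred = y_train[d2.argmin(axis=1)]
    return float((pred == y_test).mean())

rng = np.random.default_rng(0)
results = {}
for n_noise in [0, 10, 100, 1000]:
    X_tr, y_tr = make_data(50, n_noise, rng)    # training set size stays fixed at 50
    X_te, y_te = make_data(500, n_noise, rng)
    results[n_noise] = one_nn_accuracy(X_tr, y_tr, X_te, y_te)
    print(f"{n_noise:5d} noise features -> test accuracy {results[n_noise]:.3f}")
```

With only 50 training points, the noise dimensions soon dominate the distances, so accuracy typically drifts toward chance (0.5) as features are added, even though no information was removed.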



Feature Selection :-

Given a set of features F = {x1, ..., xn},
the feature selection problem is to find a subset F' ⊆ F that maximizes the learner's ability to classify patterns.
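Formally, F' should maximize some scoring function J; the selection maps the full feature vector to a shorter vector made of the selected components:
$$F' = \arg\max_{G \subseteq F} J(G), \qquad \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \;\rightarrow\; \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_m} \end{bmatrix}, \quad m \le n$$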

Feature Selection Steps

Feature selection is an optimization problem:
Step 1 : Search the space of possible feature subsets.
Step 2 : Pick the subset that is optimal or near-optimal with respect to some objective function.
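The two steps can be sketched as an exhaustive search, which is also the "Optimum" search strategy listed below. The scoring function `toy_score` and its relevance, redundancy and cost numbers are hypothetical, chosen only for illustration; in practice the objective would be something like validation accuracy.

```python
from itertools import combinations

def exhaustive_search(features, score):
    """Step 1: enumerate every non-empty subset. Step 2: keep the highest-scoring one."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical objective: per-feature relevance, a penalty for a redundant pair,
# and a fixed cost for every selected feature.
RELEVANCE = {"x1": 0.9, "x2": 0.8, "x3": 0.1, "x4": 0.6}
REDUNDANT = {frozenset({"x1", "x2"}): 0.7}   # x1 and x2 carry much the same information
COST = 0.2

def toy_score(subset):
    s = sum(RELEVANCE[f] for f in subset) - COST * len(subset)
    for a, b in combinations(subset, 2):
        s -= REDUNDANT.get(frozenset({a, b}), 0.0)
    return s

best, value = exhaustive_search(list(RELEVANCE), toy_score)
print(best, round(value, 3))   # → ('x1', 'x4') 1.1
```

For n features this evaluates 2^n - 1 subsets (15 here), which is why exhaustive search is only practical for small n.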




Search strategies
 - Optimum
 - Heuristic
 - Randomized
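As an illustration of the randomized strategy, the sketch below samples random subsets (as boolean masks) and keeps the best one found. `toy_score` is again a hypothetical objective. Unlike exhaustive search, random sampling gives no guarantee of finding the optimum within a fixed number of trials.

```python
import random

def random_search(n_features, score, n_trials=200, seed=0):
    """Randomized search: sample random feature masks and keep the best-scoring one."""
    rng = random.Random(seed)
    best_mask, best_score = None, float("-inf")
    for _ in range(n_trials):
        # include each feature independently with probability 0.5
        mask = tuple(rng.random() < 0.5 for _ in range(n_features))
        if not any(mask):
            continue  # skip the empty subset
        s = score(mask)
        if s > best_score:
            best_mask, best_score = mask, s
    return best_mask, best_score

# Hypothetical objective: features 0, 3 and 5 are useful; every selected feature costs 0.1.
USEFUL = {0, 3, 5}

def toy_score(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & USEFUL) - 0.1 * len(chosen)

mask, value = random_search(8, toy_score)
print([i for i, keep in enumerate(mask) if keep], round(value, 2))
```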

Evaluation strategies
 - Filter methods
 - Wrapper methods

Evaluating a feature subset

Supervised (Wrapper method)
 - Train using the selected subset
 - Estimate the error on a validation dataset

Unsupervised (Filter method)
 - Look at input only
 - Select the subset that has the most information
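A minimal sketch of the two evaluation styles, under stated assumptions: the wrapper score trains a nearest-centroid classifier (a stand-in for "the learner") on the chosen columns and reports its validation error, while the filter score looks only at the inputs and uses total variance as a crude, assumed proxy for "information". The data are synthetic: feature 0 depends on the class, feature 1 is pure noise.

```python
import numpy as np

def wrapper_error(X_tr, y_tr, X_val, y_val, subset):
    """Supervised (wrapper) evaluation: train a nearest-centroid classifier on the
    selected columns and return its error rate on the validation set."""
    cols = list(subset)
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c][:, cols].mean(axis=0) for c in classes])
    # squared distance from every validation point to every class centroid
    d2 = ((X_val[:, cols][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d2.argmin(axis=1)]
    return float((pred != y_val).mean())

def filter_score(X, subset):
    """Unsupervised (filter) evaluation: total variance of the selected inputs.
    Labels are never used; variance is only a crude stand-in for 'information'."""
    return float(X[:, list(subset)].var(axis=0).sum())

rng = np.random.default_rng(1)

def make_data(n):
    # hypothetical data: feature 0 depends on the class, feature 1 is pure noise
    y = rng.integers(0, 2, size=n)
    X = np.column_stack([rng.normal(size=n) + np.where(y == 1, 1.5, -1.5),
                         rng.normal(size=n)])
    return X, y

X_tr, y_tr = make_data(200)
X_val, y_val = make_data(200)
for subset in ([0], [1], [0, 1]):
    err = wrapper_error(X_tr, y_tr, X_val, y_val, subset)
    print(subset, round(err, 3), round(filter_score(X_tr, subset), 2))
```

The wrapper score is tied to a specific learner and costs one training run per subset; the filter score is cheap but never sees how well the features actually predict the class.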



Forward Selection
- Start with an empty feature set
- Try each remaining feature
- Estimate the classification/regression error for adding each feature
- Select the feature that gives the maximum improvement
- Stop when there is no significant improvement
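A minimal sketch of forward selection, assuming a regression problem scored by the validation mean squared error of an ordinary least-squares fit (a wrapper evaluation). The synthetic data, where only features 0 and 2 affect the target, and the `min_improvement` threshold standing in for "significant improvement" are illustrative assumptions.

```python
import numpy as np

def val_mse(X_tr, y_tr, X_val, y_val, subset):
    """Wrapper score: least-squares fit on the subset, mean squared error on validation data."""
    if not subset:
        return float(np.mean(y_val ** 2))          # empty model predicts 0
    cols = list(subset)
    coef, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    resid = y_val - X_val[:, cols] @ coef
    return float(np.mean(resid ** 2))

def forward_selection(X_tr, y_tr, X_val, y_val, min_improvement=0.05):
    selected = []                                   # start with an empty feature set
    remaining = list(range(X_tr.shape[1]))
    current = val_mse(X_tr, y_tr, X_val, y_val, selected)
    while remaining:
        # try each remaining feature and estimate the error after adding it
        errors = {f: val_mse(X_tr, y_tr, X_val, y_val, selected + [f]) for f in remaining}
        best = min(errors, key=errors.get)
        if current - errors[best] < min_improvement:  # no significant improvement: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = errors[best]
    return selected, current

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=600)  # only x0 and x2 matter
selected, err = forward_selection(X[:300], y[:300], X[300:], y[300:])
print(selected, round(err, 3))
```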

Backward Search
- Start with the full feature set
- Try removing each remaining feature
- Drop the feature with the smallest impact on error
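A minimal sketch of backward search, assuming the same kind of least-squares wrapper score and synthetic data as a forward-selection example would use. The list above does not say when to stop, so the rule here (stop once every possible drop would raise the validation error by more than `max_increase`) is an assumption.

```python
import numpy as np

def val_mse(X_tr, y_tr, X_val, y_val, subset):
    """Wrapper score: least-squares fit on the subset, mean squared error on validation data."""
    cols = list(subset)
    coef, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    return float(np.mean((y_val - X_val[:, cols] @ coef) ** 2))

def backward_search(X_tr, y_tr, X_val, y_val, max_increase=0.05):
    features = list(range(X_tr.shape[1]))           # start with the full feature set
    current = val_mse(X_tr, y_tr, X_val, y_val, features)
    while len(features) > 1:
        # validation error after dropping each remaining feature in turn
        errors = {f: val_mse(X_tr, y_tr, X_val, y_val, [g for g in features if g != f])
                  for f in features}
        weakest = min(errors, key=errors.get)       # feature with the smallest impact on error
        if errors[weakest] - current > max_increase:  # every drop hurts too much: stop
            break
        features.remove(weakest)
        current = errors[weakest]
    return features, current

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=600)  # only x0 and x2 matter
kept, err = backward_search(X[:300], y[:300], X[300:], y[300:])
print(kept, round(err, 3))
```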


Univariate (looks at each feature independently of others)
- Pearson correlation coefficient
- F-score
- Chi-square
- Signal-to-noise ratio
- Mutual information
- Etc.

Rank the features by importance.
The ranking cut-off is determined by the user.
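As one sketch of univariate ranking, the code below scores each feature independently with the one-way ANOVA F statistic (one common form of the F-score), sorts the features by score, and keeps the top `top_k`, the user-chosen cut-off. The data are synthetic: feature 1 has a strong class shift, feature 3 a weaker one, and features 0 and 2 are noise.

```python
import numpy as np

def f_scores(X, y):
    """One-way ANOVA F statistic of each column of X with respect to the class labels y."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2 for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    return (between / (k - 1)) / (within / (n - k))

def rank_and_cut(X, y, top_k):
    scores = f_scores(X, y)
    order = np.argsort(scores)[::-1]      # feature indices, most important first
    return order[:top_k].tolist(), scores # top_k is the user-chosen cut-off

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 4))
X[:, 1] += np.where(y == 1, 1.0, -1.0)   # strong class signal
X[:, 3] += np.where(y == 1, 0.5, -0.5)   # weaker class signal
top, scores = rank_and_cut(X, y, top_k=2)
print(top, np.round(scores, 1))
```

Because each feature is scored on its own, this is fast, but it cannot detect features that are only useful in combination, or features that are redundant with each other.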


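Pearson correlation coefficient

- Measures the linear correlation between two variables (for feature selection: a feature and the target)
- Formula for the Pearson correlation: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$
- The correlation r lies between -1 and +1.
  •   +1 means perfect positive correlation
  •   -1 means perfect negative correlation (the other direction)
  •   0 means no linear correlation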
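The formula translates directly into NumPy. The sketch below checks the two extreme cases on a tiny array and then ranks the features of a synthetic dataset by |r| with the target; the data and coefficients are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(pearson_r(x, 2 * x + 1))   # → 1.0 (perfect positive correlation)
print(pearson_r(x, -x))          # → -1.0 (perfect negative correlation)

# Ranking features by |r| with the target (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
target = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
scores = [abs(pearson_r(X[:, j], target)) for j in range(X.shape[1])]
ranking = sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)
print(ranking)
```

Ranking by the absolute value treats strong negative correlation as just as useful as strong positive correlation.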


Signal to noise ratio

- Difference in the means of the two classes X and Y divided by the sum of their standard deviations
                    $$S2N(X,Y) = \frac{\mu_X - \mu_Y}{\sigma_X + \sigma_Y}$$
- Large absolute values indicate a strong correlation between the feature and the class
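A minimal per-feature implementation of the score, assuming binary labels in {0, 1} (class 1 plays the role of X, class 0 of Y) and population standard deviations (NumPy's default `ddof=0`). The tiny array is chosen so the result can be checked by hand.

```python
import numpy as np

def s2n(X, y):
    """Signal-to-noise score of each column of X for binary labels y in {0, 1}:
    (mean of class 1 - mean of class 0) / (std of class 1 + std of class 0)."""
    pos, neg = X[y == 1], X[y == 0]
    return (pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0))

X = np.array([[1.0, 5.0],
              [3.0, 7.0],
              [7.0, 6.0],
              [9.0, 4.0]])
y = np.array([0, 0, 1, 1])
scores = s2n(X, y)
print(scores)                        # scores: 3.0 for feature 0, -0.5 for feature 1
print(np.argsort(-np.abs(scores)))   # rank by |S2N|, most discriminative first
```

For feature 0 the class means are 8 and 2 and both standard deviations are 1, so S2N = 6 / 2 = 3.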

Multivariate feature selection

- Multivariate (consider all features simultaneously)
- Consider the vector w for any linear classifier.
- Classification of a point x is given by the sign of $w^\top x + w_0$.
- Small entries of w (in absolute value) have little effect on the dot product, so the corresponding features are less relevant, provided the features are on comparable scales (e.g. standardized).
- For example, if w = (10, 0.1, -9), then features 0 and 2 contribute more to the dot product than feature 1.
          - A ranking of features given by this w is 0, 2, 1.
- w can be obtained from any linear classifier.
- A variant of this approach is called recursive feature elimination (RFE):
     1. Compute w on all features
     2. Remove the feature with the smallest |wi|
     3. Recompute w on the reduced data
     4. If the stopping criterion is not met, go to step 2
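The ranking and the elimination loop can be sketched as follows. Assumptions: w comes from a least-squares linear classifier with labels in {-1, +1} (any linear classifier would do), the features are standardized so their weights are comparable, and the stopping criterion is simply "keep `n_keep` features". The data are synthetic, with the label depending only on features 0 and 3.

```python
import numpy as np

# Ranking features by |w| for the example weight vector in the text
w = np.array([10.0, 0.1, -9.0])
print(np.argsort(-np.abs(w)))   # → [0 2 1]

def linear_weights(X, y):
    """w (bias w0 dropped) of a least-squares linear classifier with labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # extra column for the bias w0
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef[:-1]

def rfe(X, y, n_keep):
    """Recursive feature elimination: drop the feature with the smallest |w_i| until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:                 # stopping criterion: target number of features
        w_sub = linear_weights(X[:, remaining], y) # (re)compute w on the reduced data
        remaining.pop(int(np.argmin(np.abs(w_sub))))  # remove the feature with the smallest |w_i|
    return remaining

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 5))                      # standardized features, so |w_i| are comparable
y = np.sign(3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=500))
kept = rfe(X, y, n_keep=2)
print(kept)
```

Recomputing w after each removal matters: once a feature is gone, the weights of the remaining, possibly correlated, features can change.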
