Trending Technology Machine Learning, Artificial Intelligence, Blockchain, IoT, DevOps, Data Science

Friday, 6 July 2018

Feature Selection in Machine Learning

Feature Reduction :-

The information about the target class is inherent in the variables.

Naive view :

More features
⇒ More information
⇒ Better discrimination power

In practice :
- there are many reasons why this is not the case!

Curse of Dimensionality

When the number of training examples is fixed,
 - the classifier's performance will usually degrade as the number of features grows large!
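
One common way to picture this effect is to split every feature axis into a fixed number of bins and count how many training examples land in each grid cell on average. The sketch below does only that arithmetic. The function name, the bin count of 10 and the 1000 examples are illustrative assumptions, not values from the original notes.

```python
def examples_per_cell(n_examples, n_features, bins_per_feature=10):
    """Average number of training examples per grid cell when each
    feature axis is split into `bins_per_feature` intervals."""
    n_cells = bins_per_feature ** n_features  # grows exponentially with features
    return n_examples / n_cells

n_examples = 1000  # fixed training-set size (illustrative value)
for d in (1, 2, 3, 6):
    print(d, "features:", examples_per_cell(n_examples, d), "examples per cell")
```

With the training set held fixed, every extra feature divides the data available per cell by the bin count, so the feature space quickly becomes almost empty.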

Feature Selection :-

Given a set of features F = {𝓍1, ..., 𝓍n},
the Feature Selection problem is to find a subset F' ⊆ F that maximizes the learner's ability to classify patterns.
Formally, F' should maximize some scoring function.

Feature selection keeps m of the original n features (m ≤ n):
 (𝓍1, 𝓍2, ..., 𝓍n)  →  (𝓍i1, 𝓍i2, ..., 𝓍im)

Feature Selection Steps

Feature selection is an optimization problem.
Step 1 : Search the space of possible feature subsets.
Step 2 : Pick the subset that is optimal or near-optimal with respect to some objective function.

Search strategies
 - Optimum
 - Heuristic
 - Randomized
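
One reading of the "Optimum" strategy is an exhaustive search: score every non-empty subset and keep the best one. The sketch below assumes that reading. The names `exhaustive_search`, `toy_score` and the `usefulness` values are hypothetical and only serve the illustration.

```python
from itertools import combinations

def exhaustive_search(n_features, score):
    """Return the non-empty subset of feature indices with the highest score."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical scoring function: per-feature usefulness minus a size penalty.
usefulness = [0.9, 0.1, 0.6, 0.05]

def toy_score(subset):
    return sum(usefulness[i] for i in subset) - 0.2 * len(subset)

best_subset, best_score = exhaustive_search(len(usefulness), toy_score)
print(best_subset, round(best_score, 2))
```

With n features there are 2^n - 1 non-empty subsets, so this is only feasible for small n. That is why heuristic and randomized searches are used in practice.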

Evaluation strategies
 - Filter methods
 - Wrapper methods

Evaluating a feature subset

Supervised (Wrapper method)
 - Train using selected subset
 - Estimate error on validation dataset

Unsupervised (Filter method)
 - Look at input only
 - Select the subset that has the most information
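
The wrapper evaluation above can be sketched as follows. The learner here is a nearest-centroid classifier, chosen only because it fits in a few lines of plain Python; any classifier could take its place. The function names and the tiny data set are assumptions made for the illustration.

```python
def project(rows, subset):
    """Keep only the columns listed in `subset`."""
    return [[row[j] for j in subset] for row in rows]

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def wrapper_error(subset, X_train, y_train, X_val, y_val):
    """Train on the selected features, return the validation error rate."""
    centroids = fit_centroids(project(X_train, subset), y_train)
    wrong = sum(predict(centroids, x) != t
                for x, t in zip(project(X_val, subset), y_val))
    return wrong / len(y_val)

# Made-up data: feature 0 separates the classes, feature 1 is mostly noise.
X_train = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.0], [3.2, 1.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.9, 3.0], [1.4, 0.5], [2.9, 5.5], [3.3, 2.0]]
y_val = [0, 0, 1, 1]

for subset in [(0,), (1,), (0, 1)]:
    print(subset, wrapper_error(subset, X_train, y_train, X_val, y_val))
```

The subset is scored by the learner's own validation error, which is what distinguishes a wrapper from a filter.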

Forward Selection
- Start with an empty feature set
- Try each remaining feature
- Estimate the classification/regression error for adding each feature
- Select the feature that gives the maximum improvement
- Stop when there is no significant improvement
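
The forward selection steps can be sketched as a greedy loop. The `error` argument stands for any function that returns an estimated validation error for a list of feature indices. The table `toy_errors` and the threshold `min_improvement` are made-up values for illustration, not results from a real model.

```python
def forward_selection(n_features, error, min_improvement=0.01):
    """Greedy forward selection driven by an error-estimating function."""
    selected = []
    best_error = error(selected)  # baseline error with no features
    while len(selected) < n_features:
        remaining = [j for j in range(n_features) if j not in selected]
        # Try each remaining feature and record the error after adding it
        trial_errors = {j: error(selected + [j]) for j in remaining}
        best_j = min(trial_errors, key=trial_errors.get)
        if best_error - trial_errors[best_j] < min_improvement:
            break  # no significant improvement: stop
        selected.append(best_j)
        best_error = trial_errors[best_j]
    return selected, best_error

# Hypothetical validation errors for every subset of three features.
toy_errors = {
    frozenset(): 0.50,
    frozenset({0}): 0.30, frozenset({1}): 0.45, frozenset({2}): 0.35,
    frozenset({0, 1}): 0.29, frozenset({0, 2}): 0.20, frozenset({1, 2}): 0.33,
    frozenset({0, 1, 2}): 0.20,
}

def toy_error(subset):
    return toy_errors[frozenset(subset)]

selected, final_error = forward_selection(3, toy_error)
print(selected, final_error)
```

Feature 0 is added first, then feature 2. Adding feature 1 brings no further improvement, so the search stops.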

Backward Search
- Start with the full feature set
- Try removing each remaining feature
- Drop the feature with the smallest impact on error
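
Backward search can be sketched in the same greedy style. The notes do not give a stopping rule, so the `tolerance` parameter below is an assumption: the loop stops as soon as every possible removal would raise the error by more than that amount. The error table is again made up.

```python
def backward_elimination(n_features, error, tolerance=0.02):
    """Greedy backward search driven by an error-estimating function."""
    selected = list(range(n_features))
    current_error = error(selected)
    while len(selected) > 1:
        # Error obtained after removing each feature in turn
        trial_errors = {j: error([k for k in selected if k != j])
                        for j in selected}
        drop_j = min(trial_errors, key=trial_errors.get)
        if trial_errors[drop_j] - current_error > tolerance:
            break  # every removal hurts noticeably: stop (assumed rule)
        selected.remove(drop_j)
        current_error = trial_errors[drop_j]
    return selected, current_error

# Hypothetical validation errors for every non-empty subset of three features.
toy_errors = {
    frozenset({0, 1, 2}): 0.15,
    frozenset({0, 1}): 0.16, frozenset({0, 2}): 0.30, frozenset({1, 2}): 0.25,
    frozenset({0}): 0.32, frozenset({1}): 0.28, frozenset({2}): 0.40,
}

def toy_error(subset):
    return toy_errors[frozenset(subset)]

kept, final_error = backward_elimination(3, toy_error)
print(kept, final_error)
```

Feature 2 is dropped because removing it barely changes the error. Removing either of the two remaining features would cost much more, so both are kept.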

Univariate (looks at each feature independently of the others)
- Pearson correlation coefficient
- F-score
- Chi-square
- Signal to noise ratio
- Mutual information
- Etc.

Rank features by importance
Ranking cut-off is determined by the user
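
Ranking with a user-chosen cut-off takes only a few lines. In this sketch the scores are made-up numbers standing in for any univariate measure, and features are ranked by absolute value so that strongly negative scores also count as important. That choice is an assumption.

```python
def rank_features(scores, k):
    """Return the indices of the k features with the largest |score|."""
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return order[:k]

scores = [0.82, -0.05, 0.40, -0.91]  # hypothetical per-feature scores
print(rank_features(scores, k=2))    # → [3, 0]
```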

Pearson correlation coefficient

- Measures the linear correlation between two variables
- Formula for Pearson correlation:
                    r = Σ (xi - x̄)(yi - ȳ) / sqrt( Σ (xi - x̄)² · Σ (yi - ȳ)² )
- The correlation r is between +1 and -1.
  •   +1 means perfect positive correlation
  •   -1 means perfect negative correlation
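
A minimal sketch of the Pearson correlation coefficient in plain Python, using the standard definition (covariance divided by the product of the standard deviations). The function name and the sample values are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

feature = [1, 2, 3, 4]
print(pearson(feature, [2, 4, 6, 8]))  # perfectly increasing target
print(pearson(feature, [8, 6, 4, 2]))  # perfectly decreasing target
```

For feature selection, each feature is scored by the absolute value of its correlation with the target.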

Signal to noise ratio

- Difference in means divided by the sum of the standard deviations of the two classes
                    S2N(X,Y) = (μx - μy) / (σx + σy)
- Large absolute values indicate a strong correlation with the class
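
A small sketch of the signal to noise score for one feature and a binary label. It uses the sum of the two class standard deviations in the denominator, which is the form usually given for this criterion; treat that choice, the function name and the sample values as assumptions.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """S2N score of one feature for binary labels (1 = class X, 0 = class Y)."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    # Assumed form: difference of class means over the sum of class std devs
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))

values = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]  # one feature, six examples (made up)
labels = [1, 1, 1, 0, 0, 0]
print(signal_to_noise(values, labels))
```

The class means are far apart relative to the spread inside each class, so this feature gets a high score.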

Multivariate feature selection

- Multivariate (considers all features simultaneously)
- Consider the weight vector w of any linear classifier.
- Classification of a point x is given by the sign of wᵀx + w0.
- Small entries of w (in absolute value) have little effect on the dot product, and therefore those features are less relevant.
- For example, if w = (10, 0.1, -9), then features 0 and 2 contribute more to the dot product than feature 1.
          - A ranking of features given by this w is 0, 2, 1.
- The w can be obtained from any linear classifier.
- A variant of this approach is called recursive feature elimination:
     1. Compute w on all features
     2. Remove the feature with the smallest |wi|
     3. Recompute w on the reduced data
     4. If the stopping criterion is not met, go to step 2
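
The weight-based ranking and recursive feature elimination can be sketched in plain Python. The linear classifier here is a least-squares fit trained by gradient descent, chosen only to keep the example free of external libraries; any linear classifier that yields a weight vector w would do. The function names, learning rate, iteration count, stopping criterion (`n_keep`) and the four-point data set are all assumptions for the illustration.

```python
def rank_by_weight(w):
    """Feature indices ordered by decreasing |w_i|."""
    return sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)

def fit_linear(X, y, lr=0.1, n_iter=500):
    """Least-squares linear classifier fitted by gradient descent.
    Returns (w, w0) so that a point x is scored as w.x + w0."""
    n, d = len(X), len(X[0])
    w, w0 = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residual of each training point under the current weights
        residuals = [sum(wj * xj for wj, xj in zip(w, x)) + w0 - t
                     for x, t in zip(X, y)]
        for j in range(d):
            w[j] -= lr * 2.0 / n * sum(r * x[j] for r, x in zip(residuals, X))
        w0 -= lr * 2.0 / n * sum(residuals)
    return w, w0

def recursive_feature_elimination(X, y, n_keep=1):
    """Repeatedly refit and drop the feature with the smallest |w_i|."""
    active = list(range(len(X[0])))  # indices of features still in play
    eliminated = []
    while len(active) > n_keep:      # assumed stopping criterion
        X_active = [[x[j] for j in active] for x in X]
        w, _ = fit_linear(X_active, y)
        weakest = min(range(len(active)), key=lambda i: abs(w[i]))
        eliminated.append(active.pop(weakest))
    return active, eliminated

print(rank_by_weight([10, 0.1, -9]))  # → [0, 2, 1]

# Made-up data: labels follow feature 0 most closely, feature 2 not at all.
X = [[2, 1, 1], [1, -2, -1], [-1, 2, -1], [-2, -1, 1]]
y = [1, 1, -1, -1]
kept, eliminated = recursive_feature_elimination(X, y, n_keep=1)
print("kept:", kept, "eliminated in order:", eliminated)
```

The weights are recomputed after every removal because dropping one feature can change the weights of the features that remain.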

