**Feature Reduction :-**

The information about the target class is inherent in the variables.

Naive view :

More features

⇒ More information

⇒ Better discrimination power

In practice :

- many reasons why this is not the case!

**Curse of Dimensionality**

When the number of training examples is fixed

- the classifier's performance will usually degrade as the number of features grows large!

**Feature Selection :-**

Given a set of features F = {𝓍1, ..., 𝓍n}

the Feature Selection problem is to find a subset F' ⊆ F that maximizes the learner's ability to classify patterns.

Formally F' should maximize some scoring function

𝓍1 → 𝓍i1

𝓍2 → 𝓍i2

⋮

𝓍n → 𝓍in

**Feature Selection Steps**

Feature selection is an optimization problem

Step 1 : Search the space of possible feature subsets.

Step 2 : Pick the subset that is optimal or near-optimal with respect to some objective function.

Search strategies

- Optimum

- Heuristic

- Randomized

Evaluation strategies

- Filter methods

- Wrapper methods

**Evaluating feature subsets**

**Supervised (Wrapper method)**

- Train using selected subset

- Estimate error on validation dataset

**Unsupervised (Filter method)**

- Look at input only

- Select the subset that has the most information

**Forward Selection**

- Start with empty feature set

- Try each remaining feature

- Estimate the classification/regression error for adding each feature

- Select the feature that gives the maximum improvement

- Stop when there is no significant improvement
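The steps above can be sketched as a greedy loop. This is only a minimal illustration: `error_fn` is a hypothetical callback standing in for "train on this subset and estimate the validation error", and `min_improvement` is an assumed threshold for what counts as a significant improvement.

```python
from typing import Callable, List, Sequence


def forward_selection(
    n_features: int,
    error_fn: Callable[[Sequence[int]], float],
    min_improvement: float = 1e-3,
) -> List[int]:
    """Greedy forward selection over feature indices 0..n_features-1.

    error_fn(subset) is a hypothetical callback that returns the validation
    error of a model trained on that subset (lower is better).
    """
    selected: List[int] = []           # start with the empty feature set
    best_error = error_fn(selected)
    remaining = set(range(n_features))
    while remaining:
        # Try each remaining feature and record the error after adding it.
        trial_errors = {f: error_fn(selected + [f]) for f in remaining}
        # Pick the feature with the lowest error (ties broken by index).
        best_feature = min(trial_errors, key=lambda f: (trial_errors[f], f))
        improvement = best_error - trial_errors[best_feature]
        if improvement < min_improvement:
            break                      # no significant improvement: stop
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_error = trial_errors[best_feature]
    return selected


# Toy error function, made up for illustration: only features 0 and 2 help.
def toy_error(subset):
    gains = {0: 0.30, 2: 0.15}
    return 0.5 - sum(gains.get(f, 0.0) for f in subset)


chosen = forward_selection(4, toy_error)
print(chosen)
```

In a real setting `error_fn` would retrain the learner for every candidate subset, which is what makes this wrapper approach expensive.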

**Backward Search**

- Start with full feature set

- Try removing each remaining feature

- Drop the feature with the smallest impact on error
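A rough sketch of backward search follows. The notes do not give a stopping rule for it, so the `max_increase` tolerance below is an assumption, and `error_fn` is again a hypothetical callback that returns the validation error for a feature subset.

```python
def backward_elimination(n_features, error_fn, max_increase=1e-3):
    """Greedy backward search over feature indices 0..n_features-1.

    error_fn(subset) is a hypothetical callback returning validation error.
    max_increase is an assumed tolerance: stop once dropping any feature
    would raise the error by more than this amount.
    """
    selected = list(range(n_features))     # start with the full feature set
    current_error = error_fn(selected)
    while len(selected) > 1:
        # Error obtained after removing each feature in turn.
        trial = {f: error_fn([g for g in selected if g != f]) for f in selected}
        # Candidate to drop: the one whose removal hurts least.
        drop = min(trial, key=lambda f: (trial[f], f))
        if trial[drop] - current_error > max_increase:
            break
        selected.remove(drop)
        current_error = trial[drop]
    return selected


# Toy error function, made up for illustration: only features 0 and 2 help.
def toy_error(subset):
    gains = {0: 0.30, 2: 0.15}
    return 0.5 - sum(gains.get(f, 0.0) for f in subset)


kept = backward_elimination(4, toy_error)
print(kept)
```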

Univariate (looks at each feature independently of others)

- Pearson correlation coefficient

- F-score

- Chi-square

- Signal to noise ratio

- Mutual information

- Etc.

Rank features by importance

Ranking cut-off is determined by user
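As one possible illustration of a univariate filter, the sketch below scores each discrete feature by its mutual information with the class label, ranks the features, and keeps the top `k`. The function names, the toy data, and the choice of `k` are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def top_k_features(columns, labels, k):
    """Score each feature column on its own, rank, and apply the cut-off k."""
    scores = [mutual_information(col, labels) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda j: (-scores[j], j))
    return ranking[:k], scores


labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # feature 0: identical to the label
    [0, 1, 0, 1],  # feature 1: independent of the label
    [0, 0, 0, 1],  # feature 2: partially informative
]
kept, scores = top_k_features(columns, labels, k=2)
print(kept, scores)
```

Because each feature is scored independently, this kind of filter cannot detect features that are only useful in combination.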

**Pearson correlation coefficient**

- Measures the correlation between two variables

- Formula for Pearson correlation: r(X,Y) = cov(X,Y) / (σx σy)

- The correlation r is between +1 and -1.

- +1 means perfect positive correlation
- -1 means perfect negative correlation
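A small sketch of how the coefficient can be computed, as the covariance of the two variables divided by the product of their standard deviations. The function name and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # y rises with x
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])   # y falls as x rises
print(r_pos, r_neg)
```

For feature ranking, one would typically compute this between each feature and the target and sort by the absolute value of r.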

**Signal to noise ratio**

- Difference in means divided by the sum of the standard deviations of the two classes

S2N(X,Y) = (μx - μy) / (σx + σy)

- Large absolute values indicate a strong correlation with the class
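A short sketch of the signal to noise score for one feature, using the common form with the sum of the two class standard deviations in the denominator. The use of the population standard deviation, the 0/1 label encoding, and the toy values are assumptions for illustration.

```python
import statistics


def signal_to_noise(values, labels):
    """Signal to noise score of one feature for a two-class problem.

    Assumes labels are 1 (first class) or 0 (second class) and uses the
    population standard deviation of the feature within each class.
    """
    pos = [v for v, c in zip(values, labels) if c == 1]
    neg = [v for v, c in zip(values, labels) if c == 0]
    mu_pos, mu_neg = statistics.mean(pos), statistics.mean(neg)
    sd_pos, sd_neg = statistics.pstdev(pos), statistics.pstdev(neg)
    return (mu_pos - mu_neg) / (sd_pos + sd_neg)


labels = [1, 1, 1, 0, 0, 0]
informative = [9.0, 10.0, 11.0, 1.0, 2.0, 3.0]   # classes well separated
noisy = [1.0, 10.0, 4.0, 2.0, 9.0, 3.0]          # classes overlap heavily

s2n_informative = signal_to_noise(informative, labels)
s2n_noisy = signal_to_noise(noisy, labels)
print(s2n_informative, s2n_noisy)
```

The well-separated feature gets a much larger score than the overlapping one, which is the behaviour a ranking would rely on.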

**Multivariate feature selection**

- Multivariate (consider all features simultaneously)

- Consider the vector w for any linear classifier.

- Classification of a point x is given by wᵀx + w0.

- Small entries of w will have little effect on the dot product and therefore those features are less relevant.

- For example if w = (10, 0.1, -9) then features 0 and 2 are contributing more to the dot product than feature 1.

- A ranking of features given by this w is 0,2,1.

- The w can be obtained from any linear classifier.

- A variant of this approach is called __recursive feature elimination__.

Step 1 : Compute w on all features

Step 2 : Remove the feature with the smallest |wi|

Step 3 : Recompute w on the reduced data

Step 4 : If the stopping criterion is not met, go to step 2
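The weight-based ranking and the elimination loop can be sketched as below. A plain perceptron is used here only as a stand-in for "any linear classifier", and stopping when `n_keep` features remain is an assumed stopping criterion. The function names and the toy data are hypothetical.

```python
def train_perceptron(rows, labels, features, epochs=20):
    """Train a plain perceptron on the given feature indices.

    Labels are +1 / -1. Returns (w, w0), where w maps feature index -> weight.
    """
    w = {j: 0.0 for j in features}
    w0 = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            score = sum(w[j] * x[j] for j in features) + w0
            if y * score <= 0:          # misclassified: update the weights
                for j in features:
                    w[j] += y * x[j]
                w0 += y
    return w, w0


def recursive_feature_elimination(rows, labels, n_keep):
    """Drop the feature with the smallest |w_i| until n_keep features remain."""
    features = list(range(len(rows[0])))
    eliminated = []
    while len(features) > n_keep:       # assumed stopping criterion
        w, _ = train_perceptron(rows, labels, features)   # (re)compute w
        weakest = min(features, key=lambda j: abs(w[j]))  # smallest |w_i|
        features.remove(weakest)
        eliminated.append(weakest)
    return features, eliminated


# Ranking from the example in the text: w = (10, 0.1, -9).
w_example = (10, 0.1, -9)
example_ranking = sorted(range(len(w_example)), key=lambda j: -abs(w_example[j]))

# Toy data: feature 0 separates the classes, feature 1 is close to zero.
rows = [
    [2.0, 0.1, 1.0],
    [-2.0, 0.1, 1.0],
    [3.0, -0.1, -1.0],
    [-3.0, -0.1, -1.0],
]
labels = [1, -1, 1, -1]
kept, eliminated = recursive_feature_elimination(rows, labels, n_keep=1)
print(example_ranking, kept, eliminated)
```

Comparing weights by magnitude only makes sense when the features are on comparable scales, so in practice the inputs would usually be standardized first.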
