**Feature Reduction :-**

The information about the target class is inherent in the variables.

Naive view:

More features

⇒ More information

⇒ Better discrimination power

In practice:

- there are many reasons why this is not the case!

**Curse of Dimensionality**

When the number of training examples is fixed:

- the classifier's performance will usually degrade for a large number of features!
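One well-known symptom of the curse can be shown in a few lines of Python: with a fixed number of examples, distances between random points "concentrate" as features are added, so the nearest and farthest neighbours become almost equally far away. This is a minimal sketch under assumed conditions (uniform random data, Euclidean distance). It illustrates distance concentration rather than measuring any particular classifier's accuracy, and `distance_contrast` is a hypothetical helper name.

```python
import math
import random


def distance_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Relative contrast (d_max - d_min) / d_min between a random query point
    and n_points points drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Fixed number of examples (100), growing number of features
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(100, dim), 3))
```

As `dim` grows, the printed contrast shrinks toward 0. Distance-based classifiers such as k-nearest neighbours typically suffer from this.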

**Feature Selection :-**

Given a set of features $F = \{x_1, \ldots, x_n\}$, the feature selection problem is to find a subset $F' \subseteq F$ that maximizes the learner's ability to classify patterns.

Formally, $F'$ should maximize some scoring function $J$:

$$F' = \arg\max_{G \subseteq F} J(G)$$

$$[x_1, x_2, \ldots, x_n]^T \;\rightarrow\; [x_{i_1}, x_{i_2}, \ldots, x_{i_m}]^T, \qquad m \le n$$

**Feature Selection Steps**

Feature selection is an optimization problem:

Step 1: Search the space of possible feature subsets.

Step 2: Pick the subset that is optimal or near-optimal with respect to some objective function.
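The following Python sketch shows the two steps using the *optimum* (exhaustive) search strategy. It enumerates every non-empty subset and keeps the best one under a scoring function. The `toy_score` objective and its `usefulness` table are hypothetical stand-ins for a real evaluation such as validation accuracy.

```python
from itertools import combinations


def exhaustive_search(features, score):
    """Step 1: enumerate every non-empty subset; Step 2: keep the best-scoring one."""
    best_subset, best_score = frozenset(), float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(frozenset(subset))
            if s > best_score:
                best_subset, best_score = frozenset(subset), s
    return best_subset, best_score


# Hypothetical objective: each feature has a fixed "usefulness" and every
# selected feature costs 0.5 (stand-in for a real validation score).
usefulness = {"x1": 2.0, "x2": 0.1, "x3": 1.5, "x4": 0.3}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.5 * len(subset)


best, value = exhaustive_search(list(usefulness), toy_score)
print(sorted(best), value)  # → ['x1', 'x3'] 2.5
```

There are $2^n - 1$ non-empty subsets, so exhaustive search is only practical for small $n$. This cost is why heuristic and randomized strategies are used.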

Search strategies

- Optimum

- Heuristic

- Randomized

Evaluation strategies

- Filter methods

- Wrapper methods

**Evaluating feature subsets**

**Supervised (Wrapper method)**

- Train the learner using the selected subset

- Estimate the error on a validation dataset
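Here is a minimal Python sketch of a wrapper evaluation. It trains on the selected columns only and returns the validation error. The learner is a nearest-centroid classifier, an assumption made purely to keep the example dependency-free; any classifier could be plugged in. The toy data is made up so that column 0 separates the classes and column 1 is noise.

```python
from statistics import mean


def wrapper_error(X_train, y_train, X_val, y_val, subset):
    """Error rate of a nearest-centroid classifier trained on the given columns."""
    cols = list(subset)
    # Train using the selected subset: one centroid per class
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def predict(x):
        point = [x[c] for c in cols]
        return min(centroids, key=lambda lbl: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[lbl])))

    # Estimate the error on the validation dataset
    wrong = sum(1 for x, t in zip(X_val, y_val) if predict(x) != t)
    return wrong / len(y_val)


# Toy data: column 0 is informative, column 1 is noise
X_train = [[0.0, 5.0], [0.2, -3.0], [1.0, 4.0], [1.2, -4.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, -4.0], [1.1, 5.0]]
y_val = [0, 1]

for subset in ([0], [1], [0, 1]):
    print(subset, wrapper_error(X_train, y_train, X_val, y_val, subset))
# → [0] 0.0
# → [1] 1.0
# → [0, 1] 1.0
```

On this toy data, adding the noisy column to the informative one makes the validation error worse. That is exactly what a wrapper evaluation is meant to detect.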

**Unsupervised (Filter method)**

- Look at the input only

- Select the subset that has the most information
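As one crude, label-free proxy for "most information", a filter can keep the columns with the largest variance. Variance is a swapped-in example here, not the only (or best) unsupervised criterion, and `variance_filter` is a hypothetical helper. The sketch looks at the inputs `X` only and never at the labels.

```python
from statistics import pvariance


def variance_filter(X, k):
    """Return the indices of the k highest-variance columns (inputs only)."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


X = [[1.0, 10.0, 0.5],
     [1.0, 20.0, 0.7],
     [1.0, 30.0, 0.6]]
print(variance_filter(X, 2))  # → [1, 2]
```

Column 0 is constant and carries no information, so it is dropped. Raw variance depends on each feature's units, so features are usually put on comparable scales first.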

**Forward Selection**

- Start with empty feature set

- Try each remaining feature

- Estimate the classification/regression error for adding each feature

- Select the feature that gives the maximum improvement

- Stop when there is no significant improvement

**Backward Search**

- Start with full feature set

- Try removing each remaining feature

- Drop the feature with the smallest impact on error

**Univariate feature selection**

Univariate methods look at each feature independently of the others. Common scores include:

- Pearson correlation coefficient

- F-score

- Chi-square

- Signal to noise ratio

- Mutual information

- Etc.

Features are ranked by their score (importance), and the ranking cut-off is determined by the user.
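The ranking-plus-cut-off procedure fits in a few lines of Python. Each column is scored on its own, the columns are sorted by absolute score, and the user's top `k` are kept. The scoring function used here (difference of class means) is a hypothetical stand-in; any of the univariate scores listed above could be passed in instead.

```python
from statistics import mean


def rank_and_cut(X, y, score, k):
    """Score each column independently, rank by |score|, keep the top k."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], y) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: abs(scores[j]), reverse=True)
    return ranking, ranking[:k]


def mean_difference(feature, labels):
    """Stand-in univariate score: mean in class 1 minus mean in class 0."""
    pos = [v for v, t in zip(feature, labels) if t == 1]
    neg = [v for v, t in zip(feature, labels) if t == 0]
    return mean(pos) - mean(neg)


X = [[0.1, 5.0, 3.0],
     [0.2, 1.0, 3.5],
     [0.9, 4.0, 1.0],
     [1.0, 2.0, 0.5]]
y = [0, 0, 1, 1]
ranking, selected = rank_and_cut(X, y, mean_difference, k=2)
print(ranking, selected)  # → [2, 0, 1] [2, 0]
```

Because each column is scored in isolation, this approach cannot detect features that are only useful in combination.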

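**Pearson correlation coefficient**

- Measures the linear correlation between two variables, e.g. a feature $x$ and the target $y$

- Formula for the Pearson correlation over $N$ samples:

  $$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

- The correlation r lies between -1 and +1.

- +1 means perfect positive correlation
- -1 means perfect negative correlation (the same relationship in the other direction)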
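The formula translates directly into plain Python. The sketch below computes $r$ term by term; `pearson_r` is a hypothetical helper name. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Undefined (ZeroDivisionError) if either input is constant
    return cov / (norm_x * norm_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # close to +1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1.0
```

For feature ranking, $|r|$ between each feature and the target is the usual score, so strong negative correlations rank as highly as strong positive ones.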

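**Signal to noise ratio**

- Difference in the class means divided by the sum of the class standard deviations, computed per feature for two classes x and y

  $$S2N = \frac{\mu_x - \mu_y}{\sigma_x + \sigma_y}$$

- Large absolute values indicate a strong correlation between the feature and the class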

**Multivariate feature selection**

- Multivariate (considers all features simultaneously)

- Consider the weight vector w of any linear classifier.

- A point x is classified according to $w^T x + w_0$.

- Small entries of w have little effect on the dot product, so (assuming the features are on comparable scales) those features are less relevant.

- For example, if w = (10, 0.1, -9), then features 0 and 2 contribute more to the dot product than feature 1.

- Ranking the features by $|w_i|$ gives the order 0, 2, 1.

- w can be obtained from any linear classifier.

- A variant of this approach is called __recursive feature elimination__:

- Step 1: Compute w on all features

- Step 2: Remove the feature with the smallest $|w_i|$

- Step 3: Recompute w on the reduced data

- Step 4: If the stopping criterion is not met, go to Step 2
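Below are two Python sketches for this section. The first ranks features by $|w_i|$ and reproduces the w = (10, 0.1, -9) example. The second runs recursive feature elimination on top of a linear model. Several parts are assumptions: the linear classifier is an ordinary least-squares fit on ±1 labels, chosen because it fits in plain Python (RFE is more commonly paired with a linear SVM); the stopping criterion is "stop when `n_keep` features remain"; and the toy data is made up.

```python
def rank_by_weight(w):
    """Rank feature indices by |w_i|, most relevant first."""
    return sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)


def least_squares_weights(X, y, cols):
    """Fit y ~ w^T x + w0 by ordinary least squares on the selected columns
    (normal equations + Gauss-Jordan elimination); return w without w0."""
    A = [[1.0] + [row[c] for c in cols] for row in X]  # bias column first
    m = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(m)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    w = [M[i][m] / M[i][i] for i in range(m)]
    return w[1:]


def rfe(X, y, n_keep, fit_weights=least_squares_weights):
    """Recursive feature elimination down to n_keep features (assumed criterion)."""
    cols = list(range(len(X[0])))          # start from all features
    while len(cols) > n_keep:              # Step 4: stopping criterion
        w = fit_weights(X, y, cols)        # Steps 1/3: compute w on current features
        weakest = min(range(len(cols)), key=lambda i: abs(w[i]))
        del cols[weakest]                  # Step 2: drop the smallest |w_i|
    return cols


print(rank_by_weight([10, 0.1, -9]))  # → [0, 2, 1]

# Toy data: feature 0 equals the label, features 1 and 2 are noise
X = [[1.0, 0.3, 2.0], [1.0, -0.2, 1.0], [-1.0, 0.5, 1.5],
     [-1.0, 0.1, 3.0], [1.0, 0.0, 2.5], [-1.0, -0.4, 0.5]]
y = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
print(rfe(X, y, n_keep=1))  # → [0]
```

Refitting after every removal is what makes RFE multivariate: each remaining weight is re-estimated in the presence of the other remaining features.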
