**Mining frequent patterns and rules**

**Association rules: conditional dependencies**

**Two stages**

- Find frequent patterns

- Derive associations (A → B) from frequent patterns

**Find patterns in**

- Sequences (time series data, fault analysis)

- Transactions (market basket data)

- Graphs (social network analysis)

**Mining Transactions**

- Transaction is a collection of items bought together

- Find frequent itemsets
- Itemset A → B, if both A and A ሀ B are frequent itemsets.

**Frequent Pattern Mining**

- Support of rule is the percentage of itemsets containing A ሀ B
- Confidence of a rule is the percentage of itemsets containing A that also contain A ሀ B
- We look for rules with both high support and confidence

**Association Rules**

**Applications**

- Market Basket analysis
- Topic identification

- co-occurrence of words

- Plagiarism Detection !
- Biomarkers

- Genes or proteins vs. disease

- Time series analysis !

- Trigger Events

**Finding Frequent Itemsets**

- Generate candidate frequent itemsets and then prune based on count
- Combinatorial number of candidates !
- Need a clever way of generating fewer candidates.
- Apriori Property !

**Apriori Algorithm**

- Apriori property : All nonempty subsets of a frequent itemset must also be frequent.

- R. Agrawal and R. Srikant Fast algorithms for mining association rules. VLDB 1994. pp. 487-499
- Seminal algorithm

- Set the tone for the field

- Numerous improvements

**Apriori illustrated**

**Caveat**

**High confidence is not always a good idea**

- Buys games => Buys videos confidence 66% support 37%

- But, Buys videos 75% of the transactions !

- Negative correlation

**Lift**

- Ratio of Confidence of rule to that of default rule

- Interest : difference

**Challenges**

**Millions of transaction**

- Billions of potential itemsets

**Non discrete data**

- Time series and images

- Graphs

**Non frequent, but significant**

- A occurs 0.3% of all transactions, but when B occurs it occurs in 1.2% of the transactions

## No comments:

## Post a Comment