Trending Technology Machine Learning, Artificial Intelligent, Block Chain, IoT, DevOps, Data Science

Recent Post

Search

Friday, 19 October 2018

Association Rule Mining in Data Analytics

Mining frequent patterns and rules

Association rules: conditional dependencies 

Two stages
  - Find frequent patterns
  - Derive associations (A → B) from frequent patterns

Find patterns in
  - Sequences (time series data, fault analysis)
  - Transactions (market basket data)
  - Graphs (social network analysis)

Mining Transactions
  • Transaction is a collection of items bought together 
            - A (sub) of items is called an itemset
  • Find frequent itemsets
  • Itemset A → B, if both  A and A ሀ B are frequent itemsets.


Frequent Pattern Mining



  • Support of rule is the percentage of itemsets containing A ሀ B
  • Confidence of a rule is the percentage of itemsets containing A that also contain A ሀ B
  • We look for rules with both high support and confidence
          - Can be determined from the frequent itemsets; hence more effort focused on that

Association Rules



Applications
  • Market Basket analysis
  • Topic identification
         - co-occurrence of words
  • Plagiarism Detection !
  • Biomarkers
          - Genes or proteins vs. disease
  • Time series analysis !
         - Trigger Events

Finding Frequent Itemsets
  • Generate candidate frequent itemsets and then prune based on count
  • Combinatorial number of candidates !
  • Need a clever way of generating fewer candidates.
  • Apriori Property !

Apriori Algorithm
  • Apriori property : All nonempty subsets of a frequent itemset must also be frequent.
           - Used to prune search space
  • R. Agrawal and R. Srikant Fast algorithms for mining association rules. VLDB 1994. pp.  487-499
  • Seminal algorithm
        - Finding frequent itemsets in a database of transactions
        - Set the tone for the field
        - Numerous improvements

Apriori illustrated


Caveat
  • High confidence is not always a good idea
         - Buys games => Buys videos confidence 66% support 37%
         - But, Buys videos 75% of the transactions !
         - Negative correlation
  • Lift
         - Ratio of Confidence of rule to that of default rule
         - Interest : difference

Challenges
  • Millions of transaction
        - Billions of potential itemsets
  • Non discrete data
         - Time series and images
         - Graphs
  • Non frequent, but significant
          - A occurs 0.3% of all transactions, but when B occurs it occurs in 1.2% of the transactions

No comments:

Post a Comment