Association Rule Mining in Data Analytics

Mining frequent patterns and rules

Association rules: conditional dependencies 

Two stages
  - Find frequent patterns
  - Derive associations (A → B) from frequent patterns

Find patterns in
  - Sequences (time series data, fault analysis)
  - Transactions (market basket data)
  - Graphs (social network analysis)

Mining Transactions
  • A transaction is a collection of items bought together
            - A (sub)set of items is called an itemset
  • Find frequent itemsets
  • A rule A → B can be derived if both A and A ∪ B are frequent itemsets.


Frequent Pattern Mining



  • Support of a rule A → B is the percentage of transactions containing A ∪ B
  • Confidence of a rule A → B is the percentage of transactions containing A that also contain B
  • We look for rules with both high support and high confidence
          - Both can be determined from the frequent itemsets, hence most of the effort is focused on finding those
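The two measures above can be computed directly from a list of transactions. The snippet below is a minimal sketch: the toy baskets and the function names (count, support, confidence) are made up for illustration and do not come from any particular library.

```python
# Toy market-basket data; the items are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing A that also contain B.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

# Rule: bread -> milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)  # prints: 0.6 0.75
```

Here bread and milk appear together in 3 of 5 baskets (support 60%), and 3 of the 4 baskets with bread also contain milk (confidence 75%).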

Association Rules



Applications
  • Market Basket analysis
  • Topic identification
         - co-occurrence of words
  • Plagiarism detection
  • Biomarkers
          - Genes or proteins vs. disease
  • Time series analysis
         - Trigger Events

Finding Frequent Itemsets
  • Generate candidate frequent itemsets and then prune them based on their counts
  • The number of candidates is combinatorial
  • Need a clever way of generating fewer candidates
  • The Apriori property provides one
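To get a feel for the combinatorial blow-up, here is a small sketch that enumerates every possible candidate itemset by brute force. The function name all_candidate_itemsets and the four items are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def all_candidate_itemsets(items):
    """Yield every nonempty subset of `items` (the naive candidate space)."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

# With 4 items there are 2**4 - 1 nonempty subsets.
small = list(all_candidate_itemsets({"bread", "milk", "butter", "eggs"}))
num_small = len(small)

# With 100 distinct items the same formula gives an astronomically large number.
num_for_100_items = 2**100 - 1
print(num_small, num_for_100_items)
```

Counting each of these candidates against millions of transactions is not feasible, which is why the candidate set has to be cut down before counting.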

Apriori Algorithm
  • Apriori property : All nonempty subsets of a frequent itemset must also be frequent.
           - Used to prune search space
  • R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB 1994, pp. 487-499.
  • Seminal algorithm
        - Finding frequent itemsets in a database of transactions
        - Set the tone for the field
        - Numerous improvements
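The level-wise idea behind Apriori can be sketched in a few lines. This is a simplified illustration, not the optimized procedure from the paper: the function name apriori, the toy baskets and the 40% threshold are assumptions, and the join step here simply unions pairs of frequent itemsets instead of using the paper's prefix-based join.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep only those meeting the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {item for t in transactions for item in t}
    current = frequent({frozenset([i]) for i in items})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=0.4)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this toy data the candidate {bread, butter, milk} is never counted: its subset {butter, milk} is not frequent, so the prune step discards it before any pass over the transactions.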

Apriori illustrated


Caveat
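  • High confidence alone does not make a rule interesting
         - Buys games => Buys videos: confidence 66%, support 37%
         - But videos are bought in 75% of all transactions
         - So buying games makes buying videos less likely: a negative correlation
  • Lift
         - Ratio of the rule's confidence to that of the default rule (the overall frequency of the consequent)
         - Interest: the difference between the two instead of the ratio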
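Plugging the games and videos figures into these definitions makes the problem visible. The sketch below assumes that the "default rule" means predicting the consequent with no antecedent at all, so its confidence is just the overall frequency of the consequent; the function names lift and interest are illustrative.

```python
def lift(rule_confidence, consequent_support):
    """Ratio of the rule's confidence to the baseline frequency of the consequent."""
    return rule_confidence / consequent_support

def interest(rule_confidence, consequent_support):
    """Difference between the rule's confidence and the baseline frequency."""
    return rule_confidence - consequent_support

# Buys games => Buys videos: confidence 66%, while videos appear in 75% of transactions.
games_to_videos_lift = lift(0.66, 0.75)
games_to_videos_interest = interest(0.66, 0.75)
print(round(games_to_videos_lift, 2), round(games_to_videos_interest, 2))
```

A lift below 1 (here about 0.88) and a negative interest (about -0.09) both say the same thing: knowing that a customer buys games lowers the chance that they buy videos, despite the seemingly high confidence.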

Challenges
  • Millions of transactions
        - Billions of potential itemsets
  • Non-discrete data
         - Time series and images
         - Graphs
  • Infrequent but significant patterns
          - A occurs in 0.3% of all transactions, but in 1.2% of the transactions where B occurs
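The last example can be made concrete with a little arithmetic. In the sketch below the 1% minimum support threshold is a hypothetical value picked for illustration; the other two numbers are the ones given above.

```python
support_a = 0.003          # A occurs in 0.3% of all transactions
confidence_b_to_a = 0.012  # A occurs in 1.2% of the transactions containing B
min_support = 0.01         # hypothetical threshold, not from the original notes

# Lift of B => A: how much more likely A becomes once B is observed.
lift_b_to_a = confidence_b_to_a / support_a

# A ∪ B cannot be more frequent than A alone, so its support is at most 0.3%.
max_rule_support = support_a
pruned_by_support = max_rule_support < min_support

print(round(lift_b_to_a, 2), pruned_by_support)
```

B makes A roughly four times more likely, yet a support-based search with this threshold would discard the rule before it is ever scored, which is what makes such patterns hard to find.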
