Can anyone tell me how Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the results in SQL Server are completely different.
Thanks!
For rules, the importance is calculated using the following formula:
Importance (A=>B) = log ( p(a|b) / p(a|not b) )
An importance of 0 means there is no association between A and B. A positive
importance score means that the probability of B goes up when A is true. A
negative importance score means that the probability of B goes down when A
is true.
Below is an example of the correlation counts of donut and muffin derived
from a purchase database. Each cell value represents the number of
transactions. For example, 15 out of 100 transactions include a customer
purchasing both donuts and muffins.
             Donut   Not donut   Total
Muffin          15           5      20
Not muffin      75           5      80
Total           90          10     100
The support, probability, and importance of related itemsets and rules for
donut and muffin:
Support({Donut}) = 90
Support({Muffin}) = 20
Support ({Donut, Muffin}) = 15
Probability({Donut}) = 90/100 = 0.9
Probability({Muffin}) = 20/100 = 0.2
Probability({Donut, Muffin}) = 15/100 = 0.15
Probability(Donut|Muffin) = 15/20 = 0.75
Probability(Muffin|Donut) = 15/90 = 0.167
Importance({Donut, Muffin}) = 0.15/(0.2*0.9) = 0.833
Importance(Donut=>Muffin) = ln(Probability(Donut|Muffin)
/Probability(Donut|Not Muffin)) = ln(0.75/0.9375) = ln(0.8) = -0.223
Importance(Muffin=>Donut) = ln(Probability(Muffin|Donut)
/Probability(Muffin|Not Donut)) = ln(0.167/0.5) = ln(0.333) = -1.099
From the importance of the itemset {Donut, Muffin}, which is below 1, we can
see that Donut and Muffin are negatively correlated: someone who buys a Muffin
is less likely than the average customer to also buy a Donut (0.75 versus 0.9).
The Importance score is also known as Weight of Evidence (WOE).
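The arithmetic in the example can be checked with a few lines of Python. This is only a sketch that reproduces the numbers above from the four cell counts of the table, using the natural log as the example does. The variable names are made up for illustration, and nothing here claims to be what SQL Server runs internally.

```python
import math

# Cell counts from the donut/muffin table (number of transactions).
n_donut_muffin = 15        # bought both
n_muffin_only = 5          # muffin, no donut
n_donut_only = 75          # donut, no muffin
n_neither = 5              # bought neither
n_total = n_donut_muffin + n_muffin_only + n_donut_only + n_neither  # 100

# Supports are plain counts.
support_donut = n_donut_muffin + n_donut_only      # 90
support_muffin = n_donut_muffin + n_muffin_only    # 20
support_both = n_donut_muffin                      # 15

# Marginal and joint probabilities.
p_donut = support_donut / n_total
p_muffin = support_muffin / n_total
p_both = support_both / n_total

# Conditional probabilities used by the rule importance.
p_donut_given_muffin = n_donut_muffin / support_muffin
p_donut_given_not_muffin = n_donut_only / (n_total - support_muffin)
p_muffin_given_donut = n_donut_muffin / support_donut
p_muffin_given_not_donut = n_muffin_only / (n_total - support_donut)

# Itemset importance as in the example: joint probability over the
# product of the marginals.
itemset_importance = p_both / (p_donut * p_muffin)

# Rule importances exactly as written in the example (natural log).
imp_donut_muffin = math.log(p_donut_given_muffin / p_donut_given_not_muffin)
imp_muffin_donut = math.log(p_muffin_given_donut / p_muffin_given_not_donut)

print(round(itemset_importance, 3))
print(round(imp_donut_muffin, 3))
print(round(imp_muffin_donut, 3))
```

Both rule scores come out negative and the itemset score is below 1, which matches the conclusion that the two items are negatively correlated.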
|||Hi, thanks a lot for your answer!
I recalculated the importance with your formulas and compared it with the results of the Microsoft Association algorithm.
Your formula for the importance is almost right, but it calculates the importance for
Muffin =>Donut and not Donut => Muffin
and it must be "log" and not "ln" !!
So at the end, this must be the right formula:
Importance(Muffin =>Donut) = log(Probability(Donut|Muffin) / Probability(Donut|Not Muffin) )
and for
Importance(Donut=> Muffin) = log(Probability(Muffin|Donut) / Probability(Muffin|Not Donut) )
UllaH
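For anyone who wants to compare against their own model, here is a small Python sketch of the two formulas in this reply, assuming "log" means base 10 and reading a rule A=>B as log(P(B|A) / P(B|not A)). The function name `rule_importance` and its parameters are hypothetical, and the sketch works on raw counts only, so it may still differ from what the engine reports.

```python
import math


def rule_importance(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """Importance of the rule A=>B as log10(P(B|A) / P(B|not A)).

    n_ab    -- transactions containing both A and B
    n_a     -- transactions containing A
    n_b     -- transactions containing B
    n_total -- all transactions
    Assumes every count needed is non-zero; no smoothing is applied.
    """
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n_total - n_a)
    return math.log10(p_b_given_a / p_b_given_not_a)


# Counts from the donut/muffin table: 15 both, 20 muffin, 90 donut, 100 total.
muffin_to_donut = rule_importance(n_ab=15, n_a=20, n_b=90, n_total=100)
donut_to_muffin = rule_importance(n_ab=15, n_a=90, n_b=20, n_total=100)

print(round(muffin_to_donut, 4))   # log10(0.75 / 0.9375) = log10(0.8)
print(round(donut_to_muffin, 4))   # log10(0.1667 / 0.5)  = log10(1/3)
```

Switching between natural log and base 10 only rescales the score by a constant factor, so the sign and the ranking of rules stay the same either way.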
|||Actually, the formula is already given correctly at the beginning of Jamie's answer:
Importance (A=>B) = log ( p(a|b) / p(a|not b) )
Regards,
|||Importance (A=>B) = log ( p(a|b) / p(a|not b) )
It makes more sense to me if a and b are switched in the log function.
Can someone point me to a Microsoft Research paper (with all due respect to all, not just discussion opinions) that discusses the theoretical background for calculating rule importance?
Musa
|||Dear all, I tried to run the "donuts and muffins" example using SQL 2005 BI, but I did not get the results that the formula you gave (Importance (A=>B) = log ( p(a|b) / p(a|not b) ) ) predicts. Please explain in more detail.
Thank you very much.
Yours truly,