Tuesday, March 20, 2012

Association algorithm - Importance of a rule

Can anyone tell me how Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the results in SQL Server are completely different.

Thanks!

For rules, the importance is calculated using the following formula:

Importance (A=>B) = log ( p(a|b) / p(a|not b) )

An importance of 0 means there is no association between A and B. A positive importance score means that the probability of B goes up when A is true. A negative importance score means that the probability of B goes down when A is true.

Below is an example of the correlation counts of donut and muffin derived from a purchase database. Each cell value represents the number of transactions. For example, 15 out of 100 transactions include a customer purchasing both donuts and muffins.

             Donut   Not Donut   Total
Muffin          15           5      20
Not Muffin      75           5      80
Total           90          10     100

The support, probability, and importance of the related itemsets and rules for donut and muffin are:

Support({Donut}) = 90

Support({Muffin}) = 20

Support ({Donut, Muffin}) = 15

Probability({Donut}) = 90/100 = 0.9

Probability({Muffin}) = 20/100 = 0.2

Probability({Donut, Muffin}) = 15/100 = 0.15

Probability(Donut|Muffin) = 15/20 = 0.75

Probability(Muffin|Donut) = 15/90 = 0.167

Importance({Donut, Muffin}) = 0.15/(0.2*0.9) = 0.833

Importance(Donut=>Muffin) = ln(Probability(Donut|Muffin) / Probability(Donut|Not Muffin)) = ln(0.8) = -0.223

Importance(Muffin=>Donut) = ln(Probability(Muffin|Donut) / Probability(Muffin|Not Donut)) = ln(0.333) = -1.099

From the importance of the itemset {Donut, Muffin} (0.833, which is less than 1), we can see that Donut and Muffin are negatively correlated: someone who buys a Muffin is less likely than the average customer to also buy a Donut (0.75 versus 0.9).

The Importance score is also known as Weight of Evidence (WOE).
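The arithmetic in this answer can be checked with a short script. This is only a sketch that recomputes the numbers from the table using the formulas exactly as written in the answer (natural log, same argument order); the variable names are made up for illustration.

```python
import math

# Transaction counts from the donut/muffin table.
both = 15          # donut and muffin
muffin_only = 5    # muffin, no donut
donut_only = 75    # donut, no muffin
neither = 5
total = both + muffin_only + donut_only + neither   # 100

support_donut = both + donut_only     # 90
support_muffin = both + muffin_only   # 20

p_donut = support_donut / total       # 0.9
p_muffin = support_muffin / total     # 0.2
p_both = both / total                 # 0.15

# Itemset importance: joint probability over the product of the marginals.
itemset_importance = p_both / (p_donut * p_muffin)

# Conditional probabilities used by the two rule formulas.
p_donut_given_muffin = both / support_muffin                      # 15/20
p_donut_given_not_muffin = donut_only / (total - support_muffin)  # 75/80
p_muffin_given_donut = both / support_donut                       # 15/90
p_muffin_given_not_donut = muffin_only / (total - support_donut)  # 5/10

# Natural-log ratios, labelled as in the answer.
ln_ratio_donut = math.log(p_donut_given_muffin / p_donut_given_not_muffin)
ln_ratio_muffin = math.log(p_muffin_given_donut / p_muffin_given_not_donut)

print(round(itemset_importance, 3))  # 0.833
print(round(ln_ratio_donut, 3))      # -0.223
print(round(ln_ratio_muffin, 3))     # -1.099
```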

|||

Hi, thanks a lot for your answer!

I recalculated the importance with your formulas and compared it with the results of the Microsoft Association algorithm.

Your formula for the importance is almost right, but it calculates the importance for Muffin => Donut and not Donut => Muffin, and it must be "log" and not "ln"!

So at the end, this must be the right formula:

Importance(Muffin=>Donut) = log(Probability(Donut|Muffin) / Probability(Donut|Not Muffin))

and for

Importance(Donut=>Muffin) = log(Probability(Muffin|Donut) / Probability(Muffin|Not Donut))

UllaH
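For comparison, here is a minimal sketch of the corrected formulas applied to the donut/muffin counts. It assumes that "log" means base 10, which the post implies but does not state; `rule_importance` is a hypothetical helper name, not part of any Microsoft API.

```python
import math

def rule_importance(p_b_given_a, p_b_given_not_a):
    """Importance(A=>B) = log10(P(B|A) / P(B|not A)), as in the corrected formula."""
    return math.log10(p_b_given_a / p_b_given_not_a)

# Conditional probabilities taken from the donut/muffin table.
p_donut_given_muffin = 15 / 20
p_donut_given_not_muffin = 75 / 80
p_muffin_given_donut = 15 / 90
p_muffin_given_not_donut = 5 / 10

muffin_implies_donut = rule_importance(p_donut_given_muffin, p_donut_given_not_muffin)
donut_implies_muffin = rule_importance(p_muffin_given_donut, p_muffin_given_not_donut)

print(round(muffin_implies_donut, 4))  # log10(0.8)  -> -0.0969
print(round(donut_implies_muffin, 4))  # log10(1/3)  -> -0.4771
```

Only the base of the logarithm and the rule labels change relative to the ln version; the sign of each score, and therefore its interpretation, stays the same.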

|||

Actually, the correct formula was already given at the beginning of Jamie's answer:

Importance (A=>B) = log ( p(a|b) / p(a|not b) )

Regards,

|||

Importance (A=>B) = log ( p(a|b) / p(a|not b) )

It makes more sense to me if a and b are switched in the log function.

With all due respect to everyone, can someone point me to a Microsoft Research paper, not just discussion opinions, that covers the theoretical background for calculating rule importance?

Musa

|||

Dear all,

I tried to run the "donuts and muffins" example using SQL 2005 BI, but I did not get the results predicted by the formula you gave (Importance (A=>B) = log ( p(a|b) / p(a|not b) )). Please explain in more detail.

Probability   Importance     Rule
0.938          0.105302438   F3 = NotMuffin -> F2 = Donut
0.833          0.218055761   F2 = Donut -> F3 = NotMuffin
0.75          -0.105302438   F3 = Muffin -> F2 = Donut
0.5           -0.218055761   F2 = NotDonut -> F3 = NotMuffin
0.5            0.458637849   F2 = NotDonut -> F3 = Muffin

Thank you very much.
Yours truly,
sql
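The five importance values reported above can be reproduced numerically under two assumptions, neither of which is confirmed anywhere in this thread. The first is that the rule direction and base are Importance(A=>B) = log10(P(B|A) / P(B|not A)), as in UllaH's reply. The second is that each conditional probability is estimated with add-one smoothing, (count + 1) / (total + 2). The sketch below is only an observation about these particular numbers, not documented behaviour of the product, and `smoothed_importance` is a hypothetical name.

```python
import math

def smoothed_importance(n_a_and_b, n_a, n_nota_and_b, n_nota):
    """log10 of P(B|A) / P(B|not A), with each probability estimated as
    (count + 1) / (total + 2). The smoothing is an assumption, see above."""
    p_b_given_a = (n_a_and_b + 1) / (n_a + 2)
    p_b_given_nota = (n_nota_and_b + 1) / (n_nota + 2)
    return math.log10(p_b_given_a / p_b_given_nota)

# Counts from the donut/muffin table for each rule A -> B:
# (A and B, A, not-A and B, not-A)
rules = {
    "NotMuffin -> Donut":    (75, 80, 15, 20),
    "Donut -> NotMuffin":    (75, 90, 5, 10),
    "Muffin -> Donut":       (15, 20, 75, 80),
    "NotDonut -> NotMuffin": (5, 10, 75, 90),
    "NotDonut -> Muffin":    (5, 10, 15, 90),
}

results = {rule: smoothed_importance(*counts) for rule, counts in rules.items()}
for rule, value in results.items():
    print(f"{rule:24s} {value: .6f}")
```

Without the smoothing term the same formula gives log10(1.25) = 0.0969 for NotMuffin -> Donut instead of the reported 0.1053, which may explain why a hand calculation does not match the tool's output.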
