Showing posts with label market-basket. Show all posts
Showing posts with label market-basket. Show all posts

Tuesday, March 20, 2012

Association Questions

I'm wondering if anyone can give me some help with an association model I'd like to setup. It's a typical market-basket analysis, but rather than grouping by individual customers, I'd like to group by customer grouping. (In our database, customers are grouped into categories like: large, small, medium) If this is possible, I'd like to generate the most popular items (so just querying the most probable itemsets), for each customer grouping (I'll refer to this as 'segments' from here on out), and then create a listing of customers in each segment which do not have the most popular items for their segment. I know for this last part I can use reporting services to tackle that problem, however, I'm not really sure how I can really do the rest of this with an association model in SSAS.

Our table structure looks like this:

Code Snippet

CustomerTable PurchasesTable

- --

CustomerName(key) CustomerName

CustomerGroup PurchasedProduct

And the data is arranged in this fashion:

Code Snippet

Customer Table:

CustomerName CustomerGroup

- -

A large

B large

C small

Purchases Table:

CustomerName PurchasedProduct

-

A ProductA

A ProductB

B ProductA

C ProductC

C ProductD

I know this is a lot of information but any help you guys may be able to offer would be great! Thanks!

Just a suggestion which might not do exactly what you requested, but might help:

You could create one association model looking like below:

(

Customer TEXT KEY,

CustomerGroup TEXT DISCRETE, // i.e. used as input, on the left hand side of the rules

Products TABLE PREDICT

(

Product TEXT KEY

)

)

Then, for each customer, get the recommended association predictions based on:

- the products currently owned by the customer

- the group that the customer belongs to

This will create rules that might include the customer group on the left hand side (if that it found to be interesting). It may also generate rules which ignore the group, containing items that sell together across groups of customers.

In the prediction query, if you ask solely for predictions that are NOT based on rules (but on popular itemsets), you will get a list of popular itemsets that should match each customer according to previous purchases and possibly group. Of course, excluding predictions that are based on rules might actually reduce the accuracy of the recommendations. You might want to sort the prediction results by support and use some sort of threshold

If you must really separate the groups (i.e. require to take the customer group into account for each individual customer), you could also build 3 models, one for each group. Then, you can use the same kind of predictions, but a different model for each customer (based on the customer group). This way, the recommendations will be based solely on popular items within the current customer group.

Hope this helps

|||

I was thinking that might be the way to go about doing that, but just have a couple quick questions to clarify.

If I do it the way you suggested, you said that I might get rules that ignore the group/contain the group. How may I filter(?) the results so I only obtain those with with the groups, as I'm not as interested in what individual customers purchased, so much as just purchases made within the groups?

If I built three models, how would I tell the model to only be concerned with the 'small' group when all the groups belong in the same database? Is there somewhere I can specify a filter on the mining model to only focus on certain fields...or something?

Thanks for your help so far!

|||

In SQL Server 2005 you would need to build three separate views (or named queries in the DSV) each showing only a single group of customers. Then for each group you build a separate mining structure and a separate model.

We are working on some advances that may make this easier in future versions, but for now, you should find that this approach works well.

Association Mining Model Predictions

Hi,

I've been playing around with the association mining model in SQL server 2005 and built a market-basket analysis of my data that I'm pretty happy with. The next task for me is figuring out how to run DMX queries against the data that I've just mined, so we may possibly use it in a web based application. This wouldn't necessarily be a difficult problem (and still may not be), but every example I've seen for the Mining Model Prediction Designer uses relational databases and I built my mining model off OLAP. Therefore, my predictable attribute is nested and when relating the mining model structure to the relational database that the cube was built off always gives me an error:

"Errors in the high-level relational engine. The 'CompanyName' column could not be found in the top-level clause of the SHAPE statement."

What I would like to do, and I'm not really even sure how I should structure any of my queries, is feed the model a product and have it return a listing of all the products it predicts. Currently, I've only been able to get the designer mode to process a singleton query, and even that didn't return any useful data. I know that this probably can be done pretty easily so any advice you may be able to offer would be greatly appreciated!!

So you may better understand my question, my association mining structure hierarchy looks as this..

[Model] ProductRecommend

[Case][Key]CustomerList

[Case][Attribute]CompanyName

[NestedTable]Product

[Nested][Key]PRODUCTCLASSID

[Nested][Attribute]PRODUCT

With that in mind, I'm trying to perform a query simliar to this:

SELECT

PredictProbability([ProductRecommend].[Product].[PRODUCTCLASSID]), <- Throws Error for PredictProbability syntax no matter what I try to get to [PRODUCTCLASSID]

(SELECT [PRODUCT] FROM [ProductRecommend].[Product])

From

[ProductRecommend]

NATURAL PREDICTION JOIN

(SELECT 'test' AS [COMPANYNAME],

(SELECT '1234' AS [PRODUCTCLASSID],

'ProductA' AS [PRODUCT]) AS [Product]) AS t

Thanks again for any help!

Just incase someone out there runs into the same trouble I had, I think I've made some progress. My mistake was that I was using the PredictProbability function when I should have been using the PredictAssociation function. PredictAssociation does exactly what I wanted to do with the query, now I just need to figure out how to structure the query so it's simply based off an input product.|||What do you mean "simply based off an input product"?|||

Well it looks like 'simply based off an input product' isn't so simple... Since I've finally gotten queries to run, they've all pretty much returned the same associations. I feel like I've been everywhere on the internet looking for how to do what I described above, but just can't find enough information to solve my problem. My overall goal is to create a recommendation system similiar to Amazon.com. I'm pretty sure my mining model achieves this goal, but writing the DMX do so is a little bit difficult based on the structure of our database, and the fact that I modeled the Mining Structure off our OLAP cube.

Basically I just want to say:

SELECT

Predict Association([Product],10)

FROM

[ProductRecommendation]

NATURAL PREDICTION JOIN

(SELECT (SELECT 'ProductCustomerHasSelected' AS [ProductName], 'ProductKey' AS [PRODUCTCLASSID]) AS [Product])

And have it return:

ProductCustomerHasSelected -- Predicts Product A, B, D, and E

Is this the wrong way to go about doing that?

|||

BTW, have you looked at Raman Iyer and Jesper Lind's article? Create a Web Cross-sell Application

It has code examples, including how to construct the queries. You can see the app they build in action on a little webcast I did at.

In this article they use the following query:

SELECT FLATTENED

TopCount(Predict([Customer Movies], INCLUDE_STATISTICS), $AdjustedProbability, 5)

FROM [Movie Recommendations]

NATURAL PREDICTION JOIN

( SELECT ( SELECT 'Star Wars' AS [Movie] UNION SELECT 'The Matrix' AS [Movie] ) AS [Customer Movies] ) AS t

This returns the top 5 movies associated with the input movies: Star Wars and the Matrix in this case.

There is an excellent explanation of the thinking behind this query in the article.

hth