Monday, March 19, 2012

Assistance on implementing Data Mining

Hi,

I'm new to SQL Server and data mining, so please forgive my ignorance...

I'm working on a project which requires me to use the datamining provided by SQL Server 2005. I've a table for which i want to predict the values in a table (Encyclopedia)

The table contains the following fields:

Component

Major Attribute

Minor Attributes(which is basically a list of CSV for attributes in no particular order)

I want to predict the component if i enter the attributes ..... my questions:

1. Should i change the table structure in any way to assist in data mining?

2. What model would be preferrable?

3. If i'm using the model will it extend to the data added to the table automatically or do i have to update it regularily.

I need to submit the project by 20th... and i'm not even started. I tried a lot on my own..... but couldn't get anywhere without definitive assistance from anyone.

Please help

Thanks and Regards,

Sundeep Singh

You should probably use a nested table which contains multiple rows for each of the Minor Attributes associated with a Component - so your mining model might look something like this:

CREATE MINING MODEL CompPredict(
CaseID LONG KEY,
Component TEXT DISCRETE PREDICT,
MajorAttribute TEXT DISCRETE,
MinorAttributes TABLE(
MinorAttribute TEXT KEY
)
)
USING Microsoft_Decision_Trees

If you add data to the source database that you process your model from, you will need to reprocess the model either manually or using scheduled job or Integration Service package.

|||

Hi Raman

Thankx for ur answer..................

I converted the tables as you said

TableComponent(ComponentID, BodyPart,MajorAttribute,ComponentName)

TableMinor(ComponentID,MinorAttribute)

Used TableComponent as Case(ComponentID as Key), and TableMinor as Nested(MinorAttribute as key)

I tried predicting Component Name using BodyPart, MajorAttribute and Minor Attribute as Input.....

I tried it using Microsoft Decision trees and it generated only single node for it.........

What do you think can be the problem....................

Moreover i need to generate the association rules from the data but again it is generating no rules from the given data...........

Thanks

Sundeep Singh

|||How big is your data set? Have you tried tweaking the algorithm parameters (lower COMPLEXITY_PENALTY, MINIMUM_LEAF_CASES, MINIMUM_SUPPORT)?|||

Hi,

I tried to reduce the parameters and got some rules... but there is a problem. the association rules that are being generated use a single minorAtt at a time.

I mean that in the entire collection there is not one that uses two MinorAttributes to Predict the column.

And can you please help me with the query... assuming that i have a major symptom and some minor symptoms and i want to predict what Components they can be for along with the probability of correctness..

Thanks

Sundeep Singh

|||

Hi,

played with the algo params and it worked.. i don't know how as yet.. but i got some rules with more than one minor symptom.

But the problem with the query remains... somebody please help!!!!!!!! only four days left for submission....

Thanks

Sundeep Singh

|||

You can do a SELECT COUNT(*) FROM TableMinor and SELECT COUNT(DISTINCT ComponentID) FROM TableMinor. If these numbers are the same, than you have one minor/component, otherwise you do have multiple minors/component and everything's OK (except maybe your data).

If you don't have multiple minors/component, you don't need a nested table. From the initial description it seemed like there were.

|||

The number of minor are more than one, but the query that i asked for was to predict the component when i enter the minor and major attributes along with the probability of match...

Supposing that i have named my model as Encyclopedia.

Thanks

|||Try this to get the top 3 predictions for component:

SELECT FLATTENED
TopCount(PredictHistogram(Component), $AdjustedProbability, 3)
FROM [ComponentPredictModel]
NATURAL PREDICTION JOIN
(SELECT 'xxx' as MajorAttribute,
(SELECT ( SELECT 'x' AS Minor UNION
SELECT 'y' AS Minor )
AS [MinorAttributes])
) AS t

No comments:

Post a Comment