Thursday, March 22, 2012

Asynchronous Script Component

Hi--I've done some searching, but I'm not finding exactly what I need. I am using an asynchronous script component as a lookup, since the table I'm looking up against requires an ODBC connection. Here is what my data looks like:

From an Excel connection:

Order Number
123
234
345

The table I want to do a lookup on has multiple rows for each order number, as well as a lot of rows that aren't in my first table:

Order Number  Description
123  Upgrade to System
123  Freight
123  Spare Parts
234  Upgrade to System
234  Freight
234  Spare Parts
778  Another thing
889  Yet more stuff

etc. My desired result is to pull all the items from table two that match on Order Number from table one. What my script actually returns is a single (random) row from table two for each item in table one. So my current results look like:

Order Number  Description
123  Freight
234  Freight
345  Null

And I want:

Order Number  Description
123  Upgrade to System
123  Freight
123  Spare Parts
234  Upgrade to System
234  Freight
234  Spare Parts
345  Null

etc.... Here is my code, courtesy of half a dozen samples found here and elsewhere...

Code Snippet

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Data.Odbc


Public Class ScriptMain
    Inherits UserComponent

    Dim connMgr As IDTSConnectionManager90
    Dim odbcConn As OdbcConnection
    Dim odbcCmd As OdbcCommand
    Dim odbcParam As OdbcParameter

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)

        ' Get the ODBC connection from the package's JDEConnection connection manager.
        connMgr = Me.Connections.JDEConnection
        odbcConn = CType(connMgr.AcquireConnection(Nothing), OdbcConnection)

    End Sub

    Public Overrides Sub PreExecute()

        ' Build the parameterized lookup query once; the single parameter is set for each input row.
        odbcCmd = New OdbcCommand("SELECT F4211.SDDSC1, F4211.SDDOCO FROM DB.F4211 F4211 WHERE F4211.SDDOCO = ?", odbcConn)

        odbcParam = New OdbcParameter("1", OdbcType.Int)
        odbcCmd.Parameters.Add(odbcParam)

    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim reader As Odbc.OdbcDataReader

        ' Run the lookup for this input row's order number.
        odbcCmd.Parameters("1").Value = Row.SO
        reader = odbcCmd.ExecuteReader()

        ' Copy the first matching row (if any) to the asynchronous output.
        If reader.Read() Then

            With Output0Buffer
                .AddRow()
                .SDDSC1 = reader("SDDSC1").ToString
                .SONumb = Row.SO
                .SOJDE = CDec(reader("SDDOCO"))
            End With

        End If

        reader.Close()

    End Sub

    Public Overrides Sub ReleaseConnections()
        connMgr.ReleaseConnection(odbcConn)
    End Sub

End Class

I just don't know what I need to do to get every row from F4211 where SDDOCO matches Row.SO instead of a single row...... Any ideas or help? Oh, the reason I am starting with my Excel connection is that sheet lists the Orders I need detailed data for, and is only a few hundred rows....F4211 is really really really big.

I have also worked out an alternate way to do this using merge join tasks...but then my DataReader source goes off and fetches 300,000 rows from F4211 before arriving at my final result set of about 1,200 rows. That just feels like a bad approach to me...or am I being over-cautious? I'm a newb (if you couldn't already tell)...so guidance is appreciated.

Thank you....

In a first data flow, you could load a staging table in SQL Server with the contents of your ODBC source table. Then, in a second data flow, you can use that staging table as the source for your lookup component. Might be a bit less work for you.

|||So, to make sure I understand--add another data flow, have it write the records from my F4211 table to a SQL table, then, in my original data flow, do a lookup on my newly created table in SQL...then, I suppose, add an Execute SQL task to blow all those records away? And, I suppose, just to be tidy about it, I could add a shrink database task to clean up afterwards?|||

In general, and when possible, it is a good idea to use staging tables to get all the data pieces onto the SQL Server side. Besides simplifying the data flow, it improves performance.

I think you are understanding Phil's suggestion pretty well; but I am not sure I would bother with the shrink database step. If you are going to execute this process on a regular basis, then you would need that space anyway.

|||Yep, you got it. Now have fun!|||

hilaryjade wrote:

I just don't know what I need to do to get every row from F4211 where SDDOCO matches Row.SO instead of a single row...... Any ideas or help? Oh, the reason I am starting with my Excel connection is that sheet lists the Orders I need detailed data for, and is only a few hundred rows....F4211 is really really really big.

So, to go back to the original question.... You want all the rows from the recordset, right? Don't you just need to change that If to a While loop?

Code Snippet


While reader.Read()

    With Output0Buffer
        .AddRow()
        .SDDSC1 = reader("SDDSC1").ToString
        .SONumb = Row.SO
        .SOJDE = CDec(reader("SDDOCO"))
    End With

End While
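
For reference, here is a sketch of how the whole Input0_ProcessInputRow might look with the While loop dropped in. The last block is an assumption on my part: since the desired output above shows order 345 with a Null description, this version also writes a single row with the description left null when the lookup finds nothing. Column and buffer names are taken from the earlier snippets.

Code Snippet

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim reader As Odbc.OdbcDataReader
    Dim matched As Boolean = False

    ' Run the lookup for this input row's order number.
    odbcCmd.Parameters("1").Value = Row.SO
    reader = odbcCmd.ExecuteReader()

    ' Write one output row for every matching F4211 row.
    While reader.Read()
        matched = True
        With Output0Buffer
            .AddRow()
            .SDDSC1 = reader("SDDSC1").ToString
            .SONumb = Row.SO
            .SOJDE = CDec(reader("SDDOCO"))
        End With
    End While

    ' Assumption: when no F4211 rows match, still emit the order number so it
    ' appears in the output; SDDSC1 and SOJDE are left unset, which comes through as null.
    If Not matched Then
        Output0Buffer.AddRow()
        Output0Buffer.SONumb = Row.SO
    End If

    reader.Close()

End Sub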



|||Told ya I was a newb....Thanks so much!!! While a staging table and lookup off it is an interesting idea (and I appreciate it...) the script runs so much faster.|||

hilaryjade wrote:

Told ya I was a newb....Thanks so much!!! While a staging table and lookup off it is an interesting idea (and I appreciate it...) the script runs so much faster.

Did you test it? Can you share the timing results and row counts?

|||I haven't finished adding the full data set I need to pull to my script component yet, but after I do, I can run and time them. I did try adding and using a staging table last night, but ran into a bit of a data type mismatch roadblock on the lookup--I tried a handful of conversions to see if I could get my DT_R8 from Excel to play nicely with my Numeric from the staging table in SQL, but got a bit frustrated and went off to work on my third alternative, using merge joins (which works nicely and runs in 2.2 minutes, but 2 minutes starts feeling a bit long, you know?). At any rate, just creating the staging table took longer than the script takes (but, again, that was without my full data set). After I get the script component complete and pulling all my data, I'll run, time, and post results. Again, thanks for all the help!|||
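
(One side note on the DT_R8 point: in the script-component version the same Excel value feeds the integer ODBC parameter, so an explicit conversion keeps that from leaning on an implicit cast. A minimal sketch, reusing the names from the earlier snippet and assuming the Excel order numbers are whole numbers:)

Code Snippet

' Sketch only: Row.SO arrives from Excel as DT_R8 (a Double), while the ODBC
' parameter was declared as OdbcType.Int. Convert.ToInt32 rounds to the nearest
' whole number, so this assumes the order numbers are integral.
odbcCmd.Parameters("1").Value = Convert.ToInt32(Row.SO)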

Well, I'm not really comparing apples to apples with this. The script component is part of a data flow that starts with a connection to an Excel file, does a lookup on a table over an ODBC connection via the script component, and then writes to a recordset. The data flow for the staging table concept uses a DataReader source to collect my records from a table over an ODBC connection and then writes them to a SQL database table...

The data flow with the script component ran in 00:06.844 and wrote 953 rows to my recordset (just the rows I needed, selected via the script component).

The data flow that creates the staging table ran in 01:16.313 and wrote 219,155 rows to a table in a SQL database, where I could then do a lookup to grab the records I need (953 rows).

I think for this instance, where I need so few records from such a large table, it makes sense to use the asynchronous script component rather than create a staging table.

Again, thanks to all for the help and suggestions. I really appreciate it.

|||Did you use the Fast Load option in the OLE DB Destination when using the staging table approach?|||Yes, on the Connection Manager page for the destination editor, Data access mode is set to Table or View - fast load.
