SCD Type 1
Dimensional data that change slowly or unpredictably is captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.
All the update methods for the different SCD types can be accomplished using the SortCL program in IRI CoSort. In the IRI Voracity data integration platform (ETL tool) powered by CoSort, there is a fit-for-purpose wizard (see below) to aid in creating the SCD job scripts the SortCL program runs.
Most SCD types use a full outer join to match records from the original data source with records in the update source based on equating a key from each. Records with matches need to be updated. Records in the update source that do not have a match need to be added to the master.
This article covers the SCD Type 1 model, where new information from the update data overwrites original information in the master source. Records in the update source without matches are added to the new master. As this model involves overwriting old values with the current values, and maintains no history, it is not often used.
The update is accomplished by joining with respect to the field ProductCode. For this example, the current data will be the source file master1.dat and update.dat is used to change values or add records in the current master source.
The master1.dat source contains:
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2365.00 | 20120101 |
G101 | 19.25 | 20110930 |
J245 | 450.50 | 20110430 |
S022 | 98.75 | 20110515 |
In this case, the update data will all have the same StartDate. The update.dat source contains records with the following values:
ProductCode | Cost | StartDate |
---|---|---|
F112 | 2425.00 | 20120701 |
J245 | 550.50 | 20120701 |
M447 | 101.75 | 20120701 |
S022 | 101.75 | 20120701 |
The new dimensional table will have these values after the update:
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2425.00 | 20120701 |
G101 | 19.25 | 20110930 |
J245 | 550.50 | 20120701 |
M447 | 139.25 | 20120701 |
S022 | 101.75 | 20120701 |
In the IRI Workbench GUI menu for Voracity ETL jobs, there is a wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the files that are used for processing the update is displayed.
The next screen is for defining the join performed with the master and update sources
Below is the sortcl job script that is produced when processing the two files master1.dat and update.dat:
/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/SCD1/master1.dat /PROCESS=DELIMITED /ALIAS=master1 /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/update.dat /PROCESS=DELIMITED /ALIAS=update /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /JOIN FULL_OUTER NOT_SORTED master1 NOT_SORTED update WHERE MASTER1.PRODUCTCODE == UPDATE.PRODUCTCODE /OUTFILE=/master1.dat # This processes all except the new records /PROCESS=DELIMITED /FIELD=(MASTER1.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST_NEW, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"', IF MASTER1.PRODUCTCODE EQ UPDATE.PRODUCTCODE THEN UPDATE.COST ELSE MASTER1.COST) /FIELD=(STARTDATE_NEW, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"', IF MASTER1.PRODUCTCODE EQ UPDATE.PRODUCTCODE THEN UPDATE.STARTDATE ELSE MASTER1.STARTDATE) /INCLUDE WHERE MASTER1.PRODUCTCODE NE "" /OUTFILE=master1.dat # This processes the new records to be added /PROCESS=DELIMITED /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') I /INCLUDE WHERE MASTER1.PRODUCTCODE EQ ""