SCD Type 4
Dimensional data that change slowly or unpredictably are captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.
All the update methods for the different SCD types are accomplished using the SortCL program in IRI CoSort. In the full IRI Voracity ETL and data management platform, there is a graphical wizard in its IRI Workbench front-end to create the SCD job scripts for SortCL to run.
Most SCD types use a full outer join to match records from the original data source with records in the updated source based on equating a key from each. Type 4, however, accomplishes the update using a sort.
Type 4 SCD
The Type 4 model is similar to that for Type 2. The difference is that there are 2 tables or files that are maintained: one for the current costs and one to hold the history records for the costs. When new current records are added to the master, these new records are usually added to the history at the same time.
The field definitions are the same in the update, the history and the master files or tables. We will sort the records for the update and history together. A new master will be created that only has one record for each ProductCode and that record will be the most current. The history will have all the records for each ProductCode including the ones from the update source.
The starting master table will be the same as the starting master from our example of Type 1 with values as shown below.
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2365.00 | 20120101 |
G101 | 19.25 | 20110930 |
J245 | 450.50 | 20110430 |
S022 | 98.75 | 20110515 |
All the update data will all have the same StartDate. The update.dat source contains records with the following values:
ProductCode | Cost | StartDate |
---|---|---|
F112 | 2425.00 | 20120701 |
J245 | 550.50 | 20120701 |
M447 | 101.75 | 20120701 |
S022 | 101.75 | 20120701 |
The starting history source below is like the starting master in the Type 2. So Type 4 is a combination of Type 1 and Type 2 where the master is the same as a Type 1 master and the history is the same as a Type 2 master.
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2365.00 | 20120101 |
G101 | 19.25 | 20110930 |
G101 | 21.25 | 20110501 |
J245 | 450.50 | 20110430 |
J245 | 425.25 | 20101001 |
J245 | 385.25 | 20100215 |
S022 | 98.75 | 20110515 |
In IRI Workbench, there is a Voracity wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the sources that are used for processing the update is displayed.
The next screen is where you set up the sort by selecting the keys. There are 2 keys: ProductCode then StartDate. We are grouping by ProductCode, but we also need to sort in descending order by StartDate within each ProductCode grouping. This is so that we can filter the most recent record for each ProductCode group to the new master.
To make the StartDate a key that sorts in descending order, select that key and then select Edit Key. This will bring up the following screen:
In the dropdown for Direction select Descending.
Here is the job script for the sort:
/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/SCD4/history4.dat /PROCESS=DELIMITED /ALIAS=history4 /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/update.dat /PROCESS=DELIMITED /ALIAS=update /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /SORT /KEY=(PRODUCTCODE, TYPE=ASCII) /KEY=(STARTDATE, DESCENDING, TYPE=ASCII) /OUTFILE=history4.dat # This file will contain all the records from both inputs /PROCESS=DELIMITED /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /OUTFILE=master4.dat # Include only one record for each ProductCode /PROCESS=DELIMITED /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"') /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"') /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') /INCLUDE WHERE PRODUCTCODE
Below is the updated master produced by running the above job script using SortCL. The values are the same that are produced in a Type 1 update.
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2425.00 | 20120701 |
G101 | 19.25 | 20110930 |
J245 | 550.50 | 20120701 |
M447 | 139.25 | 20120701 |
S022 | 101.75 | 20120701 |
The history file contains all previous records plus the new records from the update file.
ProductCode | Cost | StartDate |
---|---|---|
C123 | 125.50 | 20110228 |
F112 | 2425.00 | 20120701 |
F112 | 2365.00 | 20120101 |
G101 | 19.25 | 20110930 |
G101 | 21.25 | 20110501 |
J245 | 550.50 | 20120701 |
J245 | 450.50 | 20110430 |
J245 | 425.25 | 20101001 |
J245 | 385.25 | 20100215 |
M447 | 101.75 | 220120701 |
S022 | 101.75 | 20120701 |
S022 | 98.75 | 20110515 |