Slowly Changing Dimensions: Slowly changing
dimensions are the dimensions in which the data changes slowly,.
rather than changing regularly on a time
basis.
Let’s say I have a customer dimension with
these columns mainly
(Customer Id, Customer First Name,
Customer Last Name, Customer Country)
Customer
Id
|
Customer First Name
|
Customer Last Name
|
Customer Country
|
1
|
Sudheer
|
Sharma
|
India
|
Now, this guy moved to US. In source the
country name has been changed to US, we need to update that in our target
dimension to reflect this change.
SCD Type 1: The new incoming record (changed/modified data
set) replaces the existing old record in target.
Customer
Id
|
Customer First Name
|
Customer Last Name
|
Customer Country
|
1
|
Sudheer
|
Sharma
|
US
|
Old value (India) is overwritten by the new
value (US) and there is no way to find out the old version of data. It holds
only the current version of data.
SCD Type 2: In this case, an additional record is added into the
customer dimension. The beauty of this approach is it will maintain two
versions, you will find two records the older version and the current version.
In other words it maintains history. Again we can implement Type 2 in following
methods
1.
Versioning
2.
Effective Dates
3.
By setting Current
Flag values/Record Indicators.
Method 1: Versioning
Customer Id
|
Customer First Name
|
Customer
Last Name
|
Customer Country
|
Effective Year
|
Version
|
1
|
Sudheer
|
Sharma
|
India
|
2008
|
0
|
1
|
Sudheer
|
Sharma
|
US
|
2009
|
1
|
Method 2: Effective Dates
Customer Id
|
Customer
First Name
|
Customer Last Name
|
Customer Country
|
Effective
Start Date
|
Effective EndDate
|
1
|
Sudheer
|
Sharma
|
India
|
01/01/2008
|
12/31/2008
|
1
|
Sudheer
|
Sharma
|
US
|
01/01/2009
|
tilldate
|
Method 3: Effective Dates & Current Record
Indicators
Customer Id
|
Customer
First Name
|
Customer
Last Name
|
Customer Country
|
Effective
Start Date
|
Effective
End Date
|
Current Record IND
|
1
|
Sudheer
|
Sharma
|
India
|
01/01/2008
|
12/31/2008
|
N
|
1
|
Sudheer
|
Sharma
|
US
|
01/01/2009
|
tilldate
|
Y
|
SCD Type 3: In this approach, only the information about a
previous value of a dimension is written into the database. An ‘old ‘or
‘previous’ column is created which stores the immediate previous attribute.
Product
ID
|
Product
Name
|
Current
Year
|
Current
Price
|
Previous
Year
|
Previous
Price
|
1
|
Close-up
|
2008
|
50.00
|
2007
|
45.00
|
The problem with this approach is over years,
if the product price continuously changes, then the complete history may not be
stored, only the latest change will be stored. For example, in year 2009, if
the product price changes to 60, then we would not be able to see the complete
history of 2007 prices, since the old values would have been updated with 2008
information.
No comments:
Post a Comment