Pages

Friday 8 August 2014

Slowly Changing Dimensions(SCD) full Description

Slowly Changing Dimensions: Slowly changing dimensions are the dimensions in which the data changes slowly,.
rather than changing regularly on a time basis.
Let’s say I have a customer dimension with these columns mainly
 (Customer Id, Customer First Name, Customer Last Name, Customer Country)
Customer
Id
Customer First Name
Customer Last Name
Customer Country
1
Sudheer
Sharma
India

Now, this guy moved to US. In source the country name has been changed to US, we need to update that in our target dimension to reflect this change.
SCD Type 1: The new incoming record (changed/modified data set) replaces the existing old record in target.
Customer
Id
Customer First Name
Customer Last Name
Customer Country
1
Sudheer
Sharma
US

Old value (India) is overwritten by the new value (US) and there is no way to find out the old version of data. It holds only the current version of data.




SCD Type 2: In this case, an additional record is added into the customer dimension. The beauty of this approach is it will maintain two versions, you will find two records the older version and the current version. In other words it maintains history. Again we can implement Type 2 in following methods
1.           Versioning
2.           Effective Dates
3.           By setting Current Flag values/Record Indicators.
Method 1: Versioning
Customer Id
Customer First Name
Customer
Last Name
Customer Country
Effective Year
Version
1
Sudheer
Sharma
India
2008
0
1
Sudheer
Sharma
US
2009
1

Method 2: Effective Dates
Customer Id
Customer
First Name
Customer Last Name
Customer Country
Effective
Start Date
Effective EndDate
1
Sudheer
Sharma
India
01/01/2008
12/31/2008
1
Sudheer
Sharma
US
01/01/2009
tilldate

Method 3: Effective Dates & Current Record Indicators
Customer Id
Customer
First Name
Customer
Last Name
Customer Country
Effective
Start Date
Effective
End Date
Current Record IND
1
Sudheer
Sharma
India
01/01/2008
12/31/2008
N
1
Sudheer
Sharma
US
01/01/2009
tilldate
Y

SCD Type 3: In this approach, only the information about a previous value of a dimension is written into the database. An ‘old ‘or ‘previous’ column is created which stores the immediate previous attribute.
Product
ID
Product
Name
Current
Year
Current
Price
Previous
Year
Previous
Price
1
Close-up
2008
50.00
2007
45.00


The problem with this approach is over years, if the product price continuously changes, then the complete history may not be stored, only the latest change will be stored. For example, in year 2009, if the product price changes to 60, then we would not be able to see the complete history of 2007 prices, since the old values would have been updated with 2008 information.

Top 30 Informatica Interview Questions and Answers ?

1.What is Data Warehousing?
Ans:
Data warehouses Basis:- Data warehouses are widely used within the largest and most complex businesses in the world.
Use with in moderately large organizations, even those with more than 1,000 employees remains surprisingly low at the moment.
We are confident that use of this technology will grow dramatically in the next few years. In challenging times good decision-making becomes critical.
The best decisions are made when all the relevant data available is taken into consideration.
The best possible source for that data is a welldesigned data warehouse.
To make any new decision or to introduce new Plan data warehousing is very important.
*ETL is one of the main processes in data warehousing.ETL means extract transform and Load data into data warehouse.
Informatica is ETL tool. It is very flexible and cheaper as compared to other ETL tool.

2.What is Business Intelligence (BI)?
Ans:
BI is an abbreviation of the two words Business Intelligence, bringing the right information at the right time to the right people in the right format.
It is a 5-step process to run your business smarter, starting with registering the right data correctly, collecting the data from multiple sources,
transforming, combining and storing it in a data warehouse.

3.What is a Dimension Table?
Ans:
A dimension table stores attributes, or dimensions, that describe the objects in a fact table.
A dimension table has a primary key column that uniquely identifies each dimension record (row).
Types of Dimensions:
 * Slowly Changing Dimensions
 *Rapidly Changing Dimensions
 *Junk Dimensions
 *Inferred Dimensions
 *Conformed Dimensions
  Etc..











4.What is a Fact Table?
Ans:

A fact table stores quantitative information for analysis and is often denormalized.

 A fact table works with dimension tables. A fact table holds the data to be analyzed,
and a dimension table stores data about the ways in which the data in the fact table can be analyzed. Thus,
the fact table consists of two types of columns. The foreign keys column allows joins with dimension tables,
and the measures columns contain the data that is being analyzed.

5.What are the Fundamental Stages of Data Warehousing?
Ans:
i.Offline Operational Databases – Data warehouses in this initial stage are developed by simply copying the database of an operational system to
an off-line server where the processing load of reporting does not impact on the operational system’s performance.
ii.Offline Data Warehouse – Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly)
from the operational systems and the data is stored in an integrated reporting-oriented data structure.
iii.Real Time Data Warehouse – Data warehouses at this stage are updated on a transaction or event basis,
every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
iv.Integrated Data Warehouse – Data warehouses at this stage are used to generate activity or transactions that
are passed back into the operational systems for use in the daily activity of the organization.

8.What are the Different Methods of Loading Dimension tables?
Ans:
i. Conventional Load: Before loading the data, all the Table constraints will be checked against the data.
ii.Direct load:(Faster Loading): All the Constraints will be disabled. Data will be loaded directly.
Later the data will be checked against the table constraints and the bad data won't be indexed.

9.What is Data Mining?
Ans:
Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis.
Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events.
Data mining is also known as Knowledge Discovery in Data (KDD).

The key properties of data mining are:

Automatic discovery of patterns

Prediction of likely outcomes

Creation of actionable information

Focus on large data sets and databases

Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

10.What is the Difference between a View and a Materialized View?
Ans:
A view
is nothing but a SQL query takes the output of a query and makes it appear like
a virtual table which does not take up any storage space or contain any data



But Materialized views are schema objects it storing the results of a query in a
separate schema object (i.e. take up storage space and contain data). This
indicates the materialized view is returning a physically separate copy of the
table data.

11.What is a staging area?
Ans:
Staging(logical Memory) area is place where you hold temporary tables on data warehouse server.
Staging tables are connected to work area or fact tables. We basically need staging area to hold the data ,
and perform data cleansing and merging , before loading the data into warehouse.

12.What is OLAP?
Ans:
OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points-of-view.

13.What is the Difference between OLTP and OLAP?


Ans:                   OLTP                                                      OLAP
  1. Application :   * Operational: ERP, CRM, legacy apps, * Management Information System, Decision Support System.

  2.Typical users:   * Staff                                    * Managers, Executives .

  3.Horizon      :   *Weeks, Months                             *Years   .

  4.             :   * Normalization is promoted                * Denormalization is promoted .

  5.             :   *On Line Transaction Processing            *On Line Analytic Processing .

13.What is ODS?
Ans:
> ODS - Operational Data Store.
> ODS Comes between staging area & Data Warehouse.
> ODS is nothing but a staging area, in which you can keep your OLTP type  data like your day to day transactional data.

14.What is ER Diagram?
Ans:
An entity-relationship diagram is a data modeling technique that creates a graphical representation of the entities,
and the relationships between entities, within an information system.

15.What are Lookup Tables?
Ans:
Look up stage performs the same action as merge and join
stages do. But here we have a table as lookup table which
will be having a key column with the source table. if the
key colums in source are matched with the lookup table, that
rows will only be passed to output. Others can be taken in
reject.

16.What are Aggregate Tables?
Ans:
Aggregate table contains the summary of existing warhouse data , it contains data which is group accordingly (month , quartely etc)
depending upon the business needs.For example you may contain transaction in fact table for all the customers but
sometime you want the sum of amount for a particular customer over a month in that case you need to run the query in fact table which
contains millions of row and time consuming. so avoid this problem we do the aggregation of amount in aggregate over a month for each customers,
so this will improve the performance and you can retrieve the result very fastly.

17.What are Conformed Dimensions?
Ans:Conformed dimentions are dimensions which are common to the cubes.
(cubes are the schemas contains facts and dimension tables)  Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are
the Facts and Dimensions  here D1,D2 are the Conformed Dimensions  .

18.How do you Load the Time Dimension?
Ans:
     We can use SCD Type 1/2/3 to load any Dimensions based on the requirement.

OR

U can load time dimension manually by writing scripts in PL/SQL to load the time dimension table with values for a period.

19.What is a Level of Granularity of a Fact Table?
Ans:
 The granularity is the lowest level of information stored in the fact table. The depth of data level is known as granularity.
 In date dimension the level could be year, month, quarter, period, week, day of granularity.

20.What is a Factless Facts Table?
Ans:Is fact table that does not contain any measure. This table will only  contain keys from different dimension tables.
    This is often used to  resolve a many-to-many cardinality issue.

21.What are Slowly Changing Dimensions (SCD)?
Ans:
 Slowly Changing Dimensions: Slowly changing dimensions are the dimensions in which the data changes slowly,
rather than changing regularly on a time basis.

22.What is a Surrogate Key?
Ans:sarrogate key is ssystem generated sequence number,an
artificial key. which can be used in maintaing the history
of the data .

23.What is Junk Dimension?
Ans:
A "junk" dimension is a collection of random transactional
codes, flags and/or text attributes that are unrelated to
any particular dimension.

The junk dimension is simply a structure that provides a
convenient place to store the junk attributes. A good
example would be a trade fact in a company that brokers
equity trades.

24.What is a Data Mart?
Ans: Data mart is a part of the data warehouse.

25.What is Schema? & Types of Schema?
Ans:
  There are 3 types of Schemas,
1.Star Schema
2.Snowflake Schema
3.Galaxy/Integrated/Hybride/Consalation Schemas
1)Star Schema-It is defined as the fact table is centrally located,surrounded by dimension tables is called star schema.
In that dimension tables are denormalized and fact table is normalized.(or)Star schema is defind as two or more fact and
two or more dimension tables that are related to foreigh keys.
2)Snowflake Schema-It is defind as the Denormalized Dimensions tables are can be splited into two or more normalized dimensions is called snowflake schema.
In that both dimensions and fact tables are normalized.
3)Hybride Schema-It is defind as a dimension table is shared by two or more fact tables.

26.Explain the Paradigm of Bill Inmon and Ralph Kimball.
Ans:
    Bill Inmon’s paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

    Ralph Kimball’s paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.