
Dimension Tables - Fundamentals of Data Warehousing


A dimension table contains descriptive information (attributes) that gives context to the measures stored in fact tables.

Types of dimensions
This post covers three types of dimensions:
  1. Junk dimension
  2. Conformed dimension
  3. SCD or CDC dimension (slowly changing dimension or change data capture)

Junk Dimension:
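A junk dimension collects miscellaneous low-cardinality attributes, such as flags and indicators, that are of little direct use to clients and do not fit in any other dimension.
Ex: flags, audit columns such as the user who inserted a row, etc.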

Conformed dimension:
If a dimension is shared by multiple fact tables, it is called a conformed dimension.
Ex: Time

SCD or CDC:

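Example: If a person moves from BTM to Silk Board, the changed value is inserted into the table.
If anything changes in the source, we load the changed values into the target; this is known as SCD or CDC. Strictly speaking, CDC is the detection of changed rows in the source, and SCD describes how the dimension table stores those changes.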

CRITICAL COLUMN: A column whose values change over time is called a critical column.
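
As a minimal sketch of how critical columns can drive change detection, the snippet below compares a source row with its target row on the critical columns only. The column names and the helper `changed_critical_columns` are hypothetical and chosen only for illustration.

```python
# Hypothetical column names; only the address is treated as a critical column here.
CRITICAL_COLUMNS = ("customer_address",)


def changed_critical_columns(source_row, target_row, critical_columns=CRITICAL_COLUMNS):
    """Return the critical columns whose values differ between source and target."""
    return [
        col for col in critical_columns
        if source_row.get(col) != target_row.get(col)
    ]


source = {"customer_id": 100, "customer_name": "RAM", "customer_address": "SLK BOARD"}
target = {"customer_id": 100, "customer_name": "RAM", "customer_address": "BTM"}

# A non-empty result means the row changed and has to be loaded into the target.
changed = changed_critical_columns(source, target)
```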


Types of SCD:

SCD1 or TYPE1: The dimension table holds only the current value. If anything changes in the source, the changed value overwrites the previous value in the target, so no history is kept.
We can analyze only current data.
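
The overwrite behaviour can be sketched in a few lines. This is only an illustration, assuming the dimension is held in memory as a dictionary keyed by customer ID; the function `apply_scd1` and the column names are hypothetical.

```python
def apply_scd1(dimension, source_row, key="customer_id"):
    """Overwrite the target row for this key, or insert it if the key is new."""
    dimension[source_row[key]] = dict(source_row)
    return dimension


# Target dimension before the load: one row per customer, current values only.
dimension = {
    100: {"customer_id": 100, "customer_name": "RAM", "customer_address": "BTM"},
}

# The source now reports a new address; the old value is lost after the load.
apply_scd1(
    dimension,
    {"customer_id": 100, "customer_name": "RAM", "customer_address": "SLK BOARD"},
)
```

After the call the dimension still has a single row for customer 100, and the previous address can no longer be recovered from the target.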

SCD2 or TYPE2: (SLOWLY GROWING TARGET)
If anything changes in the source, a new row is inserted into the target, so the dimension contains both previous and current information.
Using this we can analyze the complete history of the data.

Example: Say a person moves from BTM to SLK BOARD in 2011 and then moves again to YELHANKA in 2012. These changes have to be recorded in the target as well.
Using SCD2, a new row is inserted for each change to maintain the history in the target, as shown below:

DIMENSION TABLE:

CUSTOMER ID | CUSTOMER NAME | CUSTOMER ADDRESS | YEAR
100         | RAM           | BTM              | 2010
100         | RAM           | SLK BOARD        | 2011
100         | RAM           | YELHANKA         | 2012


The target table grows slowly as rows are added, which is why SCD2 is also called a SLOWLY GROWING TARGET.

TYPES OF SCD2:
SCD2 can be implemented in three ways:
  • SCD2 WITH VERSION NUMBER
  • SCD2 WITH FLAG
  • SCD2 WITH DATE RANGE

SCD2 WITH VERSION NUMBER: -

CUSTOMER ID | CUSTOMER NAME | CUSTOMER ADDRESS | VERSION
100         | RAM           | BTM              | 1.0
100         | RAM           | SLK BOARD        | 1.1
100         | RAM           | YELHANKA         | 1.2
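
A rough sketch of the version-number variant follows. It assumes the version is "1." followed by the count of earlier rows for the same customer, and that rows are appended in load order; the function `apply_scd2_version` and the column names are hypothetical.

```python
def apply_scd2_version(rows, source_row, key="customer_id", tracked="customer_address"):
    """Append a new versioned row when the tracked column has changed."""
    history = [r for r in rows if r[key] == source_row[key]]
    if history and history[-1][tracked] == source_row[tracked]:
        return rows  # no change in the tracked column, nothing to insert
    # Assumed numbering scheme: 1.0 for the first row, then 1.1, 1.2, ...
    rows.append(dict(source_row, version=f"1.{len(history)}"))
    return rows


rows = []
for address in ("BTM", "SLK BOARD", "YELHANKA"):
    apply_scd2_version(
        rows,
        {"customer_id": 100, "customer_name": "RAM", "customer_address": address},
    )

versions = [r["version"] for r in rows]
```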

SCD2 WITH FLAG: -

CUSTOMER ID | CUSTOMER NAME | CUSTOMER ADDRESS | FLAG
100         | RAM           | BTM              | 0
100         | RAM           | SLK BOARD        | 0
100         | RAM           | YELHANKA         | 1

Here '0' marks previous values and '1' marks the current value.
SCD2 with flag is used most of the time, as loading 0 and 1 is easy.
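
The flag handling can be sketched as follows: the row that was current is set to 0, and the new row is inserted with 1. This is a minimal in-memory illustration; the function `apply_scd2_flag` and the column names are hypothetical.

```python
def apply_scd2_flag(rows, source_row, key="customer_id", tracked="customer_address"):
    """Expire the current row (flag 1 -> 0) and insert the new row with flag 1."""
    current = next(
        (r for r in rows if r[key] == source_row[key] and r["flag"] == 1),
        None,
    )
    if current is not None:
        if current[tracked] == source_row[tracked]:
            return rows  # unchanged, keep the existing current row
        current["flag"] = 0  # the old row becomes a previous value
    rows.append(dict(source_row, flag=1))
    return rows


rows = []
for address in ("BTM", "SLK BOARD", "YELHANKA"):
    apply_scd2_flag(
        rows,
        {"customer_id": 100, "customer_name": "RAM", "customer_address": address},
    )

flags = [r["flag"] for r in rows]
current_address = next(r["customer_address"] for r in rows if r["flag"] == 1)
```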

SCD2 WITH DATE RANGE: -

CUSTOMER ID | CUSTOMER NAME | CUSTOMER ADDRESS | DATE
100         | RAM           | BTM              | 12th March 2010
100         | RAM           | SLK BOARD        | 8th Feb 2011
100         | RAM           | YELHANKA         | 2nd Dec 2012
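
One possible way to query such a table is sketched below. It assumes that the DATE column is the date from which each address is effective and that each row ends when the next one starts; both are assumptions, since the table shows only one date per row. The helpers `with_end_dates` and `address_as_of` are hypothetical and handle a single customer.

```python
from datetime import date

# History of one customer; start_date is the assumed meaning of the DATE column.
history = [
    {"customer_id": 100, "customer_address": "BTM", "start_date": date(2010, 3, 12)},
    {"customer_id": 100, "customer_address": "SLK BOARD", "start_date": date(2011, 2, 8)},
    {"customer_id": 100, "customer_address": "YELHANKA", "start_date": date(2012, 12, 2)},
]


def with_end_dates(rows):
    """Derive each row's end date from the start date of the following row."""
    ordered = sorted(rows, key=lambda r: r["start_date"])
    result = []
    for row, nxt in zip(ordered, ordered[1:] + [None]):
        end = nxt["start_date"] if nxt else None  # None marks the current row
        result.append(dict(row, end_date=end))
    return result


def address_as_of(rows, when):
    """Return the address that was valid on the given date, or None."""
    for row in with_end_dates(rows):
        if row["start_date"] <= when and (row["end_date"] is None or when < row["end_date"]):
            return row["customer_address"]
    return None


mid_2011 = address_as_of(history, date(2011, 6, 1))
```

Storing an explicit end date on each row is a common alternative to deriving it at query time.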
