
Schema - Data Warehouse Fundamentals, Part 4

SCHEMA - A collection of fact tables and dimension tables.

There are 3 types of schemas:

  1. Star schema
  2. Snowflake schema
  3. Integrated schema/galaxy schema/fact constellation
STAR SCHEMA

  • In a star schema, every dimension table contains one primary key.
  • Dimension tables contain denormalized data.
  • The fact table contains the foreign keys of the dimension tables.
  • The relationship between the fact table and a dimension table is a primary key-foreign key relationship.
  • Every fact table contains one surrogate key (a system-generated key that holds numeric values).
  • The fact table contains normalized data.
  • The fact table contains numerics, also called measures, facts, or KPIs (Key Performance Indicators).
  • In the fact table we store values at the lowest level; this is known as "fact granularity" or "grain of fact".
  • (Data can be stored by year, month, week, or day. Data stored on a daily basis gives more information than data stored on a weekly or monthly basis.)
  • Using fact granularity, we can analyze data very easily.
  • First we load the dimension tables, and then we load the fact table.
  • We design the schema using third normal form (3NF).
  • (3NF allows partial duplicates and is used to reduce redundancy in a DBMS. Redundancy: for example, "raj" repeated 20 times is reduced to, say, 3 occurrences.)
  • A star schema contains fewer joins, so performance is higher.
  • If there are 'N' tables, then we will have N-1 joins (a sketch of a simple star schema follows this list).
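
As an illustration, here is a minimal star schema sketch in SQL. The table and column names (DimProduct, DimTime, FactSales, and so on) are hypothetical, not from the original post, and the DDL is a simplified sketch rather than a production design.

```sql
-- Hypothetical star schema: two dimension tables and one fact table.
-- Dimension tables are loaded first; the fact table references them.

CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,   -- primary key of the dimension
    ProductName VARCHAR(100),
    Brand       VARCHAR(50),       -- denormalized: brand stored in the same row
    Category    VARCHAR(50)
);

CREATE TABLE DimTime (
    TimeKey    INT PRIMARY KEY,    -- primary key of the dimension
    FullDate   DATE,
    MonthName  VARCHAR(20),
    YearNumber INT
);

CREATE TABLE FactSales (
    SalesKey     INT PRIMARY KEY,                         -- surrogate key (system generated, numeric)
    ProductKey   INT REFERENCES DimProduct (ProductKey),  -- foreign key to dimension
    TimeKey      INT REFERENCES DimTime (TimeKey),        -- foreign key to dimension
    QuantitySold INT,                                     -- measure / KPI
    SalesAmount  DECIMAL(10, 2)                           -- measure / KPI
);

-- Grain of fact: one row per product per day (the lowest level stored).
```

With one fact table and two dimension tables (N = 3 tables), a typical query needs N-1 = 2 joins.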

SNOWFLAKE SCHEMA



  • In a snowflake schema, one dimension table is split into multiple dimension tables.
  • Such a scenario occurs when, say, product ID 100 (a value stored in the fact table) covers all brands of toothpaste, like Colgate, Pepsodent, Close Up, and Meswak, and each brand is its own dimension table.
  • Colgate, one of those dimension tables, itself comes in different forms, like Colgate Gel, Colgate White, and Colgate Sensitive. All these forms are sub-products of Colgate, so they become sub-dimension tables of the main Colgate dimension.
  • Dimension tables contain normalized data.
  • In this schema we have more joins, and hence performance is lower.
  • The structure of this schema looks like a snowflake, hence the name.
  • Using this kind of schema, we can store more data in the dimension tables (see the sketch after this list).
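
Here is a minimal snowflake sketch of the same idea: the product dimension is normalized into brand and variant tables. All names are hypothetical, and this redesigned DimProduct is an alternative to the star version above, not meant to coexist with it.

```sql
-- Hypothetical snowflake schema: the product dimension is split
-- (normalized) into brand and variant sub-dimension tables.

CREATE TABLE DimBrand (
    BrandKey  INT PRIMARY KEY,
    BrandName VARCHAR(50)           -- e.g. Colgate, Pepsodent, Close Up
);

CREATE TABLE DimVariant (
    VariantKey  INT PRIMARY KEY,
    BrandKey    INT REFERENCES DimBrand (BrandKey),  -- sub-dimension of brand
    VariantName VARCHAR(50)         -- e.g. Colgate Gel, Colgate White
);

CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,
    VariantKey  INT REFERENCES DimVariant (VariantKey),
    ProductName VARCHAR(100)        -- normalized: brand and variant live in their own tables
);

-- Reaching the brand from the fact table now takes extra joins:
-- FactSales -> DimProduct -> DimVariant -> DimBrand.
```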
Difference between star schema and snowflake schema

STAR SCHEMA                                   SNOWFLAKE SCHEMA
--------------------------------------------  --------------------------------------------
Dimension table contains denormalized data    Dimension table contains normalized data
Fewer joins are used                          More joins are used
Performance is high                           Performance is low
Designing is easy                             Designing is difficult
Dimension table contains less data            Dimension table contains more data
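
To make the join-count difference concrete, here is the same hypothetical question ("total sales by brand") written against the two sketches above; the first query assumes the star tables, the second the snowflake tables.

```sql
-- Star schema: one join, because Brand is denormalized into DimProduct.
SELECT p.Brand, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimProduct p ON p.ProductKey = f.ProductKey
GROUP BY p.Brand;

-- Snowflake schema: three joins to reach the brand name.
SELECT b.BrandName, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimProduct p ON p.ProductKey = f.ProductKey
JOIN DimVariant v ON v.VariantKey = p.VariantKey
JOIN DimBrand   b ON b.BrandKey   = v.BrandKey
GROUP BY b.BrandName;
```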

Integrated Schema/Galaxy schema/Fact constellation

(In practice, this schema is not used in most applications.)
  • An integrated schema contains multiple fact tables.
  • A combination of two fact tables is called a fact constellation.
  • A join between two fact tables is called a fact join.

Conformed dimension: if a dimension table is connected to multiple fact tables, it is called a conformed dimension.
  • This schema contains more joins, so performance is very low.
  • The structure of this schema is very complex.
  • Every schema contains one common dimension table, namely the Time dimension (a sketch of a fact constellation follows below).
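
As a minimal sketch of a fact constellation, a second hypothetical fact table can reuse the dimension tables from the star sketch above; because DimProduct and DimTime would then serve both fact tables, they act as conformed dimensions. The names are again hypothetical.

```sql
-- Hypothetical second fact table sharing the dimensions of FactSales.

CREATE TABLE FactInventory (
    InventoryKey INT PRIMARY KEY,                         -- surrogate key
    ProductKey   INT REFERENCES DimProduct (ProductKey),  -- conformed dimension
    TimeKey      INT REFERENCES DimTime (TimeKey),        -- conformed dimension (Time)
    UnitsOnHand  INT                                      -- measure
);

-- DimProduct and DimTime now connect to both FactSales and
-- FactInventory, which makes them conformed dimensions.
```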




