
Schema: Data Warehouse Fundamentals, Part 4

SCHEMA: A collection of fact tables and dimension tables.

There are 3 types of schemas.

  1. Star schema
  2. Snowflake schema
  3. Integrated schema/galaxy schema/fact constellation
STAR SCHEMA

  • In a star schema, every dimension table contains one primary key.
  • Dimension tables contain denormalized values.
  • The fact table contains foreign keys to the dimension tables.
  • The relationship between the fact table and a dimension table is foreign key to primary key.
  • Every fact table contains one surrogate key. A surrogate key is a system-generated key that holds numeric values.
  • The fact table contains normalized data.
  • The fact table contains numeric values, also called measures, facts, or KPIs (Key Performance Indicators).
  • The fact table stores values at the lowest level of detail; this is known as “fact granularity” or the “grain of the fact”.
  • (Data can be stored by year, month, week, or day. Data stored on a daily basis gives more information than data stored on a weekly or monthly basis.)
  • A fine grain makes analysis easy, because daily values can always be rolled up to weeks, months, or years.
  • Load the dimension tables first, then load the fact table.
  • The fact table is designed in 3rd normal form (3NF), while the dimension tables stay denormalized.
  • (3NF still allows some duplicate values, but it reduces redundancy in a DBMS.)
  • (Redundancy example: a value such as "raj" repeated 20 times is reduced to about 3 occurrences.)
  • A star schema needs fewer joins, so query performance is better.
  • If there are N tables, there are N-1 joins.
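
The points above can be sketched in a few lines of Python using the built-in sqlite3 module. This is only a minimal illustration under my own assumptions: the table names (dim_date, dim_product, fact_sales), the columns, and the sample rows are made up for the example and do not come from any real warehouse. It shows surrogate keys, the foreign key to primary key relationship, loading dimensions before facts, and a roll-up from the daily grain to months with N-1 = 2 joins for 3 tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK links

conn.executescript("""
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- surrogate key (system generated)
    full_date TEXT NOT NULL,
    month     INTEGER NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY, -- surrogate key (system generated)
    product_name TEXT NOT NULL,
    brand        TEXT NOT NULL        -- brand kept inline: denormalized
);
CREATE TABLE fact_sales (
    sales_key   INTEGER PRIMARY KEY,  -- surrogate key of the fact row
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      REAL NOT NULL         -- the measure, stored at daily grain
);
""")

# Step 1: load the dimension tables (keys are assigned as 1, 2, 3, ...).
conn.executemany(
    "INSERT INTO dim_date (full_date, month, year) VALUES (?, ?, ?)",
    [("2024-01-01", 1, 2024), ("2024-01-02", 1, 2024), ("2024-02-01", 2, 2024)],
)
conn.executemany(
    "INSERT INTO dim_product (product_name, brand) VALUES (?, ?)",
    [("Colgate Gel", "Colgate"), ("Pepsodent", "Pepsodent")],
)

# Step 2: load the fact table, which refers to the dimension keys.
conn.executemany(
    "INSERT INTO fact_sales (date_key, product_key, amount) VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (2, 2, 7.5), (3, 1, 20.0)],
)

# A fact row pointing at a missing dimension row is rejected, which is
# why the dimensions have to be loaded first.
try:
    conn.execute(
        "INSERT INTO fact_sales (date_key, product_key, amount) VALUES (99, 1, 1.0)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Roll the daily grain up to months: 3 tables, so 2 joins.
monthly_sales = conn.execute("""
    SELECT d.year, d.month, p.brand, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.brand
    ORDER BY d.year, d.month, p.brand
""").fetchall()
print(monthly_sales)
```

Because the facts are stored per day, the same table can answer daily, monthly, or yearly questions just by changing the GROUP BY. The reverse is not possible: monthly totals cannot be split back into days.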

SNOWFLAKE SCHEMA



  • In a snowflake schema, one dimension table is split into multiple dimension tables.
  • Such a scenario occurs when, say, product ID 100 (a key stored in the fact table) covers all brands of toothpaste, such as Colgate, Pepsodent, Close Up, and Meswak, and the brands are held in a separate dimension table.
  • Colgate in turn comes in different forms, such as Colgate Gel, Colgate White, and Colgate Sensitive. These forms are sub-products of Colgate, so they are held in a sub-dimension table under the main Colgate dimension.
  • Dimension tables contain normalized data.
  • This schema needs more joins, so query performance is lower.
  • The structure of this schema looks like a snowflake, hence the name.
  • Using this kind of schema, we can store more data in the dimension tables.
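
The toothpaste example above could look roughly like this in Python with sqlite3. Again this is a sketch under assumptions: the tables dim_brand, dim_product, and fact_sales and all sample amounts are hypothetical. The brand is moved out of the product dimension into its own table, so a report by brand now needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_brand (              -- split out of the product dimension
    brand_key  INTEGER PRIMARY KEY,
    brand_name TEXT NOT NULL
);
CREATE TABLE dim_product (            -- normalized: only a key to the brand
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    brand_key    INTEGER NOT NULL REFERENCES dim_brand(brand_key)
);
CREATE TABLE fact_sales (
    sales_key   INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      REAL NOT NULL
);
""")

# Load from the outermost dimension inwards, then the facts.
conn.executemany(
    "INSERT INTO dim_brand (brand_name) VALUES (?)",
    [("Colgate",), ("Pepsodent",)],
)
conn.executemany(
    "INSERT INTO dim_product (product_name, brand_key) VALUES (?, ?)",
    [("Colgate Gel", 1), ("Colgate White", 1), ("Colgate Sensitive", 1),
     ("Pepsodent", 2)],
)
conn.executemany(
    "INSERT INTO fact_sales (product_key, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (3, 2.5), (4, 4.0)],
)

# Sales by brand: fact -> product -> brand, two joins instead of one.
sales_by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_brand b   ON b.brand_key = p.brand_key
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(sales_by_brand)
```

In a star schema the brand name would sit directly in dim_product and the second join would disappear; the price is that the brand name is repeated on every product row.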
Difference between star schema and snowflake schema

STAR SCHEMA                                  | SNOWFLAKE SCHEMA
Dimension table contains denormalized data   | Dimension table contains normalized data
Fewer joins are used                         | More joins are used
Performance is high                          | Performance is low
Designing is easy                            | Designing is difficult
Dimension table contains less data           | Dimension table contains more data

Integrated Schema/Galaxy schema/Fact constellation

(In practice, this schema is not used in most applications.)
  • An integrated schema contains multiple fact tables.
  • A combination of two or more fact tables is called a fact constellation.
  • A join between two fact tables is called a fact join.

Conformed dimension: If a dimension table is connected to multiple fact tables, it is called a conformed dimension.
  • This schema contains more joins, so performance is lower still.
  • The structure of this schema is very complex.
  • Every schema contains one common dimension table, i.e., the time dimension.
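
A fact constellation with a conformed time dimension can be sketched as below, once more in Python with sqlite3. The tables fact_sales and fact_returns, the dim_date layout, and the numbers are assumptions for illustration only. Both fact tables point to the same dim_date, and the query sums each fact table on its own before combining them through that shared dimension, which is one common way to avoid counting rows twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_date (               -- conformed: shared by both facts
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sales_key INTEGER PRIMARY KEY,
    date_key  INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount    REAL NOT NULL
);
CREATE TABLE fact_returns (
    return_key INTEGER PRIMARY KEY,
    date_key   INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount     REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO dim_date (full_date) VALUES (?)",
    [("2024-01-01",), ("2024-01-02",)],
)
conn.executemany(
    "INSERT INTO fact_sales (date_key, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 20.0)],
)
conn.executemany(
    "INSERT INTO fact_returns (date_key, amount) VALUES (?, ?)",
    [(1, 3.0)],
)

# Aggregate each fact table separately, then line the totals up
# on the conformed date dimension.
sales_vs_returns = conn.execute("""
    SELECT d.full_date,
           COALESCE(s.total, 0.0),
           COALESCE(r.total, 0.0)
    FROM dim_date d
    LEFT JOIN (SELECT date_key, SUM(amount) AS total
               FROM fact_sales GROUP BY date_key) s
           ON s.date_key = d.date_key
    LEFT JOIN (SELECT date_key, SUM(amount) AS total
               FROM fact_returns GROUP BY date_key) r
           ON r.date_key = d.date_key
    ORDER BY d.full_date
""").fetchall()
print(sales_vs_returns)
```

Joining fact_sales directly to fact_returns on date_key would repeat each return once per sale on that day and inflate the totals, which is part of why this schema is harder to query.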




